this post was submitted on 01 Jul 2024
-6 points (37.5% liked)

Selfhosted

40133 readers
545 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 1 year ago
MODERATORS
 

Check out our open-source, language-agnostic mutation testing tool using LLM agents here: https://github.com/codeintegrity-ai/mutahunter

Mutation testing is a way to verify the effectiveness of your test cases. It involves creating small changes, or “mutants,” in the code and checking if the test cases can catch these changes. Unlike line coverage, which only tells you how much of the code has been executed, mutation testing tells you how well it’s been tested. We all know line coverage is BS.

That’s where Mutahunter comes in. We leverage LLM models to inject context-aware faults into your codebase. As the first AI-based mutation testing tool, Our AI-driven approach provides a full contextual understanding of the entire codebase by using the AST, enabling it to identify and inject mutations that closely resemble real vulnerabilities. This ensures comprehensive and effective testing, significantly enhancing software security and quality. We also make use of LiteLLM, so we support all major self-hosted LLM models

We’ve added examples for JavaScript, Python, and Go (see /examples). It can theoretically work with any programming language that provides a coverage report in Cobertura XML format (more supported soon) and has a language grammar available in TreeSitter.

Here’s a YouTube video with an in-depth explanation: https://www.youtube.com/watch?v=8h4zpeK6LOA

Here’s our blog with more details: https://medium.com/codeintegrity-engineering/transforming-qa-mutahunter-and-the-power-of-llm-enhanced-mutation-testing-18c1ea19add8

Check it out and let us know what you think! We’re excited to get feedback from the community and help developers everywhere improve their code quality.

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 4 points 4 months ago (1 children)

This is the kind of AI stuff that really annoys me. Looking at one of the mutation examples I didn't see anything that wouldn't normally be tested by a typical mutation tool. You took a simple, idempotent process and you got an llm to do it slower, less accurately, and using more resources.

If you wanted to marry the two in a new and possibly useful fashion I would say use an llm to analyze the results of a standard mutation test and give guidance on what issues should be acted upon first. An off-by-one calculation could mean somebody loses a million dollars or it could mean a button is grayed out. Standard mutation tools don't give you that context.

[–] [email protected] -3 points 4 months ago

Hey, engineer here who worked on this.

I understand your concerns. The examples we provided are indeed trivial, but they are just the starting point. Our goal is to leverage LLMs to generate mutants that closely resemble real-world bugs with better context. While traditional mutation tools are excellent, we believe LLMs can bring an additional layer of sophistication and versatility.

As you rightly pointed out, standard mutation tools often lack the context to prioritize issues effectively. We’re currently working on using LLMs to analyze the output of survived mutants to provide better guidance on which issues should be addressed first. This way, an off-by-one error that could potentially cause significant problems is highlighted more prominently during the code review PR process.

As someone who has used mutation testing, I’ve always wondered about the sheer amount of useless mutants being generated. Going through all these mutants manually to improve test cases is quite cumbersome. If we can reduce the number of mutants generated, produce higher quality mutants, and analyze them automatically to highlight weaknesses in the tests during PRs, wouldn’t that be cool? We’re aiming to achieve just that.

Moreover, this approach can theoretically work for any programming language or testing framework, making it a versatile solution across different development environments.

We’re also developing a QA system to more accurately define and identify “higher quality mutants,” as discussed in the research paper here. Our aim is to enhance the overall mutation testing process, making it more efficient and insightful.

Hey, all in all, we want mutation testing to be adopted and widely spread. We really do appreciate the feedback. I hope you try it out as you sound like you know a thing or two about mutation testing.

Thanks again for your perspective.