<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mutation-Testing on Daffa Abhipraya</title><link>https://blog.abhipraya.dev/tags/mutation-testing/</link><description>Recent content in Mutation-Testing on Daffa Abhipraya</description><generator>Hugo</generator><language>en-us</language><copyright>© Daffa Abhipraya</copyright><lastBuildDate>Wed, 15 Apr 2026 00:00:00 +0700</lastBuildDate><atom:link href="https://blog.abhipraya.dev/tags/mutation-testing/index.xml" rel="self" type="application/rss+xml"/><item><title>PPL: AI at Different Cognitive Distances [Sprint 2, Week 3]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w3-ai-literacy/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w3-ai-literacy/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>Two weeks where AI was the primary productivity multiplier across very different task shapes. Week 1 leaned on iterative CI debugging (ten MRs of &amp;ldquo;run pipeline, read failure, ask AI, apply fix&amp;rdquo;) and integration-test infrastructure design. Week 2 leaned on bulk test generation (400+ mutation-killing assertions), bash-with-tricky-primitives design (a &lt;code>flock&lt;/code>-based slot allocator), and cross-agent documentation research. The common thread across all six patterns: AI is best when the task is a known pattern applied to new context, and weakest when the task is about how primitives interact in a specific environment.&lt;/p></description></item><item><title>PPL: Mutation Testing, From Setup to Score [Sprint 2, Week 3]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w3-tdd/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w3-tdd/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>Two weeks on mutation testing across the SIRA codebase. The first week wired &lt;a href="https://github.com/boxed/mutmut">mutmut&lt;/a> (Python) and &lt;a href="https://stryker-mutator.io/">Stryker&lt;/a> (TypeScript) into the pipeline and uncovered the uncomfortable truth: 91% line coverage on the services layer translated to a mutation score just above zero in places. The second week closed that gap by writing 400+ targeted tests across two rounds, driving the API mutation score from ~66% to 80.3%. This blog tells the full arc.&lt;/p></description></item><item><title>PPL: Quality as a Feedback Loop [Sprint 2, Week 3]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w3-code-quality/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w3-code-quality/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>Two weeks of quality infrastructure: wiring in the tools that measure test quality (week 1), then shortening the feedback loop so that signal actually influences the code being written (week 2). Starting state: 91% line coverage, no mutation testing, integration test coverage missing from SonarQube. Ending state: combined unit + integration coverage in SonarQube, mutmut + Stryker running per-MR with results in the CI comment, API mutation score 80.3%.&lt;/p></description></item><item><title>PPL: New Learnings Applied to SIRA [Sprint 2, Week 2]</title><link>https://blog.abhipraya.dev/ppl/part-c/s2w2-aptitude/</link><pubDate>Thu, 09 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-c/s2w2-aptitude/</guid><description>&lt;h2 id="overview">
 &lt;a class="anchor" href="#overview" data-anchor="overview" aria-hidden="true">#&lt;/a>
 Overview
&lt;/h2>
&lt;p>Sprint 2 Week 2 (April 3 to 9) was dominated by a mutation testing sprint: setting up mutmut and Stryker in CI, building integration test infrastructure from scratch, and writing mutation-killing tests. The week also delivered three application features and a meta-level improvement to my AI workflow. Each area required learning at least one technology or concept not covered in standard Fasilkom coursework.&lt;/p>
&lt;hr>
&lt;h2 id="1-mutation-testing-with-mutmut-python">
 &lt;a class="anchor" href="#1-mutation-testing-with-mutmut-python" data-anchor="1-mutation-testing-with-mutmut-python" aria-hidden="true">#&lt;/a>
 1. Mutation Testing with mutmut (Python)
&lt;/h2>
&lt;p>Mutation testing is a technique where a tool injects small faults (&amp;ldquo;mutants&amp;rdquo;) into your source code (flipping operators, changing constants, removing lines) and checks whether your test suite catches each fault. If a test fails, the mutant is &amp;ldquo;killed.&amp;rdquo; If no test fails, the mutant &amp;ldquo;survives,&amp;rdquo; meaning your tests have a blind spot.&lt;/p></description></item><item><title>PPL: When 91% Test Coverage Means Nothing</title><link>https://blog.abhipraya.dev/ppl/part-a/tdd/</link><pubDate>Thu, 09 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-a/tdd/</guid><description>&lt;p>We had 91% line coverage and felt good about it. Then we ran mutation testing and scored 0%. Every line of our service layer was executed by tests, but almost nothing was actually verified. This is the story of how we discovered the gap between &amp;ldquo;code was run&amp;rdquo; and &amp;ldquo;code was checked,&amp;rdquo; and what we changed to close it.&lt;/p>
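The gap between &amp;ldquo;code was run&amp;rdquo; and &amp;ldquo;code was checked&amp;rdquo; fits in a few lines. A hand-rolled sketch of the idea (hypothetical function and tests, not SIRA code and not mutmut&amp;rsquo;s real machinery):

```python
# A hand-rolled illustration of the mutation-testing loop: inject a small
# fault ("mutant") and see whether any test notices.

def discount(price, qty):
    # original: orders of 10 or more get 10% off
    return price * 0.9 if qty >= 10 else price

def discount_mutant(price, qty):
    # mutant: ">=" flipped to ">", an off-by-one boundary fault
    return price * 0.9 if qty > 10 else price

def weak_test(fn):
    # executes the code (so line coverage counts it) but verifies nothing
    fn(100, 5)
    return True

def strong_test(fn):
    # pins the boundary value, so the flipped operator changes the result
    return fn(100, 10) == 90.0

print(weak_test(discount), weak_test(discount_mutant))      # True True  -> mutant survives
print(strong_test(discount), strong_test(discount_mutant))  # True False -> mutant killed
```

Tools like mutmut automate this loop at scale: they generate the mutants, rerun the suite against each one, and report survivors; every surviving mutant marks the same kind of blind spot weak_test has.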
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Our project is hosted on an internal GitLab instance, so we use the term &lt;strong>MR (Merge Request)&lt;/strong> throughout this blog. If you&amp;rsquo;re coming from GitHub, MRs are the equivalent of &lt;strong>Pull Requests (PRs)&lt;/strong>.&lt;/p></description></item><item><title>PPL: Beyond Unit Tests [Sprint 2, Week 1]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w1-tdd/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w1-tdd/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>This week I pushed our testing strategy well beyond standard unit tests. The project already had 433 backend and 200 frontend tests with 91% line coverage, but I wanted to answer a harder question: &lt;strong>do our tests actually catch bugs, or do they just execute code?&lt;/strong>&lt;/p>
&lt;p>I added four advanced testing approaches: property-based testing (Hypothesis + fast-check), behavioral testing (pytest-bdd with Gherkin), mutation testing (mutmut + Stryker), and test isolation verification (pytest-randomly). The results were eye-opening.&lt;/p></description></item><item><title>PPL: From 31 Violations to Zero [Sprint 2, Week 1]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w1-code-quality/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w1-code-quality/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>This week I enforced strict quality gates across the entire CI pipeline. The project previously had &lt;code>allow_failure: true&lt;/code> on SonarQube and security scans, meaning violations were reported but never blocked merges. I changed that.&lt;/p>
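Concretely, tightening the gate is a one-flag change per job. A hypothetical .gitlab-ci.yml fragment (job name and script line are assumptions, not our actual config):

```yaml
sonarqube-check:
  stage: quality
  script:
    - sonar-scanner
  # before: allow_failure: true  -> violations reported, merges never blocked
  allow_failure: false           # after: a failed scan blocks the MR
```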
&lt;h2 id="sonarqube-31-violations--0">
 &lt;a class="anchor" href="#sonarqube-31-violations--0" data-anchor="sonarqube-31-violations--0" aria-hidden="true">#&lt;/a>
 SonarQube: 31 Violations → 0
&lt;/h2>
&lt;h3 id="the-violations">
 &lt;a class="anchor" href="#the-violations" data-anchor="the-violations" aria-hidden="true">#&lt;/a>
 The Violations
&lt;/h3>
&lt;p>SonarQube flagged 31 issues across the codebase:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>1 CRITICAL vulnerability&lt;/strong>: &lt;code>jwt.get_unverified_header()&lt;/code> reading JWT headers without signature verification&lt;/li>
&lt;li>&lt;strong>3 CRITICAL code smells&lt;/strong>: duplicated string literals, nested component definitions&lt;/li>
&lt;li>&lt;strong>27 other issues&lt;/strong>: unused variables, missing &lt;code>Readonly&amp;lt;&amp;gt;&lt;/code> on props, duplicate CSS blocks, array index keys&lt;/li>
&lt;/ul>
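The CRITICAL vulnerability exists because the code read a token&amp;rsquo;s header before any signature check. One way to remove the unverified read is to try verification algorithms in order; a minimal PyJWT sketch (function name, key parameters, and the exception choice are assumptions):

```python
import jwt  # PyJWT

def decode_token(token, hs_secret, rsa_public_key=None):
    """Try the symmetric algorithm first, then fall back to the asymmetric
    one, so jwt.get_unverified_header() is never needed to pick an alg."""
    try:
        return jwt.decode(token, hs_secret, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        # broad catch: covers DecodeError as well as the alg-not-allowed
        # error PyJWT raises when an RS256 token meets algorithms=["HS256"]
        return jwt.decode(token, rsa_public_key, algorithms=["RS256"])
```

Our actual fix falls back on DecodeError specifically; InvalidTokenError is used here only so the sketch also covers the alg-mismatch case newer PyJWT versions raise.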
&lt;h3 id="the-fixes">
 &lt;a class="anchor" href="#the-fixes" data-anchor="the-fixes" aria-hidden="true">#&lt;/a>
 The Fixes
&lt;/h3>
&lt;p>&lt;strong>Backend&lt;/strong> (3 files): Refactored JWT decode to try HS256 first and fall back to asymmetric on &lt;code>DecodeError&lt;/code>, eliminating the unverified header call entirely. Extracted duplicated literals to constants.&lt;/p></description></item><item><title>PPL: When 91% Test Coverage Means Nothing</title><link>https://blog.abhipraya.dev/ppl/part-a/tdd-and-qa/</link><pubDate>Sun, 15 Mar 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-a/tdd-and-qa/</guid><description>&lt;p>We had 91% line coverage and felt good about it. Then we ran mutation testing and scored 0%. Every line of our service layer was executed by tests; almost nothing was actually verified. This is the story of how six advanced testing tools exposed the gap between &amp;ldquo;code was run&amp;rdquo; and &amp;ldquo;code was checked,&amp;rdquo; and what that means for any team relying on coverage as a quality signal.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Our project is hosted on an internal GitLab instance, so we use the term &lt;strong>MR (Merge Request)&lt;/strong> throughout this blog. If you&amp;rsquo;re coming from GitHub, MRs are the equivalent of &lt;strong>Pull Requests (PRs)&lt;/strong>.&lt;/p></description></item></channel></rss>