<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mutation-Testing on Daffa Abhipraya</title><link>https://blog.abhipraya.dev/tags/mutation-testing/</link><description>Recent content in Mutation-Testing on Daffa Abhipraya</description><generator>Hugo</generator><language>en-us</language><copyright>© Daffa Abhipraya</copyright><lastBuildDate>Wed, 15 Apr 2026 00:00:00 +0700</lastBuildDate><atom:link href="https://blog.abhipraya.dev/tags/mutation-testing/index.xml" rel="self" type="application/rss+xml"/><item><title>PPL: AI at Different Cognitive Distances [Sprint 2, Week 3]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w3-ai-literacy/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w3-ai-literacy/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>Two weeks where AI was the primary productivity multiplier across very different task shapes. Week 1 leaned on iterative CI debugging (ten MRs of &amp;ldquo;run pipeline, read failure, ask AI, apply fix&amp;rdquo;) and integration-test infrastructure design. Week 2 leaned on bulk test generation (400+ mutation-killing assertions), bash-with-tricky-primitives design (a &lt;code>flock&lt;/code>-based slot allocator), and cross-agent documentation research. The common thread across all six patterns: AI is best when the task is a known pattern applied to new context, and weakest when the task is about how primitives interact in a specific environment.&lt;/p></description></item><item><title>PPL: Mutation Testing, From Setup to Score [Sprint 2, Week 3]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w3-tdd/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w3-tdd/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>Two weeks on mutation testing across the SIRA codebase. The first week wired &lt;a href="https://github.com/boxed/mutmut">mutmut&lt;/a> (Python) and &lt;a href="https://stryker-mutator.io/">Stryker&lt;/a> (TypeScript) into the pipeline and uncovered the uncomfortable truth: 91% line coverage on the services layer translated to a mutation score just above zero in places. The second week closed that gap by writing 400+ targeted tests across two rounds, driving the API mutation score from ~66% to 80.3%. This blog tells the full arc.&lt;/p></description></item><item><title>PPL: Quality as a Feedback Loop [Sprint 2, Week 3]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w3-code-quality/</link><pubDate>Wed, 15 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w3-code-quality/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>Two weeks of quality infrastructure: wiring in the tools that measure test quality (week 1), then shortening the feedback loop so that signal actually influences the code being written (week 2). Starting state: 91% line coverage, no mutation testing, integration test coverage missing from SonarQube. Ending state: combined unit + integration coverage in SonarQube, mutmut + Stryker running per-MR with results in the CI comment, API mutation score 80.3%.&lt;/p></description></item><item><title>PPL: New Learnings Applied to SIRA [Sprint 2, Week 2]</title><link>https://blog.abhipraya.dev/ppl/part-c/s2w2-aptitude/</link><pubDate>Thu, 09 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-c/s2w2-aptitude/</guid><description>&lt;h2 id="overview">
 &lt;a class="anchor" href="#overview" data-anchor="overview" aria-hidden="true">#&lt;/a>
 Overview
&lt;/h2>
&lt;p>Sprint 2 Week 2 (April 3 to 9) was dominated by a mutation testing sprint: setting up mutmut and Stryker in CI, building integration test infrastructure from scratch, and writing mutation-killing tests. The week also delivered three application features and a meta-level improvement to my AI workflow. Each area required learning at least one technology or concept not covered in standard Fasilkom coursework.&lt;/p>
&lt;hr>
&lt;h2 id="1-mutation-testing-with-mutmut-python">
 &lt;a class="anchor" href="#1-mutation-testing-with-mutmut-python" data-anchor="1-mutation-testing-with-mutmut-python" aria-hidden="true">#&lt;/a>
 1. Mutation Testing with mutmut (Python)
&lt;/h2>
&lt;p>Mutation testing is a technique where a tool injects small faults (&amp;ldquo;mutants&amp;rdquo;) into your source code (flipping operators, changing constants, removing lines) and checks whether your test suite catches each fault. If a test fails, the mutant is &amp;ldquo;killed.&amp;rdquo; If no test fails, the mutant &amp;ldquo;survives,&amp;rdquo; meaning your tests have a blind spot.&lt;/p></description></item><item><title>PPL: When 91% Test Coverage Means Nothing</title><link>https://blog.abhipraya.dev/ppl/part-a/tdd/</link><pubDate>Thu, 09 Apr 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-a/tdd/</guid><description>&lt;p>We had 91% line coverage and felt good about it. Then we ran mutation testing and scored 0%. Every line of our service layer was executed by tests, but almost nothing was actually verified. This is the story of how we discovered the gap between &amp;ldquo;code was run&amp;rdquo; and &amp;ldquo;code was checked,&amp;rdquo; and what we changed to close it.&lt;/p>
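The gap between &amp;ldquo;code was run&amp;rdquo; and &amp;ldquo;code was checked&amp;rdquo; fits in a few lines. A hand-rolled sketch of the idea (hypothetical function and tests, not SIRA code and not mutmut&amp;rsquo;s real machinery):

```python
# A hand-rolled illustration of the mutation-testing loop: inject a small
# fault ("mutant") and see whether any test notices.

def discount(price, qty):
    # original: orders of 10 or more get 10% off
    return price * 0.9 if qty >= 10 else price

def discount_mutant(price, qty):
    # mutant: ">=" flipped to ">", an off-by-one boundary fault
    return price * 0.9 if qty > 10 else price

def weak_test(fn):
    # executes the code (so line coverage counts it) but verifies nothing
    fn(100, 5)
    return True

def strong_test(fn):
    # pins the boundary value, so the flipped operator changes the result
    return fn(100, 10) == 90.0

print(weak_test(discount), weak_test(discount_mutant))      # True True  -> mutant survives
print(strong_test(discount), strong_test(discount_mutant))  # True False -> mutant killed
```

Tools like mutmut automate this loop at scale: they generate the mutants, rerun the suite against each one, and report survivors; every surviving mutant marks the same kind of blind spot weak_test has.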
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Our project is hosted on an internal GitLab instance, so we use the term &lt;strong>MR (Merge Request)&lt;/strong> throughout this blog. If you&amp;rsquo;re coming from GitHub, MRs are the equivalent of &lt;strong>Pull Requests (PRs)&lt;/strong>.&lt;/p></description></item><item><title>PPL: Beyond Unit Tests [Sprint 2, Week 1]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w1-tdd/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w1-tdd/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>This week I pushed our testing strategy well beyond standard unit tests. The project already had 433 backend and 200 frontend tests with 91% line coverage, but I wanted to answer a harder question: &lt;strong>do our tests actually catch bugs, or do they just execute code?&lt;/strong>&lt;/p>
&lt;p>I added four advanced testing approaches: property-based testing (Hypothesis + fast-check), behavioral testing (pytest-bdd with Gherkin), mutation testing (mutmut + Stryker), and test isolation verification (pytest-randomly). The results were eye-opening.&lt;/p></description></item><item><title>PPL: From 31 Violations to Zero [Sprint 2, Week 1]</title><link>https://blog.abhipraya.dev/ppl/part-b/s2w1-code-quality/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-b/s2w1-code-quality/</guid><description>&lt;h2 id="what-i-worked-on">
 &lt;a class="anchor" href="#what-i-worked-on" data-anchor="what-i-worked-on" aria-hidden="true">#&lt;/a>
 What I Worked On
&lt;/h2>
&lt;p>This week I enforced strict quality gates across the entire CI pipeline. The project previously had &lt;code>allow_failure: true&lt;/code> on SonarQube and security scans, meaning violations were reported but never blocked merges. I changed that.&lt;/p>
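Concretely, tightening the gate is a one-flag change per job. A hypothetical .gitlab-ci.yml fragment (job name and script line are assumptions, not our actual config):

```yaml
sonarqube-check:
  stage: quality
  script:
    - sonar-scanner
  # before: allow_failure: true  -> violations reported, merges never blocked
  allow_failure: false           # after: a failed scan blocks the MR
```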
&lt;h2 id="sonarqube-31-violations--0">
 &lt;a class="anchor" href="#sonarqube-31-violations--0" data-anchor="sonarqube-31-violations--0" aria-hidden="true">#&lt;/a>
 SonarQube: 31 Violations → 0
&lt;/h2>
&lt;h3 id="the-violations">
 &lt;a class="anchor" href="#the-violations" data-anchor="the-violations" aria-hidden="true">#&lt;/a>
 The Violations
&lt;/h3>
&lt;p>SonarQube flagged 31 issues across the codebase:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>1 CRITICAL vulnerability&lt;/strong>: &lt;code>jwt.get_unverified_header()&lt;/code> reading JWT headers without signature verification&lt;/li>
&lt;li>&lt;strong>3 CRITICAL code smells&lt;/strong>: duplicated string literals, nested component definitions&lt;/li>
&lt;li>&lt;strong>27 other issues&lt;/strong>: unused variables, missing &lt;code>Readonly&amp;lt;&amp;gt;&lt;/code> on props, duplicate CSS blocks, array index keys&lt;/li>
&lt;/ul>
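The CRITICAL vulnerability exists because the code read a token&amp;rsquo;s header before any signature check. One way to remove the unverified read is to try verification algorithms in order; a minimal PyJWT sketch (function name, key parameters, and the exception choice are assumptions):

```python
import jwt  # PyJWT

def decode_token(token, hs_secret, rsa_public_key=None):
    """Try the symmetric algorithm first, then fall back to the asymmetric
    one, so jwt.get_unverified_header() is never needed to pick an alg."""
    try:
        return jwt.decode(token, hs_secret, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        # broad catch: covers DecodeError as well as the alg-not-allowed
        # error PyJWT raises when an RS256 token meets algorithms=["HS256"]
        return jwt.decode(token, rsa_public_key, algorithms=["RS256"])
```

Our actual fix falls back on DecodeError specifically; InvalidTokenError is used here only so the sketch also covers the alg-mismatch case newer PyJWT versions raise.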
&lt;h3 id="the-fixes">
 &lt;a class="anchor" href="#the-fixes" data-anchor="the-fixes" aria-hidden="true">#&lt;/a>
 The Fixes
&lt;/h3>
&lt;p>&lt;strong>Backend&lt;/strong> (3 files): Refactored JWT decode to try HS256 first and fall back to asymmetric on &lt;code>DecodeError&lt;/code>, eliminating the unverified header call entirely. Extracted duplicated literals to constants.&lt;/p></description></item><item><title>PPL: When 91% Test Coverage Means Nothing</title><link>https://blog.abhipraya.dev/ppl/part-a/tdd-and-qa/</link><pubDate>Sun, 15 Mar 2026 00:00:00 +0700</pubDate><guid>https://blog.abhipraya.dev/ppl/part-a/tdd-and-qa/</guid><description>&lt;p>We had 91% line coverage and felt good about it. Then we ran mutation testing and scored 0%. Every line of our service layer was executed by tests; almost nothing was actually verified. This is the story of how six advanced testing tools exposed the gap between &amp;ldquo;code was run&amp;rdquo; and &amp;ldquo;code was checked,&amp;rdquo; and what that means for any team relying on coverage as a quality signal.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Note:&lt;/strong> Our project is hosted on an internal GitLab instance, so we use the term &lt;strong>MR (Merge Request)&lt;/strong> throughout this blog. If you&amp;rsquo;re coming from GitHub, MRs are the equivalent of &lt;strong>Pull Requests (PRs)&lt;/strong>.&lt;/p></description></item></channel></rss>