[S4, W1] PPL: Two Reviews, Two Shapes: Diagnostic Unblock and Multi-Agent Depth Pass

What I Worked On

Two substantive review threads this week, both on MRs authored by Fadhli (@fadhliraihan). Different shapes:

!268 (Sira-318 stale duplicate client cache): a diagnostic comment that unblocked his CI by explaining an environmental bug (supabase CLI version drift) that had nothing to do with his code. Short, surgical.
!304 (SIRA-319 atomic duplicate client merge): a 13-finding depth pass on an atomic RPC with row locks. Dispatched 6 specialized review agents in parallel, aggregated findings, posted one consolidated threaded discussion. Long, structural.

Both reviews used the same target author intentionally. Fadhli was running a multi-MR stretch on the duplicate-detection domain across two weeks (cache, then merge atomicity); reviewing both in sequence let me carry context forward instead of context-switching.

MR !268: The Comment That Unblocked

Fadhli’s MR was a legitimate fix for a stale-duplicate-client cache bug in the API. His CI was red for reasons that looked like his code at first glance: the integration-test job kept failing with Error response from daemon: ... failed commit on ref during a Docker image pull, after 16 minutes of timeout. The CI report on his MR was confusing because the bot reported the failure as a generic build error.

I’d hit the same bug on my own branch earlier in the week. The root cause was a supabase CLI version drift: npx supabase start was resolving npm’s latest differently between jobs, so the runner’s containerd cache had postgres:17.6.1.106 but the new CLI wanted 17.6.1.095. Pulling the new image hit a containerd ingest race and the job died.

I’d fixed it on main with commit a4f0c61e (pin the supabase CLI to v2.98.2 in scripts/ci-supabase-slot.sh). The fix was a one-liner. But Fadhli’s branch was off an older main, so his CI was still hitting the bug.

The review comment I posted:

hey @fadhliraihan, your ci was failing because of a supabase cli version drift bug. npx supabase start was resolving npm’s latest differently between jobs, so this branch’s run installed a cli that wanted postgres:17.6.1.095 while the nashta runner only has 17.6.1.106 cached. tried to pull and hit a containerd ingest race, job died after 16 min.
fixed on main with commit a4f0c61e (pinned the cli to v2.98.2 in scripts/ci-supabase-slot.sh). verified clean on pipeline 16318, integration test back to ~4.7 min with all images cached.
to unblock: rebase your branch onto main and push, ci should pass cleanly. lmk if it still misbehaves.

Three things this comment does that a generic “your CI is failing” comment wouldn’t:

It names the root cause precisely. Not “Docker is broken,” not “rebase and try again,” but the exact mechanism: npx supabase start non-deterministically resolving latest between jobs, the runner’s containerd image cache having one tag, the new CLI wanting another, the containerd ingest race that killed the job. A reviewer who only says “rebase” leaves the author guessing whether the fix actually addresses their failure.

It cites the fix and the verification. Commit a4f0c61e is the exact line, scripts/ci-supabase-slot.sh is the exact file, pipeline 16318 is the proof. Fadhli can look up the diff, read the change, and decide whether he believes the fix applies to his case.

It gives a concrete unblock action. “Rebase your branch onto main and push, ci should pass cleanly” is one command. The reviewer who instead writes “this might be a flaky test, just retry” leaves the author in the same broken state, just with a wasted CI minute.

The point of code review isn’t “find bugs in the diff.” It’s “help the MR get to a mergeable state.” When the diff is correct but the CI is broken for an unrelated reason, the right review action is to diagnose the unrelated reason and unblock.

MR !304: 13 Findings Through Multi-Agent Review

MR !304 (SIRA-319) replaced a non-atomic three-step merge (reassign_invoices then reassign_reminder_logs then reassign_risk_scoring_logs then delete_client) with a single PostgreSQL RPC that does all four mutations inside one transaction with FOR UPDATE row locks. The architectural direction was clearly right; the question was whether the implementation actually delivered the atomicity claim and held up under concurrency and error paths.

I ran six specialized review agents in parallel against the diff, each with a focused charter:

code review: general bugs, CLAUDE.md compliance, concurrency hazards
test coverage: did the new tests actually verify the atomicity claim
silent failure hunt: error paths, exception propagation, swallowed errors
type design: argument-order safety, primitive obsession on UUIDs
comment analysis: comment accuracy and rotting documentation
simplify: simplification opportunities preserving full functionality

I then aggregated, deduplicated cross-agent findings (e.g. the deadlock risk was flagged by both code and silent-failure agents from different angles), and posted one consolidated review thread on the MR with positives section first, then P0/P1/P2/P3 severity grouping. Every finding had a code-snippet fix attached.

The thirteen findings broke down as:

P0 (3): deadlock risk on concurrent merges from non-deterministic lock order, PL/pgSQL raise exception messages downgraded to generic 500 because the router only catches ValueError, zero integration tests for the atomic RPC behavior despite atomicity being the whole point.
P1 (4): same-type str arg swap risk on merge_duplicate_clients(discard, keep), TOCTOU between active-pair check and RPC call, post-merge get_client failure masking a successful merge as 500, tests patching a private method which couples them to implementation.
P2 (4): is_dismissed_pair should query by pair_id PK instead of two filter columns, redundant _encryption_key field, SIRA ticket prefix violating the “don’t reference current task” rule, missing audit logs on refusals.
P3 (2): redundant drop function if exists, threshold=80 magic number.

The P0 about lock ordering is the highest-impact finding. The original SQL locked keep then discard, which deadlocks if two admins concurrently merge the same pair from opposite directions (admin A: keep=X discard=Y; admin B: keep=Y discard=X). Postgres aborts one with deadlock detected which surfaces as a 500. The fix is to always lock in deterministic id order:

perform 1 from clients
where id in (p_keep_client_id, p_discard_client_id)
order by id
for update;

The P0 about exception downgrade is structural too. The PL/pgSQL function carefully writes useful messages (Cannot merge a client into itself, Keep client not found), but db.rpc().execute() raises postgrest.exceptions.APIError, not ValueError. The router only catches ValueError, so every PL/pgSQL guard becomes a generic 500. The Python wrapper needs to catch APIError and translate predictable messages back to ValueError so the router can map them cleanly.

Why Multi-Agent Beats Solo Reviewing

A single human reviewer reading MR !304 would have caught maybe half of those thirteen findings on a careful pass. Different agents specialize on different failure modes that a generalist eye routinely misses:

The silent-failure agent caught the APIError to ValueError translation gap because that’s exactly what it’s tuned to find. A general reviewer who’s never been bitten by postgrest.exceptions would miss it.
The type-design agent caught the same-type str arg swap risk because that’s a pure type-shape concern, not a logic bug.
The test-coverage agent caught the “zero integration test for atomicity” gap because it’s grading the test suite against the project’s stated philosophy, not just reading the new test names.
The simplify agent caught the is_dismissed_pair(client_a_id, client_b_id) redundancy because it traced data flow back to the call site where pair_id is already available.

Running them in parallel kept review wall-time short. The cross-agent dedup step is where most of the human work lived: deciding which of two slightly different framings of the same finding to keep, and merging overlapping suggestions.

What I Learned

Two patterns from this week’s reviews:

Two reviews of different shapes serve different purposes. !268 was a 5-minute review that saved Fadhli an hour of debugging. !304 was a 90-minute review that potentially saved the team from a production deadlock on concurrent admin actions. Neither shape would have substituted for the other.

Multi-agent review scales depth, not shallowness. Dispatching six specialized agents in parallel gave me depth comparable to a 3-hour senior reviewer pass without that wall-clock cost. The bottleneck became aggregation and dedup, not finding the issues. For high-stakes MRs (anything that touches concurrency, atomicity, or destructive operations), this is the workflow I’ll keep using.

Evidence

MR !268 Fix stale duplicate client cache: diagnostic review comment on supabase CLI version drift, posted as discussion thread
MR !304 SIRA-319 Fix invalid and non-atomic duplicate client merge: 13-finding consolidated review (3 P0, 4 P1, 4 P2, 2 P3), posted as threaded discussion
Direct commit a4f0c61e on main: fix(ci): pin supabase CLI version to prevent postgres image tag drift (the fix cited in the !268 review)
Pipeline 16318 (verification cited in the !268 review)
Six review agents dispatched in parallel for !304: code-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, comment-analyzer, code-simplifier

~/abhipraya

# What I Worked On

# MR !268: The Comment That Unblocked

# MR !304: 13 Findings Through Multi-Agent Review

# Why Multi-Agent Beats Solo Reviewing

# What I Learned

# Evidence

Related Posts