The merge tap happened on 2026-05-28. Sonnet had drafted an article titled “The backtick that could run anything: hardening AppleScript shell…”, with a colon-space sitting right in the middle of the YAML value, unquoted. Nothing complained at write time. The failure came at read time, when the pipeline tried to consume the file downstream. A colon followed by a space is YAML's key-value delimiter. The title had become two fields. Neither was the title.
That was the first failure.
The second came thirty seconds later. CJ tapped merge immediately after publish. The bash polling loop ran gh pr checks 38. GitHub Actions hadn't dispatched yet. gh returned exit code 1 with empty stdout — no checks running, no checks pending, just nothing — and the loop interpreted silence as failure. It bailed.
Two bugs. Neither had appeared in any prior test run, because no prior test run had used a real Sonnet-generated title and no prior test run had tapped merge that fast. Both had been present in the code for weeks.
The context
dvlaw is the pipeline that sits behind this site's publishing workflow. A draft enters, automated stages process it, the .mdx file lands in the repo. By B8b, the pipeline could receive a ship instruction via Telegram, run the full chain, open a PR, and notify back. B8b was the first time all of that happened in sequence, live, against a real article.
Integration tests are not the same as first ship. This is the receipt for learning that again.
Bug one: YAML frontmatter and the colon it can't hold
The drafter stage — Sonnet, generating the article content — had no constraint on title format. Sonnet writes good titles. It also writes titles with colons, because colons are natural punctuation in English prose. “The backtick that could run anything: hardening AppleScript shell” is a reasonable title. It is not a reasonable unquoted YAML string.
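In an unquoted (plain) YAML scalar, a colon followed by a space is read as a key-value separator. A strict parser rejects the line outright with a "mapping values are not allowed here" error; a lenient one reads the frontmatter block for that article as something like: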
title: The backtick that could run anything
hardening AppleScript shell…: ~
The second field is junk. The title is truncated. Anything downstream consuming title gets half a sentence.
The fix is tagged B4-yaml-quote-title. It normalises frontmatter at write time rather than relying on the author (human or model) to quote correctly. String fields that need quoting get it before the file is committed. Title and excerpt are the two fields most likely to contain colons or em-dashes; both are now always emitted double-quoted.
The trade-off: double-quoting everything means any literal double-quote inside a title needs escaping. Acceptable. Unquoted colons failing silently at runtime is not.
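A minimal sketch of write-time quoting, not the actual B4-yaml-quote-title code. It assumes a flat frontmatter block of string fields; the names `ALWAYS_QUOTED`, `yaml_double_quoted` and `render_frontmatter` are hypothetical. It leans on the fact that a JSON string is also a valid YAML double-quoted scalar, so `json.dumps` takes care of escaping any literal double-quote inside a title.

```python
import json

# Fields the drafter generates that are most likely to contain ": ".
# The field names come from the post; everything else here is illustrative.
ALWAYS_QUOTED = ("title", "excerpt")


def yaml_double_quoted(value: str) -> str:
    """Return value as a double-quoted scalar with quotes and backslashes escaped."""
    # JSON string syntax is accepted by YAML as a double-quoted scalar.
    return json.dumps(value, ensure_ascii=False)


def render_frontmatter(fields: dict[str, str]) -> str:
    """Render a flat frontmatter block, always quoting the drafter-generated fields."""
    lines = ["---"]
    for key, value in fields.items():
        if key in ALWAYS_QUOTED:
            value = yaml_double_quoted(value)
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Quoting unconditionally, rather than only when a colon is detected, keeps the writer free of its own YAML-grammar guesswork: the model can emit whatever punctuation it likes and the field still round-trips as one string.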
Out of scope: validating every YAML field type. The fix targets the fields the drafter generates. Schema validation for the full frontmatter block is a separate concern.
Bug two: CI hasn't started yet
The merge bash loop had one job: poll gh pr checks <PR number> until all checks passed, then proceed. The assumption baked into it was that by the time a human tapped merge, GitHub Actions had already dispatched. That assumption held in every manual test, because manual tests involve reading, thinking, scrolling — ten to thirty seconds of latency between PR open and merge tap.
B8b eliminated that latency. The pipeline opened the PR and notified via Telegram in one step. CJ tapped merge from the notification. The round-trip was fast enough that gh pr checks 38 ran before Actions had queued anything.
gh pr checks with no checks present returns exit code 1 and empty stdout. The polling loop's logic treated non-zero exit as “checks failed” and bailed. It had no handling for the “checks haven't started” state, because that state had never been observed before.
The fix — tagged B8c-merge-bash-no-checks — adds a pre-poll wait and distinguishes between empty stdout (CI not started) and non-zero exit with actual check results (CI failed). If stdout is empty on the first poll, the loop sleeps and retries rather than treating absence as failure. The distinction is not complicated. It required observing the failure mode to know it needed making.
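The distinction can be sketched as follows. This is an illustration in Python rather than the bash loop itself, and the names `classify`, `wait_for_checks` and `gh_runner`, the retry counts and the delays are all hypothetical. It also assumes that the installed `gh` exits with code 8 while checks are pending, as recent versions document; if yours does not, that branch never fires and pending checks fall through to "failed".

```python
import subprocess
import time
from typing import Callable, Tuple

# One poll result: (exit_code, stdout). Injectable so the loop can be tested without gh.
Runner = Callable[[], Tuple[int, str]]

PENDING_EXIT = 8  # assumption: gh's documented "checks pending" exit code


def gh_runner(pr: str) -> Runner:
    """Build a runner that shells out to `gh pr checks <pr>`."""
    def run() -> Tuple[int, str]:
        proc = subprocess.run(["gh", "pr", "checks", pr], capture_output=True, text=True)
        return proc.returncode, proc.stdout
    return run


def classify(exit_code: int, stdout: str) -> str:
    """Map one poll result onto the states the loop cares about."""
    if exit_code == 0:
        return "passed"
    if not stdout.strip():
        return "not_started"  # non-zero exit, nothing reported: CI has not dispatched
    if exit_code == PENDING_EXIT:
        return "pending"
    return "failed"  # non-zero exit with actual check rows


def wait_for_checks(run: Runner, attempts: int = 20, delay: float = 5.0,
                    pre_wait: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once checks pass; False on a real failure or when attempts run out."""
    sleep(pre_wait)  # pre-poll wait: give Actions a chance to queue something
    for _ in range(attempts):
        state = classify(*run())
        if state == "passed":
            return True
        if state == "failed":
            return False
        sleep(delay)  # not_started or pending: wait and ask again
    return False
```

A call such as `wait_for_checks(gh_runner("38"))` would poll the PR from the incident. The bounded `attempts` matters: a repository with no checks configured at all would otherwise look like "not started" forever.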
The trade-off: the added wait extends the polling loop's minimum runtime. Acceptable. The alternative is a false failure on every fast merge.
What was shipping while these bugs existed
The B4 voice-fix work landed on the same day. Two components:
- B4-voice-fix-agent: an Opus 4.7 rewrite layer that takes the drafter's raw .mdx output and rewrites AI tells against the voice profile as a positive constraint. This is the second consumer of the native agent harness; the first was the thesis drafter.
- B4-voice-fix-auto: chains the Opus rewrite automatically after the drafter, rather than leaving it as an opt-in CLI invocation.
The auto-chain was added after a specific gap was caught: the drafter completed and pushed the raw .mdx to Telegram, but no voice-fix landed, because the agent had been scoped as opt-in. The rewrite stage was there; it just wasn't wired in. B4-voice-fix-auto closes that gap.
B4-voice-fix-cli exposes the regex scrubber and the Opus rewrite as a user-facing surface: dvlaw_voice_fix.py <file.mdx>. The CLI runs the mechanical AI-tell scrubber. It writes a .tells.txt sidecar, prints a preview of up to five flags, and can chain into the Opus rewrite. It exists as a standalone tool for cases where the full pipeline isn't running — manual drafts, imported content, one-off passes.
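A rough sketch of the mechanical half of that surface, for orientation only. It is not the real `dvlaw_voice_fix.py`: the two patterns in `TELL_PATTERNS` are placeholders, the sidecar naming (`post.mdx` to `post.tells.txt`) is an assumption, and the chain into the Opus rewrite is left out entirely.

```python
import re
import sys
from pathlib import Path

# Placeholder tell patterns, for illustration only; the real scrubber's rules are not shown here.
TELL_PATTERNS = [
    re.compile(r"\bdelve\b", re.IGNORECASE),
    re.compile(r"\bit'?s worth noting\b", re.IGNORECASE),
]
PREVIEW_LIMIT = 5  # the preview shows up to five flags


def find_tells(text: str) -> list[str]:
    """Return one 'line N: match' flag per pattern hit, in line order."""
    flags = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in TELL_PATTERNS:
            for match in pattern.finditer(line):
                flags.append(f"line {lineno}: {match.group(0)}")
    return flags


def scan_file(path: Path) -> list[str]:
    """Scan one .mdx file, write the sidecar next to it, and return the flags."""
    flags = find_tells(path.read_text(encoding="utf-8"))
    sidecar = path.with_suffix(".tells.txt")  # assumed naming: post.mdx -> post.tells.txt
    sidecar.write_text("\n".join(flags) + ("\n" if flags else ""), encoding="utf-8")
    return flags


def main(argv: list[str]) -> int:
    """Entry point: argv[1] is the .mdx file to scan."""
    if len(argv) != 2:
        print("usage: dvlaw_voice_fix.py <file.mdx>", file=sys.stderr)
        return 2
    flags = scan_file(Path(argv[1]))
    for flag in flags[:PREVIEW_LIMIT]:
        print(flag)
    if len(flags) > PREVIEW_LIMIT:
        print(f"... and {len(flags) - PREVIEW_LIMIT} more")
    return 0
```

Run as a script, this would be wired up with `sys.exit(main(sys.argv))`. Keeping the scan in `scan_file` and the printing in `main` means the same scan can be reused by the pipeline without the preview output.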
These were the features. The bugs were invisible beneath them, waiting for the first real run.
The pattern
Both failures share a structure. An assumption held silently in the code, never tested against the condition that would break it. The frontmatter writer assumed the model wouldn't produce syntax that needed escaping. The polling loop assumed CI would be running when it asked. Both assumptions were reasonable given the tests that existed. Neither was reasonable given the full envelope of real use.
The lesson I should have applied earlier: any time a pipeline receives input from an external source (a model, a human, a third-party API), the failure modes at the boundary are the ones worth testing deliberately. The model will emit a colon-space eventually. GitHub Actions will not have started yet on a fast merge. Those aren't edge cases; they're the natural behaviour of the systems involved.
I caught them on first ship instead of during development. That's the more expensive way.
What changed
Three concrete things:
- Frontmatter string fields that contain or could contain YAML-significant characters are now quoted at write time. The normalisation runs before commit.
- The merge polling loop distinguishes empty stdout from failed checks and retries on the former.
- The voice-fix Opus rewrite is now chained automatically after the drafter, not left as a CLI-only opt-in.
The devlog entries for 2026-05-28 and 2026-05-29 are the canonical record. This post is the consolidated form, with the pattern named.
The pipeline is running. The next failure will be something else the tests didn't cover.