The number that started this was 8–13.

That’s the em-dash count per thousand words across the four published pieces on this site, measured during a routine devlog audit on 22 May. Normal edited prose runs 1–3 per thousand. The gap is not subtle. My writing advisor flagged the density; I pulled the articles, counted manually, and confirmed it. The four pieces weren’t stylistically heavy on em-dashes. They were statistically anomalous in a way that had a specific cause: LLM-assisted drafting leaves a fingerprint, and I’d been publishing without checking for it.

That was the moment the problem became concrete enough to solve properly.

What “AI-prose tells” means here

An AI-prose tell is a surface feature that trained language models produce at rates statistically higher than edited human prose. Not because the model is doing something wrong (the outputs are grammatically correct, often fluent), but because the model is optimising for plausibility and coherence, and certain constructions are plausible and coherent at a frequency that human writers don’t naturally hit.

Em-dash density is one. The others follow a pattern: hedging qualifiers (“it’s worth noting”, “importantly”), transitional throat-clearing (“in order to”, “with that in mind”), performative warmth at section opens (“great question”, “absolutely”), false continuity markers (“of course”, “naturally”), and a handful more. Each one is borderline in isolation. At the density a generative draft produces them, the cumulative signal is detectable.

The problem I had before 22 May was that I was catching these by feel, on re-read, inconsistently. That’s not a process. It’s a mood.

Codifying the list

The first step was getting the tells out of my head and into a canonical document. STYLE-GUIDE.md now lives in the repo root. It has ten numbered items. Each item specifies:

  • The pattern (what to look for)
  • Why it reads as generated (the mechanism, not just the label)
  • The threshold, where one exists (em-dashes: flag above 3 per 1,000 words)
  • The fix (usually: delete, or restructure the sentence so the construction isn’t needed)

That last item matters. A style guide that names problems without naming remedies is documentation for its own sake. The em-dash entry, for instance, doesn’t say “use fewer em-dashes.” It says: if you’re reaching for an em-dash, check whether the parenthetical is load-bearing. If it isn’t, cut the aside entirely. If it is, rewrite the sentence so the aside is the main clause.

Before STYLE-GUIDE.md existed, the em-dash guidance lived inline in dvlaw_draft.md as a one-line note. That note wasn’t findable at publish time. Moving it into a canonical reference that the toolchain can point at is the difference between a note-to-self and a constraint.
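One way a toolchain could point at those four fields is to mirror each entry as data. The following is a minimal Python sketch, not the contents of STYLE-GUIDE.md: the `Tell` class, its field names, and the `why` text are hypothetical. Only the em-dash threshold (3 per 1,000 words), the example hedging phrases, and the fixes are taken from the text above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tell:
    """One style-guide entry: pattern, mechanism, threshold, fix."""
    name: str
    pattern: str                        # regular expression to search for
    why: str                            # why it reads as generated
    fix: str                            # the remedy, not just the label
    max_per_1k: Optional[float] = None  # None where no threshold exists

    def count(self, text: str) -> int:
        """Number of case-insensitive matches of the pattern in text."""
        return len(re.findall(self.pattern, text, flags=re.IGNORECASE))


# Hypothetical entries. The "why" strings are placeholders, not quotes
# from the guide.
TELLS = [
    Tell(
        name="em-dash density",
        pattern="\u2014",
        why="placeholder: mechanism text from the guide goes here",
        fix="cut the aside if it is not load-bearing; "
            "otherwise make it the main clause",
        max_per_1k=3.0,
    ),
    Tell(
        name="hedging",
        # Accepts both straight and curly apostrophes.
        pattern=r"\b(?:it['\u2019]s worth noting|importantly)\b",
        why="placeholder: mechanism text from the guide goes here",
        fix="delete, or restructure so the construction isn't needed",
    ),
]
```

Keeping the threshold optional matches the guide's "where one exists" wording: an entry without a number can still be counted and reported.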

Making it mechanical

A document you read before publishing is better than nothing. A script that runs before publishing and fails loudly is better than a document.

audit-ai-tells.sh is the script. Given a file path, it counts pattern instances against the word count. It exits non-zero on any breached threshold. The output is a line per violation: pattern name, count, threshold, word count, and the offending lines as context.

[audit-ai-tells] em-dash density: 11.2 per 1k words (threshold: 3)
  → line 14: "The reflex I'd been using to review code — does this function…"
  → line 31: "All inferred, all shipped without changes — most of them right."
  …
[audit-ai-tells] hedging: 4 instances
  → line 22: "it's worth noting that the original version"
  → line 67: "importantly, this only applies when"
  …
[audit-ai-tells] FAIL (2 patterns exceeded threshold)
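The counting logic behind output like this can be sketched in a few lines. This is an illustrative Python version under stated assumptions, not the shell script itself: the function names are hypothetical, words are counted by splitting on whitespace (so a spaced em-dash counts as a word, much as `wc -w` would count it), and only the em-dash check is shown.

```python
EM_DASH = "\u2014"
EM_DASH_MAX_PER_1K = 3.0  # threshold stated in the article


def em_dash_density(text: str) -> float:
    """Em-dashes per 1,000 whitespace-separated words."""
    words = len(text.split())
    if words == 0:
        return 0.0
    return text.count(EM_DASH) * 1000 / words


def audit(text: str) -> int:
    """Print findings and return a shell-style exit code (0 pass, 1 fail)."""
    density = em_dash_density(text)
    if density <= EM_DASH_MAX_PER_1K:
        print("[audit-ai-tells] PASS")
        return 0
    print(f"[audit-ai-tells] em-dash density: {density:.1f} per 1k words "
          f"(threshold: {EM_DASH_MAX_PER_1K:g})")
    # Show each offending line, truncated, as context for the manual fix.
    for number, line in enumerate(text.splitlines(), start=1):
        if EM_DASH in line:
            print(f"  → line {number}: {line.strip()[:60]!r}")
    print("[audit-ai-tells] FAIL (1 pattern exceeded threshold)")
    return 1
```

Passing the return value of `audit` to `sys.exit` would give the caller the same contract the script has: zero on a clean file, non-zero on any breached threshold.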

The exit code integrates cleanly with whatever runs before a commit. The script currently sits in the pre-publish checklist step of the DVLAW workflow, the same system that tracks article proposals, events, and devlog entries in SQLite. A non-zero exit blocks the publish command.

Two things the script does not do. It doesn’t rewrite the prose. Automated rewrites on flagged sentences produce their own tells, often subtler ones, so the rewrite step stays manual by design. And it doesn’t catch semantic tells: fabricated specifics, generic marketing assertions, overcommitted promises. Those require the existence-verification pass I described in the Claude Code audit article. The script is for surface-pattern density. Out of scope: meaning-level accuracy.

The threshold question

Setting a threshold for a stylistic signal is not an exact science. The 1–3 per thousand figure for em-dashes comes from the devlog audit, which compared the published articles against the typical range for edited magazine and longform web prose. It’s a working figure, not a derived constant. If the threshold is too tight, the script flags articles that read fine and creates friction without value. If it’s too loose, it misses the problem it was built to catch.

The current thresholds were calibrated against the four existing articles and a handful of pieces I’d written without LLM assistance. The articles that felt clean passed; the ones that had needed post-draft editing were flagged. That’s the calibration test I used. It will need revisiting if the writing volume grows.
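That calibration test amounts to asking whether one number separates two groups of measurements. A small Python sketch of the check follows, with hypothetical function names. The densities in the usage note are placeholders built from the endpoints of the ranges quoted in this article, not per-article measurements.

```python
from typing import List, Optional, Tuple


def separates(threshold: float, clean: List[float], flagged: List[float]) -> bool:
    """True if every clean density passes and every flagged density fails."""
    return (all(d <= threshold for d in clean)
            and all(d > threshold for d in flagged))


def candidate_thresholds(
    clean: List[float], flagged: List[float]
) -> Optional[Tuple[float, float]]:
    """Return (low, high) such that any threshold t with low <= t < high
    separates the two groups, or None if the groups overlap.
    Both lists must be non-empty."""
    low, high = max(clean), min(flagged)
    return (low, high) if low < high else None
```

With placeholder inputs `clean = [1.0, 2.0, 3.0]` and `flagged = [8.0, 13.0]`, `separates(3.0, clean, flagged)` is true and `candidate_thresholds` returns `(3.0, 8.0)`. A wide interval like that means the exact threshold matters less than having one. A narrow or empty interval would be the cue to revisit it.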

What changed after the first run

Running audit-ai-tells.sh against the four published articles in retrospect confirmed the devlog audit numbers. It also surfaced two patterns I hadn’t noticed on re-read: a cluster of “of course” and “naturally” constructions in the second article, and a run of “in order to” phrases that could all be shortened to “to.” Both had survived because they’re quiet. Neither is wrong in isolation. At three instances each across a 900-word piece, the density is a signal.
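Quiet phrases like these are easy to count mechanically once they are listed. The sketch below is an assumption-laden Python illustration rather than the script's actual matching rules: the group names follow the labels used earlier in this article, the helper names are hypothetical, and matching is case-insensitive on whole words.

```python
import re
from typing import Dict, List

# Phrases and group labels named in the article.
QUIET_PHRASES: Dict[str, List[str]] = {
    "false continuity": ["of course", "naturally"],
    "throat-clearing": ["in order to"],
}


def count_phrases(text: str, phrases: List[str]) -> int:
    """Count case-insensitive, whole-word occurrences of any phrase."""
    alternation = "|".join(re.escape(p) for p in phrases)
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return len(pattern.findall(text))


def phrase_report(text: str) -> Dict[str, int]:
    """Map each group name to its total count in the text."""
    return {name: count_phrases(text, group)
            for name, group in QUIET_PHRASES.items()}
```

Three instances in a 900-word piece works out to about 3.3 per thousand words, which is why a per-thousand rate is more informative than a raw count. The sketch only counts; it deliberately leaves the "in order to" to "to" substitution to a human.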

The published articles aren’t being retroactively edited. The devlog entry is the record of what was there; the STYLE-GUIDE is the constraint going forward. Retroactive edits to published pieces create a different kind of trust problem. The record stops being reliable.

Why this is a DVLAW concern

DVLAW is the system I use to track build decisions, article proposals, and devlog entries across the repos on this machine. The 22 May devlog entries that produced this work also added two SQLite tables (proposals and events) so the article-creation workflow has a durable audit trail instead of evaporating at session close. Before that change, a proposal that didn’t make it to publish had no record. Now there’s a row.

The audit-ai-tells.sh script fits into DVLAW as a pre-publish gate: it runs, logs its result to the events table with a timestamp and a pass/fail status, and the article doesn’t move to the published state until the gate passes. The audit trail records when something was flagged and when it was cleared. That matters for the same reason the existence-verification pass matters: the record needs to be trustworthy, which means it needs to capture the things that almost shipped as well as the things that did.
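The gate described above can be sketched with Python's standard `sqlite3` module. The article names an events table but not its columns, so the schema, the `kind` value, and both function names here are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed schema: the real events table may have different columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    article TEXT NOT NULL,
    kind TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('pass', 'fail')),
    created_at TEXT NOT NULL
)
"""


def log_gate_result(conn: sqlite3.Connection, article: str, exit_code: int) -> str:
    """Record one audit run and return the status that was stored."""
    status = "pass" if exit_code == 0 else "fail"
    conn.execute(
        "INSERT INTO events (article, kind, status, created_at) "
        "VALUES (?, ?, ?, ?)",
        (article, "audit-ai-tells", status,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return status


def can_publish(conn: sqlite3.Connection, article: str) -> bool:
    """Allow publishing only if the most recent audit event passed."""
    row = conn.execute(
        "SELECT status FROM events WHERE article = ? AND kind = ? "
        "ORDER BY id DESC LIMIT 1",
        (article, "audit-ai-tells"),
    ).fetchone()
    return row is not None and row[0] == "pass"
```

Because failed runs are inserted rather than overwritten, the table keeps both the flag and the later clearance, which is the "almost shipped" record the paragraph above asks for.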

What this doesn’t solve

The script catches density. It doesn’t catch a single well-placed “it’s worth noting” that genuinely serves the sentence, and it doesn’t catch the absence of voice that comes from over-editing. There is a version of this process where the tells are scrubbed so aggressively that the prose becomes flat. That’s a different problem, and a harder one.

The other thing it doesn’t solve is the initial draft. audit-ai-tells.sh is a post-draft check. If the draft comes out of an LLM-assisted session with twelve em-dashes per thousand words, the script surfaces that; it doesn’t prevent it. The preventive version is a prompt-time constraint: tell the agent which patterns to avoid before the draft exists. I haven’t standardised that yet. The STYLE-GUIDE reference is close enough for now — the constraint is one copy-paste away.

The receipt

The devlog entries for this work are dated 22–23 May in docs/devlog.md, tagged dvlaw. The canonical list of tells is in STYLE-GUIDE.md at the repo root. The script is audit-ai-tells.sh. If you’re running a similar workflow and you want the specific threshold figures, they’re in the script comments.

The em-dash count in this article, measured before the final pass: 6. That’s roughly 3.8 per thousand words. The script flagged it. I cut six down to one, keeping the one above because it earned the beat. The final count is below threshold. That’s the process working as intended.
