A missed slot at 09:00 should not silently vanish: building past-due grace into the article pipeline publisher

The schedule_sweep daemon woke with the machine at 09:07. In article_queue, one row had planned_publish_at set to 09:00 that morning. Without a grace policy, that row had two futures: fire immediately despite the missed window, or sit marked scheduled until something explicitly cleared it. The sweep’s next tick found it, checked the timestamp, and had no policy to apply.

That was the exact failure sprint 6 was built to close.

B-schedule in six sprints

B-schedule is the scheduling layer for the article pipeline. It takes a drafted article and decides when it goes live on captainrandom.co.uk.

Sprints 1 through 4 built the input half. Sprint 2 shipped schedule_grammar.py, a pure Python free-text parser. It resolves strings like tomorrow 09:00, +3h, or 2026-06-15 14:00 UTC into a UTC datetime. Sprint 3 wired the Telegram UX: a [📅 schedule] button on each voice-fixed article, expanding into a six-button sub-menu. The buttons are [🚀 now], [⏰ at…], [🪟 next slot], [📥 hold], [🛑 cancel schedule], [« back]. Sprint 3 also added an M2 queue-aware preflight that blocked scheduling when a conflicting window was already occupied. Sprint 4 closed the reply loop: tap ⏰ at…, reply with a free-text string, the listener hands it to schedule_grammar.py, the result lands in article_queue.planned_publish_at.
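To make the sprint 2 parser concrete, here is a minimal sketch of how the three example strings could resolve to UTC. It is not the real schedule_grammar.py: the function name parse_schedule, the regex patterns, and the choice to read "tomorrow HH:MM" as UTC are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical patterns; the real schedule_grammar.py may accept more forms.
_RELATIVE = re.compile(r"^\+(\d+)([hm])$")
_TOMORROW = re.compile(r"^tomorrow (\d{1,2}):(\d{2})$")
_ABSOLUTE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{1,2}):(\d{2}) UTC$")


def parse_schedule(text: str, now: datetime) -> datetime:
    """Resolve a free-text schedule string into a timezone-aware UTC datetime.

    `now` must be timezone-aware. Assumption: "tomorrow HH:MM" is read as UTC;
    the real parser may well use a local timezone instead.
    """
    text = text.strip()
    now = now.astimezone(timezone.utc)

    # Relative offset: "+3h" or "+45m".
    if m := _RELATIVE.match(text):
        amount, unit = int(m.group(1)), m.group(2)
        delta = timedelta(hours=amount) if unit == "h" else timedelta(minutes=amount)
        return now + delta

    # Next calendar day at a fixed time: "tomorrow 09:00".
    if m := _TOMORROW.match(text):
        day = (now + timedelta(days=1)).date()
        return datetime(day.year, day.month, day.day,
                        int(m.group(1)), int(m.group(2)), tzinfo=timezone.utc)

    # Absolute timestamp: "2026-06-15 14:00 UTC".
    if m := _ABSOLUTE.match(text):
        day = datetime.strptime(m.group(1), "%Y-%m-%d").date()
        return datetime(day.year, day.month, day.day,
                        int(m.group(2)), int(m.group(3)), tzinfo=timezone.utc)

    raise ValueError(f"unrecognised schedule string: {text!r}")
```

Passing now in as an argument, rather than reading the clock inside the function, is what keeps a parser like this pure and easy to test.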

Sprint 5 built the execution half. A schedule_sweep daemon starts alongside brain_server’s lifespan, sweeps article_queue on a short interval, and fires any row whose slot is due. Before sprint 5, the scheduling data existed. After it, articles actually published.
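The shape of that daemon can be sketched as a small asyncio loop. This is an illustration under assumptions, not the shipped code: sweep_loop, fetch_due, fire, the stop event, and the 30-second default interval are hypothetical names and values, and the real daemon's wiring into brain_server's lifespan is not shown.

```python
import asyncio
from datetime import datetime, timezone
from typing import Awaitable, Callable


async def sweep_loop(
    fetch_due: Callable[[datetime], list],      # hypothetical: returns rows whose slot is due
    fire: Callable[[object], Awaitable[None]],  # hypothetical: publishes one row
    stop: asyncio.Event,                        # set on shutdown to end the loop
    interval_s: float = 30.0,                   # assumed interval; the post only says "short"
) -> None:
    """Sweep on a fixed interval until `stop` is set."""
    while not stop.is_set():
        now = datetime.now(timezone.utc)
        for row in fetch_due(now):
            await fire(row)
        # Sleep for one interval, but wake immediately if shutdown is requested.
        try:
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
```

Waiting on the stop event instead of calling asyncio.sleep means the loop exits promptly at shutdown rather than finishing a full interval first.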

Sprint 6 is what happens when the machine is asleep.

The gap

schedule_sweep is not a background service running on a server. The editorial pipeline runs on a MacBook. The machine sleeps. The sweep pauses. This is the fundamental constraint the grace policy exists to address.

Without grace, the sweep’s logic is binary: fire if planned_publish_at <= now(), skip otherwise. That handles the 09:07 wake-up fine. The timestamp is past, the condition matches, the article fires. The gap is in the failure paths. If the fire attempt fails (transient error, lock contention, another article already in-flight), the sweep retries on its next tick. But there was no policy on how long to keep retrying, and no policy on what to do when the slot was genuinely too stale to use.

An article scheduled for last Tuesday is not the same thing as one scheduled for seven minutes ago. Publishing a seven-minute-late post is fine. Publishing a six-day-late post is wrong in a different way: the context it was written for may have passed, and the news hook may be dead. Without a hard cutoff, the sweep would eventually publish both.
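The binary rule is small enough to write out. The sketch below uses a hypothetical is_due helper and made-up timestamps; it only shows why a bare planned_publish_at <= now() comparison cannot tell the two cases apart.

```python
from datetime import datetime, timedelta, timezone


def is_due(planned_publish_at: datetime, now: datetime) -> bool:
    """Pre-sprint-6 rule: anything at or before `now` fires."""
    return planned_publish_at <= now


# Illustrative timestamps only.
now = datetime(2026, 6, 15, 9, 7, tzinfo=timezone.utc)
seven_minutes_late = now - timedelta(minutes=7)
six_days_late = now - timedelta(days=6)
not_yet = now + timedelta(hours=1)

# Both late rows satisfy the same condition; lateness is never measured.
both_fire = is_due(seven_minutes_late, now) and is_due(six_days_late, now)
```

The comparison answers "is this past?" and nothing more. How far past is the question sprint 6 adds.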

Past-due grace and the 48-hour hard-expire

Sprint 6 introduces two policies working together.

The past-due grace window defines how far past planned_publish_at a row can sit before the sweep stops treating it as recoverable. A slot missed because the machine was asleep for a few hours recovers automatically. The sweep finds the row, checks that it falls within the grace window, and fires it. No intervention required.

The 48-hour hard-expire is the cutoff on the other side. Any row with planned_publish_at more than 48 hours in the past is marked expired rather than fired. The article context, the news hook, and the reason the slot was chosen may all have aged past usefulness. Expiring it is a decision, not a failure. The state machine records it as expired, not silently dropped.

That distinction matters more than it sounds. A silently dropped row is a gap in the audit trail: no record, no reason, no path to recovery. An expired row is a state transition. It is logged, inspectable, and recoverable by manual reschedule if the article is still worth publishing. The sweep keeps the history. Fire-and-forget cron does not.
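The two policies can be sketched as a single classification step. The names SweepAction, classify, and HARD_EXPIRE are hypothetical, and the sketch assumes the grace window runs right up to the 48-hour cutoff. The post does not give a separate length for the grace window, so treat that as a simplification.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

# From the post: rows more than 48 hours past their slot are expired.
HARD_EXPIRE = timedelta(hours=48)


class SweepAction(Enum):
    WAIT = "wait"      # slot is still in the future
    FIRE = "fire"      # past due but recoverable
    EXPIRE = "expire"  # too stale; record the transition instead of publishing


def classify(planned_publish_at: datetime, now: datetime) -> SweepAction:
    """Decide what the sweep should do with one scheduled row."""
    lateness = now - planned_publish_at
    if lateness < timedelta(0):
        return SweepAction.WAIT
    if lateness > HARD_EXPIRE:
        return SweepAction.EXPIRE
    # Assumption: everything between "due" and the hard cutoff is within grace.
    return SweepAction.FIRE
```

Under these assumptions, the row planned for 09:00 and swept at 09:07 classifies as FIRE, while a row six days late classifies as EXPIRE. Either way the sweep returns an explicit action that can be logged with a reason.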

Slot-exhaustion UX

Sprint 6 also covers the case where scheduling fails because the queue is full. M2’s preflight from sprint 3 blocks a new schedule when a conflicting window is already occupied. Sprint 6 adds the recovery flows for when that conflict can’t be automatically resolved.

The slot-exhaustion UX surfaces via Telegram. A message names the blocked slot, gives the reason, and presents the sub-menu: push to the next available slot, hold for manual review, or cancel. The machine can’t decide what to do with a displaced article. Surfacing it as a decision point is correct. Blocking silently is not.
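As a rough sketch of that flow, the code below builds the prompt and finds a candidate for the "next available slot" option. Everything here is hypothetical: ExhaustionPrompt, build_prompt, next_free_slot, the message wording, and the idea that slots are a known list of candidate datetimes. No Telegram API calls are shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class ExhaustionPrompt:
    text: str                  # message body naming the slot and the reason
    options: tuple[str, ...]   # choices offered to the editor


def next_free_slot(candidates: Iterable[datetime],
                   occupied: set[datetime]) -> Optional[datetime]:
    """Return the earliest candidate slot that is not occupied, if any."""
    for slot in sorted(candidates):
        if slot not in occupied:
            return slot
    return None


def build_prompt(blocked_slot: datetime, reason: str) -> ExhaustionPrompt:
    """Name the blocked slot, give the reason, and offer the three choices."""
    text = f"Slot {blocked_slot:%Y-%m-%d %H:%M} UTC is blocked: {reason}"
    return ExhaustionPrompt(text, ("next slot", "hold", "cancel"))
```

Returning None from next_free_slot when nothing is free keeps the "no slot available" case explicit, so the caller has to surface it rather than guess.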

A recoverable state machine

The architectural shift is worth naming explicitly, because it’s the reason six sprints of scaffolding are justified by one policy.

A fire-and-forget cron is a point in time. If it fires, success. If it doesn’t, there’s no record. Nothing to inspect, nothing to retry, nothing to explain. Every missed run disappears into the log void. This is acceptable for jobs where the action is idempotent and stateless: refresh a cache, ping a health endpoint. It is not acceptable for publishing, where the article is unique, the slot was chosen deliberately, and a missed publish has visible consequences.

The article_queue state machine records every transition. queued → drafting → drafted → scheduled → publishing → shipped is auditable. scheduled → expired is auditable. The sweep’s retry loop, the grace window, the hard-expire are all state transitions with reasons attached, not silent disappearances.
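A minimal sketch of such a state machine follows. The state names come from the transitions listed above; the ALLOWED table, the Article class, and the in-memory history list are hypothetical stand-ins for whatever article_queue actually stores. The expired → scheduled edge is an assumption modelling a manual reschedule.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Legal transitions. Only the happy path and scheduled -> expired are named
# in the post; expired -> scheduled is an assumed manual-reschedule edge.
ALLOWED = {
    "queued": {"drafting"},
    "drafting": {"drafted"},
    "drafted": {"scheduled"},
    "scheduled": {"publishing", "expired"},
    "publishing": {"shipped"},
    "expired": {"scheduled"},
    "shipped": set(),
}


@dataclass
class Article:
    state: str = "queued"
    history: list = field(default_factory=list)

    def transition(self, new_state: str, reason: str) -> None:
        """Move to `new_state`, recording who-what-why; reject illegal moves."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.history.append(
            (self.state, new_state, reason, datetime.now(timezone.utc))
        )
        self.state = new_state
```

Because every move goes through transition, an expired article carries its own explanation in history, and an attempt to jump from expired straight to shipped fails loudly instead of passing unnoticed.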

Past-due grace is not primarily a usability improvement, although it is that. It’s the policy that makes “missed but recoverable” a legal state rather than an undefined one. Without it, the gap between scheduled and actually published is opaque. With it, every article either publishes, expires with a reason, or surfaces as a Telegram decision. There is no fourth outcome.

What this isn’t

It isn’t a general scheduling library. schedule_grammar.py parses a narrow set of natural-language patterns. The sweep is tightly coupled to article_queue. The grace window and hard-expire are sized to the article pipeline’s publishing cadence and are not configurable per article. The tradeoff is intentional. If the cadence changes, the constants change; the architecture doesn’t.

B-schedule is also not a server-side scheduler. Running the editorial pipeline on a laptop is a deliberate choice. The constraints that choice introduces (sleep cycles, TCC, keychain access from launchd) have been solved one by one rather than avoided by migrating to a VPS. Sprint 6 closes the last of them in the scheduling layer.

The devlog for B-schedule runs to six entries under the [B-schedule] tag in docs/devlog.md. Each is one paragraph: what shipped, what was descoped, what the failure mode was and how it was addressed. The canonical record is there. This post is the consolidated form.
