An iOS budget app: five rounds of device QA found the same defect every time

The Streak block’s “Check In” button opened the transaction entry form. Not a streak log. Not a confirmation. The same modal you’d tap to record a coffee purchase. The button had been wired to the nearest available handler rather than the right one, and it had shipped that way because in isolation, before anyone ran a real device-QA session, no one had tapped it expecting a streak check-in.

That was Phase N. It was one of five.

The pattern

The five phases covered here, L through Q, were device-QA sweeps on an iOS personal finance app in active development. The finding was consistent: UI that presented itself as interactive and wasn’t, or UI that was technically tappable but triggered the wrong thing. Labels users tried to drag. A calendar strip with no tap target. A category rule that stayed on one transaction when it should have propagated to three.

Every phase closed one or more of those gaps.

Phase L: rearrangeable modules

The Home dashboard and Accounts screen both have stacked module layouts: multiple sections arranged vertically, each carrying a distinct data surface. The natural impulse on that kind of screen is to rearrange. Both screens implied that affordance and didn’t deliver it.

Phase L introduced a single drag-to-reorder engine: ModuleOrderStore, an @MainActor ObservableObject singleton in Modules/UI/Navigation, following the same pattern already used by PaletteStore. Injectable into individual screens, so each surface adopts ordering without owning its own persistence logic. App-wide from one implementation.

The tradeoff in using a singleton: one shared order state across screens. That’s the correct call here. Home and Accounts are the only two module-heavy surfaces, and the expected behaviour is that a rearrangement persists across both. Per-screen state would have introduced a second defect: an ordering change on Home that doesn’t hold when the user navigates away and returns.
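A minimal sketch of what a store like this could look like. Only the name ModuleOrderStore and the @MainActor ObservableObject singleton shape come from the write-up. The rest is assumed for illustration: keying orders by a screen identifier, persisting to UserDefaults, and the method names.

```swift
import Combine
import Foundation

// Sketch only. Screen keys, the storage key, and the method names are
// hypothetical; the source names just the type and its singleton shape.
@MainActor
final class ModuleOrderStore: ObservableObject {
    static let shared = ModuleOrderStore()

    // Saved module order, keyed by a screen identifier (an assumption).
    @Published private(set) var orders: [String: [String]] = [:]

    private let defaults: UserDefaults
    private static let storageKey = "moduleOrder.sketch" // hypothetical key

    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
        if let saved = defaults.dictionary(forKey: Self.storageKey) as? [String: [String]] {
            orders = saved
        }
    }

    // Saved order first, then any modules added since the order was saved.
    // Modules that no longer exist on the screen are dropped.
    func order(for screen: String, defaultModules: [String]) -> [String] {
        let saved = orders[screen] ?? []
        let known = saved.filter { defaultModules.contains($0) }
        let added = defaultModules.filter { !known.contains($0) }
        return known + added
    }

    // Moves one module so that it ends up at index `destination`, then persists.
    func move(on screen: String, defaultModules: [String], from source: Int, to destination: Int) {
        var current = order(for: screen, defaultModules: defaultModules)
        guard current.indices.contains(source), current.indices.contains(destination) else { return }
        let module = current.remove(at: source)
        current.insert(module, at: destination)
        orders[screen] = current
        defaults.set(orders, forKey: Self.storageKey)
    }
}
```

In a SwiftUI app the shared instance would typically be handed to each screen with .environmentObject(ModuleOrderStore.shared). The screens then read the order and request moves without touching persistence themselves.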

Phase M: Cursed Energy DNA sweep

The Cursed Energy design DNA, the visual language established on Accounts and Forecast during earlier phases, hadn’t propagated to every screen. Phase M was a catch-up sweep: every laggard section brought up to the baseline those two screens set.

Inconsistent styling is a softer instance of the same defect pattern. A screen that doesn’t look like it belongs to the app creates an impression that its interactions might not behave the same way either. Once every screen runs the same design DNA, the mental model a user builds on Accounts transfers correctly when they navigate elsewhere.

Out of scope for Phase M: any new functionality. Pure visual catch-up.

Phase N: gamification detail screens and direct streak check-in

Device-QA round AnF produced three consecutive findings on the Home gamification blocks. The Quests, Level, and Streak sections were non-interactive labels. Tapping any of them did nothing. The Streak block had a visible “Check In” button; it opened the transaction entry form.

Phase N made all three blocks interactive. Each navigates to a dedicated detail screen. The Streak block’s “Check In” now records the check-in directly. The transaction form is not involved.

The wiring error on the Streak button is the clearest instance of what device-QA round AnF was finding across the board. The button existed. The label was correct. The visual affordance was present. The underlying action was attached to the wrong handler. That category of error doesn’t surface in code review. It surfaces when you tap the thing on a real device and it does the wrong thing.
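One way to picture the corrected wiring is a named action per control, with the check-in routed to a streak log rather than to a screen. This is a sketch, not the app’s code. HomeAction, HomeRoute, StreakLog, and HomeState are hypothetical names, and once-per-day check-in is an assumption.

```swift
import Foundation

// Hypothetical destinations reachable from the Home gamification blocks.
enum HomeRoute: Equatable {
    case transactionForm, questsDetail, levelDetail, streakDetail
}

// One named action per control, so a button cannot silently borrow
// another control's handler.
enum HomeAction {
    case addTransaction, openQuests, openLevel, openStreak, checkInStreak
}

// Assumed model: one check-in per calendar day.
struct StreakLog {
    private(set) var days: Set<Date> = []

    // Returns true if this is the first check-in for that day.
    @discardableResult
    mutating func checkIn(on date: Date, calendar: Calendar) -> Bool {
        days.insert(calendar.startOfDay(for: date)).inserted
    }
}

struct HomeState {
    private(set) var streak = StreakLog()
    private(set) var route: HomeRoute? = nil

    mutating func handle(_ action: HomeAction, now: Date, calendar: Calendar) {
        switch action {
        case .addTransaction: route = .transactionForm
        case .openQuests:     route = .questsDetail
        case .openLevel:      route = .levelDetail
        case .openStreak:     route = .streakDetail
        case .checkInStreak:
            // Records directly; no screen is presented.
            streak.checkIn(on: now, calendar: calendar)
        }
    }
}
```

The switch has no default case, so adding a new control forces an explicit decision about what it does.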

Phase O: month calendar with full-screen and day mode

The Accounts screen carried a daily-activity strip: a horizontal scroll of date cells indicating transaction density per day. Device-QA observations #4, #7, and #8 all resolved to the same expectation: tap a date, see a calendar; tap a day in that calendar, see that day’s transactions.

The original ask was a single calendar view on tap. The scope deepened mid-design. The shipped flow has three levels: the activity strip, a full-screen month calendar opened by tapping the strip, and a day mode entered by tapping a specific date in that calendar. The variant that shipped (Variant B) presents the calendar full-screen rather than expanding it in-line within the Accounts scroll context.

The tradeoff: full-screen means a modal transition. The user leaves Accounts to consult the calendar and returns. The alternative, in-line expansion, would have kept the context but compressed the calendar into a space already carrying the daily strip. For a monthly spending overview, the modal switch is the right call.
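The three-level flow can be sketched as a small state machine. All type and method names here are hypothetical. The back behaviour (day mode returns to the month, the month returns to the strip) is an assumption the write-up does not spell out.

```swift
import Foundation

// Hypothetical model of the drill-down: strip, then month, then day.
enum ActivityLevel: Equatable {
    case strip
    case month(anchor: Date)
    case day(Date)
}

struct ActivityNavigator {
    private(set) var level: ActivityLevel = .strip

    // Tapping the strip opens the full-screen month containing that date.
    mutating func tapStrip(on date: Date) {
        if case .strip = level { level = .month(anchor: date) }
    }

    // Tapping a date only means something while the month is showing.
    mutating func tapDay(_ date: Date) {
        if case .month = level { level = .day(date) }
    }

    // Assumed back behaviour: step up exactly one level.
    mutating func back() {
        switch level {
        case .day(let date): level = .month(anchor: date)
        case .month:         level = .strip
        case .strip:         break
        }
    }
}

// Hypothetical transaction record, used only for the day-mode filter.
struct Entry: Equatable {
    let title: String
    let date: Date
}

// Day mode lists the entries that fall on the chosen calendar day.
func entries(on day: Date, from all: [Entry], calendar: Calendar) -> [Entry] {
    all.filter { calendar.isDate($0.date, inSameDayAs: day) }
}
```

Because each tap is only honoured at the level where it makes sense, a stray tap cannot skip a level or land in day mode without a month behind it.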

Phase Q: smart category propagation

Three transactions from “Driver & Vehicle Licensing Agency” at −£17.06 each. One categorised; two not. The expectation that setting a category on one would apply to the others surfaced from device QA rather than from design. It’s also correct. Manually categorising each instance of a recurring transaction is the kind of work users stop doing. When they stop, the categorisation data degrades until the reports stop being useful.

Phase Q introduced persistent categorisation as a first-class primitive (B-051): smart propagation that recognises matching transactions and applies the category rule forward. The devlog notes this as the app’s first persistent categorisation layer. The mechanism is in place for future category rules to build on. Same-merchant same-amount matching is the first instance; any pattern-based classification the next iteration introduces uses the same layer.
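Same-merchant same-amount matching can be sketched as a rule table keyed on both fields. This is an illustration under assumptions, not the B-051 implementation. The type names are hypothetical, amounts are held in pence, merchant names are compared case-insensitively, and only uncategorised transactions are filled in. The rule table lives in memory here; a persistent layer would save it.

```swift
import Foundation

// Hypothetical transaction shape; the amount is in minor units (pence).
struct Transaction: Equatable {
    let id: Int
    let merchant: String
    let amountMinor: Int
    var category: String?
}

// A rule matches on normalised merchant name plus exact amount.
struct RuleKey: Hashable {
    let merchant: String
    let amountMinor: Int

    init(_ transaction: Transaction) {
        merchant = transaction.merchant
            .trimmingCharacters(in: .whitespacesAndNewlines)
            .lowercased()
        amountMinor = transaction.amountMinor
    }
}

struct CategoryRules {
    private(set) var rules: [RuleKey: String] = [:]

    // Categorising one transaction records a rule, then applies it to
    // every uncategorised match already in the list.
    mutating func categorise(_ id: Int, as category: String, in transactions: inout [Transaction]) {
        guard let index = transactions.firstIndex(where: { $0.id == id }) else { return }
        transactions[index].category = category
        rules[RuleKey(transactions[index])] = category
        apply(to: &transactions)
    }

    // Also run when new transactions arrive, so rules carry forward.
    // Assumption: existing categories are never overwritten.
    func apply(to transactions: inout [Transaction]) {
        for index in transactions.indices where transactions[index].category == nil {
            if let category = rules[RuleKey(transactions[index])] {
                transactions[index].category = category
            }
        }
    }
}
```

With the three −£17.06 transactions from the example, categorising one fills in the other two. A later transaction from the same merchant at the same amount picks up the category when apply runs again.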

Out of scope for Phase Q: what those future rules look like.

What five phases of this looks like

Each phase resolved a version of the same defect. Home and Accounts modules looked rearrangeable but weren’t. Gamification blocks looked interactive; taps did nothing. The Streak button looked purposeful; its handler was wrong. The activity strip implied drill-down with no tap target. The category implied propagation, then stayed local.

Incremental feature shipping produces this. Each piece gets built in isolation, where its own internal logic is sound, and the interaction contracts between a component and a user only become visible when a real finger runs across a real screen. The Quests block was a correct component. The activity strip carried correct data. The categorisation logic worked for the single transaction it was applied to. None of that mattered to the user confronting an interface that implied something it didn’t deliver.

Device QA is the work of finding those implied contracts and closing them. Five phases is what it took. The next round will find new ones.