Red Mutex

Three AI features that paid for themselves in their first quarter

The bar for "AI worked" should be simple: did this feature pay for its own hosting bill within ninety days? If you can't answer that question, you didn't ship an AI feature. You shipped a slide deck.

In the last year we built three AI features for clients that cleared that bar comfortably. Here's what they were and what made them work.

1. The retrieval-augmented support assistant for an e-commerce client. The client's support team handled around 4,000 tickets per month. Roughly 60% were repeat questions already answered in their existing help center. We built an internal assistant that retrieved from the help center, drafted a response, and let the agent edit before sending. Result: average handle time dropped 41%, and the team absorbed the natural growth in ticket volume without backfilling two open headcounts. Cost of the AI feature, fully loaded: about a quarter of one of those headcounts.

Why it worked: the retrieval base was their actual help center, not a fine-tune. The assistant's draft went to a human, not to the customer. The metric was time-per-ticket, not "AI quality." Most internal AI tools get at least one of those three choices wrong; that's why they fail and this one didn't.
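For the curious, the shape of that loop is simple enough to sketch. This is a toy sketch, not the client's code: the article list, the keyword-overlap ranking, and call_llm are stand-ins for whatever search index and model endpoint you'd actually wire in.

```python
# Minimal shape of the support assistant: retrieve -> draft -> human edits.
# Everything here is illustrative; swap in your real index and model client.

from dataclasses import dataclass


@dataclass
class Article:
    title: str
    body: str


HELP_CENTER = [
    Article("Returns", "Items can be returned within 30 days with the original receipt."),
    Article("Shipping times", "Standard shipping takes 3-5 business days."),
]


def retrieve(ticket: str, articles: list[Article], k: int = 2) -> list[Article]:
    """Rank articles by naive keyword overlap with the ticket text.
    In production this would be an embedding or keyword-search index."""
    ticket_words = set(ticket.lower().split())
    scored = sorted(
        articles,
        key=lambda a: len(ticket_words & set((a.title + " " + a.body).lower().split())),
        reverse=True,
    )
    return scored[:k]


def call_llm(prompt: str) -> str:
    """Placeholder for the model call; swap in your provider's client."""
    return "DRAFT: ..."


def draft_reply(ticket: str) -> str:
    context = "\n\n".join(f"{a.title}: {a.body}" for a in retrieve(ticket, HELP_CENTER))
    prompt = (
        "Using only the help-center excerpts below, draft a reply to the ticket.\n\n"
        f"Help center:\n{context}\n\nTicket:\n{ticket}\n"
    )
    # The draft goes to the agent's screen, never straight to the customer.
    return call_llm(prompt)


if __name__ == "__main__":
    print(draft_reply("How long do I have to return an order?"))
```

The structure is the point: the model only ever sees the client's own help center, and its output only ever lands in front of an agent.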

2. The document extraction pipeline for an ERP client. The client onboarded ~600 new vendors per month, each with ~5–7 documents (W-9 equivalents, banking forms, insurance certificates). The ops team manually keyed the fields into the ERP. We built an extraction pipeline that produced structured fields with a confidence score, auto-approved anything above a threshold, and routed the rest to a human review queue. Result: 78% of new vendor onboardings touched zero humans for the data-entry step, while the error rate on the auto-approved bucket sat below the human baseline.
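The routing logic is the unglamorous core of the feature. Here's a minimal sketch of it, assuming the extraction call returns per-field confidence scores; extract_fields, the field names, and the 0.92 threshold are illustrative, not the client's values.

```python
# Sketch of the confidence-based routing step: auto-approve above a
# threshold, send everything else to a human review queue.

from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float


AUTO_APPROVE_THRESHOLD = 0.92  # tuned against the evaluation harness, not guessed


def extract_fields(document_text: str) -> list[ExtractedField]:
    """Placeholder for the model-backed extraction call; returns canned fields here."""
    return [
        ExtractedField("tax_id", "12-3456789", 0.97),
        ExtractedField("bank_account", "000123456", 0.61),
    ]


def route(document_text: str) -> tuple[list[ExtractedField], list[ExtractedField]]:
    """Split extracted fields into auto-approved vs. human-review buckets."""
    fields = extract_fields(document_text)
    approved = [f for f in fields if f.confidence >= AUTO_APPROVE_THRESHOLD]
    review = [f for f in fields if f.confidence < AUTO_APPROVE_THRESHOLD]
    return approved, review


if __name__ == "__main__":
    approved, review = route("...vendor W-9 text...")
    print(f"auto-approved: {[f.name for f in approved]}")
    print(f"sent to human queue: {[f.name for f in review]}")
```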

Why it worked: we treated the model as a probability machine, not a magician. We invested heavily in the evaluation harness before we shipped the feature, so we could keep tuning the auto-approval threshold against real production data. That harness is more important than the model choice.
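Concretely, the harness boils down to a loop over labeled production samples that finds the lowest threshold whose auto-approved error rate still beats the human baseline. A toy version, with made-up numbers for the baseline and the sample:

```python
# Pick the auto-approval threshold from a labeled sample of production docs.
# The baseline and the (confidence, correct) pairs below are illustrative.

HUMAN_BASELINE_ERROR = 0.02  # measured error rate of manual data entry

# (model confidence, was the extraction correct) from human-reviewed documents
labeled_sample = [(0.99, True), (0.95, True), (0.91, True), (0.88, False),
                  (0.97, True), (0.83, False), (0.93, True), (0.90, True)]


def choose_threshold(sample, baseline):
    """Return the lowest threshold whose auto-approved error rate stays at or
    below the human baseline, i.e. the most automation the data supports."""
    best = None
    for t in sorted({conf for conf, _ in sample}):
        approved = [correct for conf, correct in sample if conf >= t]
        if not approved:
            continue
        error_rate = 1 - sum(approved) / len(approved)
        if error_rate <= baseline:
            best = t if best is None else min(best, t)
    return best


print(f"auto-approve at confidence >= {choose_threshold(labeled_sample, HUMAN_BASELINE_ERROR)}")
```

Rerun it whenever fresh reviewed production data comes in and the threshold stays honest as the document mix drifts.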

3. The customer-facing AI feature for a platform client. This one is the boring success: a "summarize this thread" button inside their existing product. Adoption hit 31% of weekly active users in the first month. The feature didn't directly earn revenue, but engagement and retention in the cohort that used it materially exceeded those of the cohort that didn't. The platform's CFO was the one who flagged the retention delta, not us.

Why it worked: we shipped it inside the workflow the user was already in. There's no separate "AI tab." There's no marketing page. There's a button that does a useful thing, and it's measurable.
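If you want to see what "measurable" means here, the cohort comparison behind the retention claim fits in a few lines. This is a toy illustration with made-up user sets, not the platform's analytics code:

```python
# Split one week's active users by whether they clicked the summarize
# button, then compare who comes back the following week.

week1_active = {"u1", "u2", "u3", "u4", "u5", "u6"}
used_summarize = {"u1", "u2", "u3"}   # clicked the button in week 1
week2_active = {"u1", "u2", "u4"}     # came back the following week


def retention(cohort: set, returned: set) -> float:
    return len(cohort & returned) / len(cohort) if cohort else 0.0


users = week1_active & used_summarize
non_users = week1_active - used_summarize
print(f"summarize users retained: {retention(users, week2_active):.0%}")
print(f"non-users retained:       {retention(non_users, week2_active):.0%}")
```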

The throughline across all three: the AI feature was scoped against a measurable business outcome, the model was just an implementation detail, and the team in charge could turn it off if it stopped paying for itself.

That's the engagement model we run for AI work. We do a 2-week proof-of-concept with a fixed price and a fixed measurable outcome. If it doesn't hit the outcome, you don't fund the next phase. If it does, we ship it into production.

Book an AI scoping call →
