From AI pilot to production: go/no-go criteria and concrete steps
Moving from pilot to production is the moment an AI project leaves the controlled environment of a test and enters the real workflow, with real users, real data, and operational responsibility. Most AI pilots do not fail at the demo; they fail here: without decision criteria, without an owner, and without an operations plan, the pilot remains a permanent experiment.
The classic scenario: the pilot looks good, everyone is pleased at the presentation, then three months pass and nothing has changed in the real workflow. It is not a technology problem but a decision problem: nobody established what "it works" means, who signs off on moving forward, and what must exist for the system to run on a Monday morning.
Why AI pilots die in the demo
- Success was not defined in advance: without a written threshold ("cut processing time by 30%", "at least 95% accuracy on X cases"), any result is open to interpretation and the decision gets postponed.
- The pilot ran on clean data: the demo uses hand-picked cases, while production faces reality. The gap between the two is exactly your risk.
- There is no owner: the pilot belongs to "the innovation team", production belongs to nobody. Systems without an accountable owner never leave the toy stage.
- Operations were never designed: who intervenes when answers degrade? Who pays for the infrastructure? Who trains new users?
- The undeclared fear: a successful pilot creates obligations. Without a clear road forward, the organization prefers the comfort of "let's test some more".
The go/no-go criteria, written before the pilot
The minimum list you sign before starting:
- The target result: the indicator, the value, the measurement period.
- The quality threshold: accuracy, the acceptable error rate, and the case set it is measured on, hard cases included.
- The acceptable cost per transaction: the pilot shows you the real cost per use; decide in advance how much is too much.
- The stop conditions: which result means a firm no-go, so you do not prolong the agony.
- Who decides: one person, with the decision date in the calendar.
A pilot without these five points is not a pilot; it is a demo with a budget.
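The five points can be written down as data rather than as a slide, so the decision is a comparison and not a debate. The following is a minimal sketch, not a prescribed format: the class names, field names, and the "review" outcome are assumptions made for illustration. "review" means the results sit between the stop condition and the target, and the named decider still has to make the call on the agreed date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GoNoGoCriteria:
    """Thresholds agreed and signed before the pilot starts (hypothetical fields)."""
    target_metric: str               # the indicator, e.g. "processing_time_reduction"
    target_value: float              # value that counts as success, e.g. 0.30
    min_accuracy: float              # quality threshold on the agreed case set
    max_cost_per_transaction: float  # cost ceiling, in your own currency
    stop_accuracy: float             # below this, the result is a firm no-go
    decider: str                     # one named person
    decision_date: date              # the date already in the calendar


@dataclass(frozen=True)
class PilotResult:
    """What the pilot actually measured over the agreed period."""
    target_value: float
    accuracy: float
    cost_per_transaction: float


def decide(criteria: GoNoGoCriteria, result: PilotResult) -> str:
    """Return "go", "no-go", or "review" by comparing results with the signed thresholds."""
    # Stop conditions are checked first so a weak result cannot be argued upward.
    if result.accuracy < criteria.stop_accuracy:
        return "no-go"
    meets_all = (
        result.target_value >= criteria.target_value
        and result.accuracy >= criteria.min_accuracy
        and result.cost_per_transaction <= criteria.max_cost_per_transaction
    )
    return "go" if meets_all else "review"
```

The values you would pass in (30% time reduction, 95% accuracy, and so on) are the ones signed before the pilot; the code only makes it harder to reinterpret them afterwards.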
The data: the real test
The question that separates pilot from production: has the system seen your ugly data? Badly scanned documents, ambiguous requests, rare cases, mixed language. Before the go, check:
- the test sample covers the real distribution of cases, not a favorable selection;
- you know what the system does when it does not know: does it say "I don't know" or does it invent?
- the production data flow is sustainable: who updates the sources, how often, with what access rights.
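The first check in the list can be made concrete by comparing the mix of case types in the test sample with the mix observed in production. This is a rough sketch under assumptions: it presumes each case has already been labeled with a type (the labels below are made up), and the 5-point tolerance is an arbitrary illustrative value.

```python
from collections import Counter


def case_shares(case_types):
    """Map each case type to its share of the list."""
    counts = Counter(case_types)
    total = sum(counts.values())
    return {case: n / total for case, n in counts.items()}


def coverage_gaps(test_cases, production_cases, tolerance=0.05):
    """Return case types whose share in the test sample is more than `tolerance`
    below their share in production, including types missing from the test."""
    test = case_shares(test_cases)
    prod = case_shares(production_cases)
    gaps = {}
    for case, prod_share in prod.items():
        shortfall = prod_share - test.get(case, 0.0)
        if shortfall > tolerance:
            gaps[case] = round(shortfall, 3)
    return gaps


# Hypothetical labels: the test sample is mostly clean documents.
production = ["clean"] * 60 + ["bad_scan"] * 25 + ["mixed_language"] * 15
test_sample = ["clean"] * 90 + ["bad_scan"] * 10
print(coverage_gaps(test_sample, production))
```

Any case type that shows up in the result is one the pilot under-tested relative to what production will send it.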
Risks and governance before scale
Production means accountability, and this is where the operating discipline required by the EU AI Act comes in:
- Assigned human oversight: who can intervene and stop the system, with what competence.
- Logs and traceability: can you reconstruct why the system gave a particular answer?
- A fallback plan: what happens when the system is unavailable or wrong? The old process must stay functional as a safety net.
- Quality monitoring: degradation is silent; without continuous measurement you find out from customers.
- Risk classification: if the system falls into the high-risk category, the documentation and operating obligations grow; you want to learn that before scaling, not after.
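Two of these points, traceability and the fallback plan, can be illustrated with a thin wrapper around the model call. This is a sketch under assumptions, not a reference design: `model_call` and `fallback` are hypothetical callables you would supply, the model is assumed to return an answer together with a confidence score, and the 0.7 cut-off is illustrative. Logging the raw request may not be acceptable for personal data; adapt the record to your own rules.

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("ai_system")


def answer_with_fallback(request_text, model_call, fallback, min_confidence=0.7):
    """Call the model, route to the old process on error or low confidence,
    and log enough to reconstruct afterwards why this answer was given."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "request": request_text,
    }
    try:
        answer, confidence = model_call(request_text)
        record.update(confidence=confidence)
        if confidence >= min_confidence:
            record.update(route="model", answer=answer)
        else:
            record.update(route="fallback", reason="low_confidence",
                          answer=fallback(request_text))
    except Exception as exc:  # model unavailable or failing
        record.update(route="fallback", reason=repr(exc),
                      answer=fallback(request_text))
    # One JSON line per request keeps the trail searchable.
    logger.info(json.dumps(record))
    return record
```

The design choice worth noting is that the fallback route is logged with its reason, so the share of requests handled by the old process becomes a number you can review rather than an impression.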
Who answers on Monday morning
The move to production is complete when there is a name for each of the questions below:
- Who owns the system's business result?
- Who operates it technically: monitoring, incidents, updates?
- Who trains users and collects feedback?
- Who decides changes: prompts, thresholds, vendor?
Four questions, possibly with the same name against all of them at the start, but written down. An AI system in production without an owner is an incident that has not happened yet.
The first 90 days in production
The go is not the finish line; it is the start of the period when the system earns its place:
- Weeks 1-2: intensive monitoring, with the old process still running in parallel for the sensitive cases. Every incident gets documented, not talked away.
- Weeks 3-6: compare the real indicators against the go/no-go thresholds. Differences are discussed with numbers: where quality degraded, which case types slip through, what it really costs per transaction.
- Weeks 7-12: the expansion decision. More teams, more workflows, or more volume, one at a time, with the same measurement discipline. This is also when you set the permanent rhythm: a monthly review of quality and costs, with the business owner in the room.
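Comparing real indicators against the thresholds presupposes a routine that keeps measuring. One possible shape for it, sketched under assumptions: a sample of answers is reviewed by a human and marked correct or not, and the class name, window size, and minimum sample count below are illustrative choices.

```python
from collections import deque


class QualityMonitor:
    """Track correctness of the most recent reviewed answers and flag a drop
    below the agreed threshold."""

    def __init__(self, threshold, window=200, min_samples=50):
        self.threshold = threshold
        self.min_samples = min_samples
        self.outcomes = deque(maxlen=window)  # old outcomes fall out automatically

    def record(self, was_correct):
        self.outcomes.append(bool(was_correct))

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing was recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        """True only once enough samples exist and accuracy is under the threshold."""
        if len(self.outcomes) < self.min_samples:
            return False
        return self.rolling_accuracy() < self.threshold
```

The same rolling number is a natural input for the monthly review of quality and costs.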
The simple rule for the whole period: any change of model, prompt, or threshold is treated as a mini-release, with a test before and measurement after. AI systems do not break loudly; they degrade quietly, and the measurement routine is the only thing that catches it in time.
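The "test before" half of the mini-release rule can be as small as running the current and the candidate version on the same evaluation set. A minimal sketch, assuming exact-match scoring and hypothetical `current` and `candidate` callables that wrap the two versions; the allowed regression of one percentage point is an illustrative value, not a recommendation.

```python
def accuracy(predict, eval_set):
    """Share of evaluation cases where predict(input) equals the expected output."""
    correct = sum(1 for case_input, expected in eval_set
                  if predict(case_input) == expected)
    return correct / len(eval_set)


def release_check(current, candidate, eval_set, min_accuracy, max_regression=0.01):
    """Test before release: the candidate must meet the signed threshold and must
    not be worse than the current version by more than `max_regression`."""
    before = accuracy(current, eval_set)
    after = accuracy(candidate, eval_set)
    approved = after >= min_accuracy and (before - after) <= max_regression
    return {"before": before, "after": after, "approved": approved}
```

Exact match is the simplest possible scoring and will not suit free-text answers; the point of the sketch is the gate, not the metric. The "measurement after" half then comes from the production monitoring already in place.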
Pilot → production checklist
- The go/no-go criteria were written and signed before the pilot.
- The test included the hard cases, not just the happy path.
- The real cost per transaction is known and accepted.
- The business owner and the technical operator have names.
- Human oversight is assigned and competent.
- Logs, monitoring, and the fallback plan exist.
- The risk classification and its obligations are clarified.
- Users are trained, and the old process remains the safety net during the transition.
- The go decision is made by the agreed person, on the agreed date.
If you are earlier in the process and still choosing the solution, start with the build, buy, or integrate decision.
FAQ
How long should an AI pilot take?
Long enough to cover the real variation of cases and measure the target indicator: for many workflows, 4-8 weeks of real usage. A pilot without a deadline is a project without a decision; put the go/no-go meeting date in the calendar on day one.
Which KPIs define a pilot's success?
Three families: business result (time saved, cases resolved, conversion), quality (accuracy on the evaluation set, escalation rate to a human), and cost (per transaction, per user). Choose a few, make them measurable, and set them before the start.
The pilot works. Why is the organization hesitating?
Because production changes responsibilities: someone has to sign for the result, the budget, and the risks. The hesitation is rarely technical; it is the absence of an owner and a clear road. The answer is a decision, not another test.
What team do we need for production?
Minimum: a business owner, someone who operates the system technically (in-house or a partner), and assigned human oversight for the sensitive decisions. Volume determines whether those three roles are held by one person or by three; clarity of responsibility matters more than headcount.
When is stopping a pilot the right call?
When it hits the stop conditions written in advance: results below threshold on real data, unacceptable cost per transaction, or risk that cannot be covered. A documented no-go is a good outcome: you learned cheaply what does not work and can redirect the budget.