How to choose your first serious AI use case

The best first AI project is rarely the most futuristic. A five-part test, VALUE, for picking a use case that is frequent, worthwhile and safe to get wrong.

Good Transformer, 11 June 2026, 7 min read

The most common way to choose a first AI project is to pick the one that sounded most impressive in a demo. A system that reads every contract, an assistant that knows the whole business, an agent that handles the inbox end to end.

It is the wrong instinct, and an expensive one, because the first project does not just succeed or fail on its own terms. It sets what everyone in the organisation comes to believe AI can do.

Choose something ambitious and brittle, and you teach your team that AI overpromises. Choose something useful and well-judged, and you build the appetite and the skill to go further. So the first use case is worth choosing deliberately, against criteria, rather than by going with whichever idea excites the meeting most.

Why the futuristic option usually disappoints

There is good evidence that where you point AI matters more than whether you use it. In a field experiment with 758 Boston Consulting Group consultants, researchers from Harvard, MIT, Wharton and other institutions found that on tasks well suited to the technology, consultants using GPT-4 completed 12.2% more tasks, 25.1% faster, at higher quality. On a task deliberately chosen to sit just outside what the model did well, the same tool made them 19 percentage points less likely to reach the right answer. Ethan Mollick, a co-author of that study, coined the term "jagged frontier" for this: capability that is excellent on one task and quietly poor on a neighbouring one that looks similar.

For a leader, the lesson is direct. The futuristic project tends to sit on the wrong side of that frontier, where the work is complex, the right answer is contested, and a confident-but-wrong system does real damage. The unglamorous project, the one you almost dismissed, often sits comfortably on the right side.

The frequency of the task matters too. In a separate study of 5,179 customer-support agents, generative AI raised productivity by 14% on average, and around 34% for the least experienced staff. The gains were largest where the task recurred constantly and where the AI could carry good practice from strong performers to weaker ones. A task you do twenty times a day is a better first bet than one you do twice a year, both because the total benefit is larger and because your team gets twenty chances a day to learn.

The VALUE test

When a leader asks us how to choose, we run candidate tasks through five questions. The first letters spell VALUE, which is the point: you are looking for the use case where the effort pays off.

Value. If this works, what is it actually worth? Be concrete: hours returned to fee-earning work, faster turnaround a client would notice, errors avoided. If you cannot describe the prize in a sentence, it is not your first project.

Activity. How often does the task happen? High-frequency work compounds: more total benefit, and far more repetitions for the team to build judgement. A daily task beats a quarterly one even if the quarterly one feels weightier.

Learnability. Can your people tell good output from bad, and get better at producing it? AI raises quality fastest where "good" is knowable and checkable. Where quality is a matter of fine professional judgement that takes years to acquire, expect slower, more supervised gains.

Uncertainty. How sure are you it will work, and how bad is it if it does not? This is the jagged frontier in practical form. Favour tasks that sit inside the technology's strengths and where a wrong output is caught easily and costs little. Save the high-uncertainty, high-consequence work for when you have learned more.

Ease. How hard is it to actually stand up? Consider the tools you already have, the data the task needs, who has to change what they do, and whether anyone has to grant access to sensitive systems. Ease is not everything, but a first project should not also be an IT project.

Pick the frequent, checkable, low-stakes task. That is where AI pays off.

What this looks like in practice

Take an accountancy firm weighing two ideas. The first is an "AI audit assistant" that reviews client records and flags issues. The second is a humble helper that turns messy meeting notes into structured client summaries and first-draft follow-up emails. The audit assistant scores high on Value but poorly on Uncertainty and Learnability: the consequences of a missed issue are severe and the judgement is hard to check at a glance. The summary helper is less exciting, but it is frequent, easy to verify, low-stakes, and runs on tools the firm already has. As a first serious use case, the second wins on VALUE, and it builds the confidence to attempt the first later. (Both are illustrative examples, not a specific firm.)

The pattern repeats across professional services. A recruitment agency is usually better starting with structured candidate summaries than with autonomous candidate outreach. A marketing agency is better starting with research synthesis and first-draft briefs than with fully automated client reporting. The boring choice is frequently the right one.

The honest limits

Two cautions. First, "safe to get wrong" must not collapse into "not worth doing". A first project that no one cares about teaches the team that AI is a toy. The Value test is there precisely to stop you choosing something trivial. Aim for a task that is genuinely useful and forgiving, not one that is merely forgiving.

Second, the frontier moves. A task that sits outside the technology's reliable range today may sit inside it in six months, and a scorecard is a snapshot, not a verdict for all time. Treat your VALUE scores as a decision you revisit, not a label you fix. And keep in mind that time saved on a task is not the same as value to the business, a distinction covered in the next piece in this series.

What to do next

List up to five candidate tasks, drawn from work your team actually does rather than from a vendor's feature list. Score each from one to five on Value, Activity, Learnability, Uncertainty and Ease. Score every criterion so that a higher number is better; for Uncertainty, that means five for a task you are confident will work and that costs little when it goes wrong. Be suspicious of anything that scores high only on Value and low on everything else. That is the seductive, brittle project. Pick the highest total with a manageable downside, and commit to it as a real piece of work with an owner and a way to measure it.

This is the disciplined version of the "scan" in the UK government's "scan, pilot, scale" approach to AI adoption, and of what the NIST AI Risk Management Framework calls "Map": understand a use case in its context before you build anything. Choosing well is unglamorous, and it determines most of the return.
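
For teams that prefer a spreadsheet or a script to a worksheet, the scoring exercise above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the logic of the Scorecard itself. It assumes a higher score is better on every criterion (so a 5 on Uncertainty means low uncertainty and a cheap failure). It also assumes a hypothetical "brittle" rule of Value at 4 or above with every other score at 2 or below. The names `Candidate` and `rank` and the example scores are invented for the sketch.

```python
from dataclasses import dataclass

CRITERIA = ("value", "activity", "learnability", "uncertainty", "ease")


@dataclass(frozen=True)
class Candidate:
    """One candidate use case, scored 1-5 on each VALUE criterion."""
    name: str
    value: int
    activity: int
    learnability: int
    uncertainty: int  # assumption: 5 = low uncertainty, cheap to get wrong
    ease: int

    def __post_init__(self):
        # Reject scores outside the one-to-five range used in the article.
        for criterion in CRITERIA:
            score = getattr(self, criterion)
            if not 1 <= score <= 5:
                raise ValueError(f"{self.name}: {criterion} must be 1-5, got {score}")

    @property
    def total(self) -> int:
        return sum(getattr(self, criterion) for criterion in CRITERIA)

    @property
    def brittle(self) -> bool:
        # Hypothetical threshold: high on Value only, low on everything else.
        others = [getattr(self, c) for c in CRITERIA if c != "value"]
        return self.value >= 4 and all(score <= 2 for score in others)


def rank(candidates):
    """Order candidates: non-brittle first, then highest total first."""
    if len(candidates) > 5:
        raise ValueError("Score at most five candidate tasks at a time")
    return sorted(candidates, key=lambda c: (c.brittle, -c.total))


# Invented scores for the two illustrative accountancy ideas.
audit = Candidate("AI audit assistant", value=5, activity=2,
                  learnability=2, uncertainty=1, ease=2)
summary = Candidate("Meeting-note summaries", value=3, activity=5,
                    learnability=5, uncertainty=5, ease=4)

for candidate in rank([audit, summary]):
    flag = "  (brittle: high on Value only)" if candidate.brittle else ""
    print(f"{candidate.name}: {candidate.total}/25{flag}")
```

The total is only a starting point. The ranking puts flagged candidates last rather than removing them, because the decision about a "manageable downside" still belongs to a person, not to the arithmetic.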

The tool

To make that comparison properly, we have built the AI Use-Case VALUE Scorecard: a worksheet that scores up to five candidate use cases across all five criteria, with guidance on what each score means, warning thresholds that flag a brittle choice, and a short recommendation section to capture the decision.

Download the AI Use-Case VALUE Scorecard (PDF)

Choosing well is also the heart of the AI Reality Check Sprint: sorting the candidate use cases, scoring them honestly, and leaving the team with a defensible order to work through. It builds naturally on knowing what to hand to AI in the first place, and on the durable AI literacy that lets a leader judge these calls without waiting for a demo.

Sources and further reading

Dell'Acqua et al., Navigating the Jagged Technological Frontier, Harvard Business School / BCG working paper, 2023. Independent field experiment. Source for the within-frontier gains and the 19-percentage-point drop on an out-of-frontier task.
Brynjolfsson, Li and Raymond, Generative AI at Work, NBER Working Paper 31161, 2023. Independent. Source for the 14% average and 34% novice productivity gains in customer support.
UK Government, AI Opportunities Action Plan, January 2025. Source for the "Scan, Pilot, Scale" sequencing of adoption.
NIST AI Risk Management Framework. Independent. Its "Map" function is the discipline of understanding a use case in its context before building.
