How to stop AI mistakes reaching your clients

Two respected firms let AI-invented facts reach a court and a client. The verification step that should have caught them failed, was not followed, or was never clearly attached to the AI-assisted work.

Good Transformer, 27 June 2026, 8 min read

In April 2026 Sullivan & Cromwell, one of the most respected law firms in the world, apologised to a US bankruptcy judge after a motion it filed cited cases that did not exist. The fabricated citations had come from AI, and they were spotted not by the firm but by the lawyers on the other side. Two things are worth separating here. Models invent facts, confidently and in the right format; that much is known. What mattered was the verification step that should have caught the error before anything left the building. As more of your firm's work passes through AI, the most useful move a leader can make is to find where AI output now reaches a client, and put a deliberate human read back in front of it.

This is general guidance on process, not legal or compliance advice. Where a duty in your field genuinely bites, take proper advice on it.

Two firms, the same failure

The Sullivan & Cromwell filing is not an isolated lapse by a fringe operator. It is the opposite: an elite firm caught out by exactly the failure that is now within reach of every firm using AI for real work.

The pattern repeats outside the law. In 2025 Deloitte's Australian arm delivered a report to a government department that contained fabricated academic references and a made-up quote from a federal court judgment, and issued a partial refund once the errors came to light. As with the law firm, the mistakes were found by an outside reader, an academic who recognised that a book the report cited did not exist. The revised version added a line disclosing that a generative AI system had been used.

In both cases, invented facts left the organisation because the verification step that should have caught them either failed, was not followed, or was never clearly attached to the AI-assisted work. That a model will sometimes do something unreliable is a given. That the check did not catch it is the part worth a leader's attention.

Why the check disappears

Three things make this failure easy to walk into, and they are worth naming because each calls for a different response.

The first is that AI output looks finished. A fabricated citation is formatted exactly like a real one. The rough edges that used to signal a rushed or junior draft, the gaps, the hedging, the bits clearly left to fill in, are gone. Fluent and correct look identical on the page, so the usual instinct to give a ragged draft a second read never fires.

The second is that you cannot tell from the quality of the writing whether the facts underneath it hold. Ethan Mollick's phrase for this is the jagged frontier: AI is brilliant at some tasks and confidently wrong on others, with no clean line between the two that a user can see. A model that drafts a flawless paragraph of analysis may invent the case it cites two sentences later, and nothing in the tone warns you which is which.

The third is the most ordinary, and the most preventable. The old check was attached to slow human work, and it was removed along with the slowness. When a trainee spent a day finding the relevant cases, a partner read them before the document went out, because the trainee might have missed something. When AI produces the same list in seconds, the reading step quietly drops out with the labour it used to accompany. The work feels done, so the check feels redundant. Awareness of the risk is high: McKinsey's 2026 survey found inaccuracy was the most frequently cited AI risk, named by 74% of respondents responsible for AI governance, risk or investment decisions. Far fewer have put anything in place to catch it.

Put the human check back in

The fix is not a new tool or a policy document. It is a named step, run by a person, before AI-assisted work reaches anyone outside the firm. Call it the last read. Four things make it work.

Name the places it matters. Walk through where AI output actually reaches a client or a third party: the citations in a filing, the figures in a first-pass diligence note, the named facts in a candidate summary, the numbers in a client email. These are the points where an invented detail does damage. Most firms have never listed them, which is why the check has nowhere to attach.

Verify the checkable things against their source, not against the model. Every citation, figure, name, date and quote in client-facing work should trace to a real document a person has actually looked at. The question is never "does this read as right?" but "where did this come from, and have we seen it?" A made-up case fails that test instantly; a fluent paragraph cannot talk you out of it.

Keep one named person answerable for the sign-off. The work can be drafted by AI and prepared by anyone, but a specific human puts their name to it before it goes. Responsibility does not pass to a model, and a firm that is clear about who owns the final read is far less likely to let it slide.

Make it proportional. A client-facing filing or a board report earns a full check. An internal first draft does not. The point is not to re-read everything AI touches, which would hand back all the time it saved. The point is to be deliberate about the small number of outputs where an error is expensive, and relaxed everywhere else.

This is not a case against using AI

None of this argues for keeping AI away from skilled work. Used on the right tasks AI earns its place easily, and a firm that refuses it on the grounds that it can err will simply be slower than the one that uses it and checks it. The discipline here is narrow and specific: when AI contributes to research, drafting or factual material, decide which output carries real consequences if it is wrong, and make sure a person has verified the checkable parts before they leave the firm. It sits alongside the wider judgement of what a firm should never hand to AI in the first place, and it is cheaper than either of the apologies above.

Building that check into how a team actually works, so it holds under deadline pressure rather than living in a policy nobody reads, is the kind of practical problem our AI Lessons for Leaders sessions are built around. If it would help to work through where the last read belongs in your firm, book a discovery call.

The next step is small. Pick the one or two places this week where AI-drafted work reaches a client, and decide, out loud, who checks what before it goes. These were not fringe operators using AI casually. They were serious firms with reputations to protect, which is exactly why the failures matter. The lesson is to name the read that might otherwise be assumed, skipped or left outside the AI-assisted part of the work.

FAQ

What is an AI hallucination?

It is when an AI model produces something that is fluent, confident and false: a citation to a case that does not exist, a quote nobody said, a statistic with no source. The output looks exactly like correct work, which is what makes it dangerous in client-facing documents. It is a known property of how these models generate text, not a rare glitch, so it has to be designed around rather than hoped away.

Who is responsible when AI gets something wrong in client work?

The firm and the named person who signed off the work, exactly as before AI existed. A model cannot be accountable, so responsibility stays with whoever put their name to the output. This is the reason the verification step matters: the cost of a confident error lands on the firm, not on the tool that produced it.

How do you check AI work without losing the time it saved?

By being selective. Run a full verification only on the outputs where an error is expensive, usually anything that reaches a client, a court, a regulator or a candidate, and check the specific facts that can be checked: citations, figures, names, dates and quotes against a real source. Internal drafts and low-stakes work do not need the same treatment. The aim is a proportional check on the few things that matter, not a re-read of everything.

Does a firm have to tell clients it used AI?

There is no single blanket UK rule that every firm must disclose every use of generative AI in every piece of work. But that does not mean silence is always safe. Professional duties, client terms, confidentiality, data protection and the client's reasonable expectations may all require disclosure or prior agreement. The practical move is to agree a clear, consistent position on when and how the firm tells clients it uses AI, rather than deciding case by case, and to take advice where a specific duty in your field applies.

Sources and further reading

CNN, AI hallucinations in a Sullivan & Cromwell filing, April 2026. Source for the law firm's apology to a US bankruptcy judge over fabricated citations caught by opposing counsel.
Fortune, Deloitte's AI-error report for the Australian government, October 2025. Source for the fabricated references and court quote, the partial refund, and the disclosure that a generative AI system was used.
McKinsey, State of AI trust in 2026, March 2026. Survey of around 500 organisations; source for inaccuracy as the most cited AI risk (74%) and for mitigation lagging behind awareness.
