Fable 5 is free on paid Claude plans until 7 July. Test it properly

Until 7 July, Fable 5 is included in paid Claude plans at no extra cost. We set out a half-day trial on real client work that ends in a written decision.

Good Transformer · 3 July 2026 · 8 min read

Until Tuesday 7 July, the most capable AI model available to the public is included at no extra cost in paid Claude plans. If your firm pays for Claude Pro, Max, Team or premium Enterprise seats, Fable 5 is sitting in the model picker right now, free, for four more days. We think that is worth half a day of a leader's attention, and not for play. Used deliberately, the window answers a question no product demo can: does frontier-grade AI change what your firm can do, or is the tier you already pay for enough?

An idle poke at a chatbot will not answer it. A structured trial will. Here is the one we would run.

Why these four days are worth using

Fable 5 has had the strangest arrival of any AI model to date. It launched on 9 June, was switched off worldwide on 12 June under a US export-control directive, and returned on 1 July with new safeguards once the restriction was lifted. Anthropic's account of the redeployment includes the detail that matters for this piece: until 7 July the model is included in paid Claude plans at no extra cost, drawing on existing usage limits, after which it moves to metered usage credits.

Why bother, when your firm already gets useful work from the tier it pays for? Because the honest evidence about the top end comes with a number attached, and the number rewards attention. The Remote Labor Index, run by the Center for AI Safety with Scale AI, tests AI models on 240 real freelance projects that clients actually commissioned and paid for, across 23 fields, and it scores a model only when the finished deliverable would be accepted by the paying client. Fable 5 now leads all public models at 16.1 per cent, roughly double the previous best score.

Read that number in both directions, because both readings are true. The best AI in the world completes about one real project in six to a standard a client would accept, so anyone selling you wholesale replacement of professional work is ahead of the evidence. At the same time, the score has roughly doubled since the last round of results, so anyone telling you it is all hype is behind the evidence. The window is a cheap chance to find out where those two truths meet in your own firm's work, rather than in someone else's benchmark.

Pick three pieces of real work

A trial is only as good as the work you feed it, and demo tasks teach you nothing. Choose three real pieces of work from your own firm.

A routine deliverable your firm produces every week: the management-accounts commentary, a candidate shortlist summary for a client, a first-pass mark-up of a standard contract. Routine work has a known standard, so you can put the output next to what your team actually produced last time and compare line by line.
A judgement-heavy deliverable you would normally give to a senior person: an advisory letter on an awkward client question, the strategy section of a pitch, a due-diligence summary that has to end in a recommendation. This is where the capability difference between model tiers is claimed to live, so it is where the claim gets tested.
One task you privately believe AI cannot do. This is the most useful of the three. If the model fails it in the way you expected, you have calibrated the ceiling with your own eyes. If it does not fail, you have learned something worth far more than the half-day it cost you.

The usual confidentiality rules apply: strip client identities from anything sensitive before it goes in, exactly as your existing AI policy already requires for any tool handling client material.

Brief it like an outside contractor

Most disappointing AI output is downstream of a lazy brief. If you would not hand a new contractor a single sentence and expect a usable deliverable back, do not hand one to a frontier model either. Give it what you would give a capable outsider on their first day: the background, the source documents, the house style, the standard you expect, and a plain description of what finished looks like. Then let it work. These models will plan, use the material, and run long tasks on their own; the brief is what points all of that at your standard rather than at a generic one.

Have your most experienced people run the trial, for the same reason you would not send a trainee to assess a lateral hire. Expertise is what tells you, quickly and with confidence, whether an output is genuinely good or merely fluent.

Judge it as a paying client would

Steal the Remote Labor Index's scoring rule for your own desk: an output passes only if you would send it to a paying client with your firm's name on it. A rewrite that "just needs a polish" is a fail. A confident paragraph resting on an invented figure is a fail. Hold the line on this, because a generous marker learns nothing.

Then write down where each task fell short, because the failures are the yield of the whole exercise. A model that produces a fluent management-accounts commentary but invents one variance explanation has shown you exactly where the boundary sits: drafting can be delegated, unchecked analysis cannot. A contract mark-up that catches nine issues in ten has shown you it is a first-pass tool behind a mandatory senior review, and that is still worth real money. The pattern of failure, on your own work, is the most commercially useful document this window can produce.

Count your own time honestly too. If checking and correcting an output took longer than producing the work the old way, that task fails the trial no matter how good the prose looked. We published a way of putting numbers on exactly this question in our piece on whether your AI is paying for itself, and the same arithmetic applies at this tier.
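For firms that like to keep the record in a structured form, the two tests above (would you send it to a paying client, and did checking cost less time than doing the work the old way) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the names `TrialResult`, `passes` and `summarise` are our own inventions for the example, and the minutes shown are made-up figures, not measurements.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    """One task from the trial. All field names are illustrative."""
    task: str
    client_ready: bool        # would you send it as-is under the firm's name?
    review_minutes: float     # time spent checking and correcting the output
    baseline_minutes: float   # time the work takes the old way
    shortfall: str = ""       # where it fell short, in one sentence

    def passes(self) -> bool:
        # A task fails if it "just needs a polish", or if checking and
        # correcting took longer than producing the work the old way.
        return self.client_ready and self.review_minutes <= self.baseline_minutes


def summarise(results):
    """Return (names of passed tasks, {failed task: shortfall note})."""
    passed = [r.task for r in results if r.passes()]
    failures = {r.task: r.shortfall for r in results if not r.passes()}
    return passed, failures


# Illustrative figures only.
results = [
    TrialResult("management-accounts commentary", False, 40, 90,
                "invented one variance explanation"),
    TrialResult("candidate shortlist summary", True, 30, 120),
    TrialResult("advisory letter", True, 150, 120,
                "checking took longer than writing it"),
]

passed, failures = summarise(results)
print("Passed:", passed)
for task, note in failures.items():
    print(f"Failed: {task} ({note})")
```

The failure notes matter more than the pass count: they are the raw material for the written decision at the end of the trial.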

Decide before the meter starts

On 7 July the free window closes and further use is metered. Close the trial with a decision in writing, however short. Three outcomes cover most firms.

The upgrade case is proven. The frontier tier did something your current tier demonstrably cannot, on work that matters commercially. Budget for it deliberately, for the specific people and tasks where the difference showed, and nowhere else.
Your current tier is enough. The gap was small on your actual work. That is a happy result: you now hold dated evidence with which to resist upgrade pressure, and a record of what the top end could and could not do in your hands.
The capability is there but your firm is not ready to use it. That is common, and useful to know. The bottleneck is briefing, checking and process, and none of those is solved by a bigger model.

One caution belongs in the file next to the decision. The model you have been testing was unavailable worldwide for almost three weeks because a government directive required it. Whatever the trial shows, do not let any single vendor's frontier model become something your firm cannot deliver without; the full argument, and the fallback plan, are in our note on vendor continuity.

Running this kind of structured learning on a leader's own work, with someone experienced to think against, is precisely what our 1-to-1 AI lessons for leaders are built around.

FAQ

Is Fable 5 actually better than the model a firm already uses?

On the benchmark cited above it clearly is: its Remote Labor Index score is roughly double the previous best. Whether it is better on your work is exactly what the trial exists to establish. The difference tends to show most on long, judgement-heavy tasks with a lot of source material, and least on short everyday drafting, which the standard tiers already handle well.

What does it cost after 7 July?

From 8 July, Fable 5 use on paid Claude plans is metered through usage credits rather than included in the subscription. The practical point is simple: after Tuesday every experiment has a price, which is why the free days deserve deliberate use.

Should the whole team get access during the window?

We would keep the trial small and senior. The free access draws on your existing usage limits, and the goal is a firm-level judgement, which needs experienced eyes more than it needs volume. Share the written findings with the team afterwards; wider access can follow the decision rather than precede it.

What if the window closes before the trial is finished?

A half-day is enough for three tasks if the work is chosen in advance. If you do run out of road, the same trial still works at the metered price, and the cost of a few experiments is small against the cost of a wrong platform decision. The window makes the trial free this week; it does not make it impossible next week.

The next step takes ten minutes: choose the three pieces of work today, book the half-day before Tuesday, and decide which senior person runs it. If you would like help turning what the trial teaches into a working plan for your firm, book a discovery call and we will go through it with you.
