Anthropic launches the highly powerful Claude Fable 5 model

Published on 10 June, 2026

Quick Summary

Claude Fable 5 launched on June 9, 2026 as the first public release from Anthropic's Mythos-class model family, built by layering three safety classifiers on top of the base Mythos 5 model. It scores 80.3% on SWE-Bench Pro and crosses the 90% threshold on Hex's complex analytics benchmark, outperforming both Opus 4.8 and GPT-5.5. The standout capability is sustained multi-day agentic task execution without constant human oversight, demonstrated by Stripe completing a 50-million-line Ruby codebase migration in a single day. Notion is among the first integrations, targeting the use case of converting fragmented meeting notes into structured action plans. Users on Pro, Max, Team, and Enterprise plans have free access through June 22, 2026, after which billing shifts to consumption-based pricing.

Anthropic just dropped what may be its biggest release yet with Claude Fable 5, and it has quickly become the most talked-about model this week. Not just because of its raw power, but because of how Anthropic brought it to the world: this is the first time a Mythos-class model has been made available to general users, after two months under lock and key for safety reasons.

What is Fable 5 and why is it different from previous models?

At its core, Fable 5 is not a model built from scratch. It is a "safety-hardened" version of Mythos 5, the most powerful model Anthropic has ever built. Back in April 2026, Mythos Preview was only accessible to a very small group of organizations including AWS, Apple, Google, Cisco, and JPMorgan Chase through Project Glasswing, because its ability to detect and exploit software vulnerabilities was simply too powerful to release broadly. Anthropic had also launched Claude Opus 4.8 beforehand as a stepping stone in the development roadmap toward this new model generation.

To get Mythos out the door, Anthropic spent two more months building classifiers that run alongside it. These are specialized AI systems that analyze requests before the main model processes them; when a sensitive topic is detected, the system automatically routes the request to Claude Opus 4.8 at no additional charge. Anthropic says this mechanism activates in fewer than 5% of sessions, meaning most general users will notice no difference compared to raw Mythos 5.

How does Fable 5 differ from Mythos 5 on safety?

Despite sharing the same underlying model, Fable 5 and Mythos 5 are two distinct products by design. The difference lies entirely in the safety classifiers layered on top of the base model.

Three classifiers Fable 5 has that Mythos 5 does not

Fable 5 is equipped with three safety classification layers running alongside the main model, covering: Cybersecurity, Biology and Chemistry, and Distillation. When a user submits a request in any of these areas, Fable 5 automatically falls back to Claude Opus 4.8 instead of the main model, and notifies the user accordingly.
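
The fallback behavior described above can be pictured with a small routing sketch. This is only an illustration under loud assumptions: the model labels, the keyword lists, and the `route_request` function are hypothetical, and simple keyword matching stands in for classifiers that the article describes as specialized AI systems. It is not Anthropic's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for illustration only, not real API model IDs.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# Toy keyword lists standing in for the three classifier areas named in the
# article. The real classifiers are AI systems, not keyword filters.
CLASSIFIER_KEYWORDS = {
    "cybersecurity": ("exploit", "malware", "privilege escalation"),
    "biology_chemistry": ("pathogen", "nerve agent", "toxin synthesis"),
    "distillation": ("distill your outputs", "training data for my model"),
}


@dataclass
class RoutingDecision:
    model: str                # model that will answer the request
    flagged_by: List[str]     # classifier areas that matched, if any
    notice: Optional[str]     # user-facing notice when a fallback happens


def route_request(prompt: str) -> RoutingDecision:
    """Pick the primary model unless any classifier area flags the prompt."""
    text = prompt.lower()
    flagged = [
        area
        for area, keywords in CLASSIFIER_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    if flagged:
        notice = f"This request was answered by {FALLBACK_MODEL}."
        return RoutingDecision(FALLBACK_MODEL, flagged, notice)
    return RoutingDecision(PRIMARY_MODEL, [], None)
```

Calling `route_request("Summarize these meeting notes")` stays on the primary model, while a prompt mentioning an exploit is sent to the fallback with a notice attached, mirroring the "falls back and notifies the user" behavior in the text.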

Mythos 5 has none of these filters. It retains the full software exploitation and biological research capabilities that Anthropic considers too dangerous for wide distribution, which is why Mythos 5 remains restricted to a limited group within Project Glasswing, including vetted cybersecurity professionals, critical infrastructure organizations, and approved biology researchers.

How does this affect real-world performance?

The classifier difference leads to meaningfully different benchmark results in specialized tasks. On ExploitBench, a benchmark focused on cybersecurity, Mythos 5 scores 78% while Fable 5 lands near the 40% range of Opus 4.8, because the fallback mechanism triggers as soon as it detects attack-related requests. For scientific research, Mythos 5 can design proteins and generate novel hypotheses at roughly 10 times the speed of previous methods, while those same capabilities are restricted in Fable 5 for safety reasons.

Real-world performance: what do the numbers say?

On SWE-Bench Pro for coding tasks, Fable 5 scores 80.3%, compared to 69.2% for Opus 4.8 and 58.6% for GPT-5.5. But perhaps the more striking number comes from a real deployment: Stripe used Fable 5 to migrate an entire 50-million-line Ruby codebase in a single day, a task that would have taken a full engineering team more than two months to complete manually.

On business analytics, Fable 5 is the first model to cross the 90% threshold on Hex's complex analytics benchmark, outperforming Opus 4.8 by 10 percentage points. IMC, a quantitative trading firm, reported that the model scored near-perfect on their internal evaluation covering fact lookup, causal reasoning, and expected value calculations.

The biggest shift from previous models is the ability to sustain focus across multi-day tasks without needing human oversight at every step. Rather than executing commands one at a time, Fable 5 can take on a large project, self-plan, run tests, and handle errors in a loop, behaving far more like an engineer than a question-answering tool.

Notion integrates Fable 5: from scattered notes to a complete action plan

Notion is one of the first applications to integrate Fable 5, and the reason is straightforward. The tasks Fable 5 handles best, specifically reading multiple fragmented data sources, synthesizing them, and producing a logical structure, are exactly what Notion users need most in their daily work.

Simon Last, co-founder of Notion, described the primary use case as turning messy meeting notes into a task board with assignments and priorities. Instead of users having to re-read entire transcripts, summarize, and manually create tasks, Fable 5 handles the entire chain without needing to be prompted at each step.

A few things to keep in mind before diving in

Fable 5 is powerful, but there are two things worth considering before building it into your workflow. First, the $50 per million output tokens price point is high relative to the current market, making it well-suited for complex engineering or analytical tasks but not necessarily for simpler jobs that Sonnet or Haiku can handle at a fraction of the cost. Second, the safety classifiers work well in the vast majority of cases but can trigger incorrectly in some legitimate research contexts, something Anthropic openly acknowledges and is continuing to refine.

For individual users on Pro or Max plans, the remaining days before June 22 are a reasonable window to evaluate whether Fable 5 actually generates enough value at that price point before committing to pay-per-use billing.

Claude Opus 5 Launches, Closing In on Fable 5

Anthropic has launched Claude Opus 5 at the same price as Opus 4.8 while raising response quality close to Fable 5, a model that costs twice as much. In other words, with near-Fable performance at half the price, most users will likely choose Opus 5 as their default and reserve Fable 5 for the small number of tasks that truly require the highest capability ceiling.

What upgrades does Claude Opus 5 bring?

According to Anthropic's launch announcement, Claude Opus 5 is the most capable Opus model to date and the first Opus release in the Claude 5 generation. Anthropic describes it as proactive and capable of deep reasoning, approaching the highest intelligence of Claude Fable 5 across many domains while using only half the token budget.

The API model ID is claude-opus-5. Like Opus 4.8 and Fable 5, it has a default and maximum context window of one million tokens, a 128,000-token output limit, and thinking enabled by default. It has become the default model on Claude Max and the most powerful model available on Claude Pro. It is also offered through the Claude API, Amazon Bedrock, Google Cloud, Microsoft Foundry, and GitHub Copilot.

Why will many users choose Opus 5 over Fable 5?

The answer is not limited to price. Four factors make Opus 5 likely to become the default choice for daily work while Fable 5 moves into a specialized role for a small number of exceptional cases.

It wins more real-world evaluations than it loses

On Frontier-Bench v0.1, Anthropic's automated coding evaluation, Opus 5 scores 43.3% while Fable 5 reaches only 33.7%, a gap of almost ten points in favor of Opus 5. On CursorBench 3.2 at maximum effort, Opus 5 reaches about 70.1%, less than half a percentage point behind Fable 5 while costing only half as much. Across evaluations where both models have published results, Opus 5 wins more often than it loses, and its victories are generally larger than its defeats.

The fastest way to verify this is to run the same task on both models at comparable effort levels and compare the output quality instead of relying only on published benchmarks.

No mandatory 30-day data retention

Fable 5 and Mythos 5 are Covered Models that require prompts and outputs to be retained for 30 days for safety purposes. They do not support zero data retention (ZDR) on any platform, even when an organization already has a ZDR agreement. Opus 5, by contrast, can still operate under ZDR like Opus 4.8. For teams handling legal, medical, or financial data, this difference alone may remove Fable 5 from consideration without any performance comparison.

Fewer interruptions from safety filters

Anthropic says the cybersecurity classifier intervenes about 85% less often with Opus 5 than with Fable 5. For coding agents that run for hours or overnight, a request being blocked midway because it touches a safety threshold is a real workflow risk, and Opus 5 significantly reduces that frequency.

Adjustable effort makes budgets easier to predict

Opus 5 supports adaptive thinking with effort ranging from low to maximum. Low or medium works for fast responses and high-volume workloads, while high or maximum suits complex coding, deep research, and multi-step workflows. Because teams pay according to the selected effort instead of being locked into a fixed Fable 5 cost level, they can optimize the budget for each task rather than paying the highest rate on every request.

Initial impressions after trying Opus 5

After using Opus 5 for daily writing and coding work, the clearest impression is that it is substantially smarter than Opus 4.8, especially in understanding intent on the first request without repeated explanation. For tasks such as summarizing long documents, writing code with complex branching logic, or preparing a multi-step plan, Opus 5 works smoothly and loses the thread less often than the earlier version.

There is still a gap compared with Fable 5, although it is smaller than expected. On work that demands deep reasoning or autonomous execution across many consecutive steps without intervention, Fable 5 remains slightly more dependable and makes fewer mistakes. For most daily work, however, that difference is difficult to notice without placing both models side by side.

If you are using Opus 4.8, this is a sensible time to upgrade. If you are choosing between Opus 5 and Fable 5 for ordinary work, Opus 5 is almost certainly sufficient without paying the premium.

When is Fable 5 still the right choice?

Fable 5 retains an advantage on the hardest work. On SWE-bench Pro, which uses real GitHub issues and is considered one of the strictest measures of practical coding, Fable 5 scores about 80% while Opus 5 reaches roughly 79%, a small gap that still favors Fable.

Fable 5 is also the only model Anthropic positions in the Mythos class, meaning its overall capability is designed to exceed Opus. This distinction is clearest in specialized fields such as expert medical analysis and autonomous research that continues for days without supervision.

In other words, Opus 5 wins in daily coding and knowledge work, while Fable 5 retains its edge on the hardest problems and fields requiring the highest possible reliability. For most users and small teams, those problems represent a small portion of daily work, making the twofold price difference difficult to justify unless their workload falls directly into that category.

Quick comparison: Opus 5 vs. Fable 5

| Criterion | Claude Opus 5 | Claude Fable 5 |
| --- | --- | --- |
| Input price | $5/million tokens | $10/million tokens |
| Output price | $25/million tokens | $50/million tokens |
| Context | 1 million tokens | 1 million tokens |
| Maximum output | 128,000 tokens | 128,000 tokens |
| Frontier-Bench v0.1 (coding agent) | 43.3% | 33.7% |
| SWE-bench Pro (practical coding) | ~79% | ~80% |
| Data retention | Supports zero data retention | Mandatory 30-day retention, no ZDR |
| Safety-filter intervention | About 85% lower | Higher |
| Best fit | Daily work, coding agents, sensitive data | Difficult research, multi-day autonomous projects, specialized medical analysis |

Can Opus 5 really compete with GPT-5.6?

On paper, the answer is yes, but not across every category. Opus 5 leads GPT-5.6 Sol in reasoning about novel situations, computer use, and most public coding evaluations, while GPT-5.6 Sol remains ahead on some command-line and information-retrieval tests. Neither wins outright, but for the first time a mid-priced Anthropic model stands level with, and in several areas ahead of, OpenAI's flagship model.

The more useful question is not which model is stronger overall but which one fits your work. If daily tasks center on code, long documents, and multi-step execution, Opus 5 is a compelling choice on both price and quality. If you already rely on the OpenAI ecosystem or need a specific GPT-5.6 strength, the switching cost may not be worthwhile. The most reliable answer is still to run the same job on both models, because benchmark tables do not always reflect real experience.
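
To make the twofold price difference concrete, the per-token prices from the quick comparison can be turned into a rough cost estimate. This is a minimal sketch: the dictionary keys are illustrative labels rather than API model IDs, the `request_cost` helper is hypothetical, and the calculation assumes both models consume the same number of tokens, ignoring effort settings and any difference in thinking-token usage.

```python
# Prices in USD per million tokens, taken from the comparison in the article.
# The keys are illustrative labels, not API model IDs.
PRICES_PER_MILLION = {
    "opus-5": {"input": 5.00, "output": 25.00},
    "fable-5": {"input": 10.00, "output": 50.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 200,000 input tokens and 20,000 output tokens.
    for model in PRICES_PER_MILLION:
        print(model, request_cost(model, 200_000, 20_000))
```

For that assumed workload the estimate is $1.50 on Opus 5 and $3.00 on Fable 5, so the ratio stays at two regardless of the input/output mix, because both prices double together.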

Nam•

25 Jul, 2026

GPT-5.6 vs Claude Fable 5: What Is New?

Sol, Terra, and Luna make GPT-5.6 look more like a product family than a single model. The naming also signals what OpenAI is trying to change: users no longer have to choose only between an expensive flagship and a much smaller model. Instead, they get three capability tiers designed for different workloads. The important caveat is that GPT-5.6 is currently in limited preview, and OpenAI says it is not available in ChatGPT during this preview period.

On the other side, Anthropic positions Claude Fable 5 as a frontier model for reasoning, software engineering, scientific research, and long-horizon agentic work. The useful question is therefore not simply which model is smarter. It is which product architecture helps a team complete real work with predictable quality, latency, and cost.

What GPT-5.6 actually is

According to OpenAI's preview announcement, GPT-5.6 consists of Sol, Terra, and Luna. Sol is the flagship and most capable option, Terra is a strong lower-cost model, and Luna is the fastest and most cost-efficient member of the family.

The important change is how OpenAI divides demand into three tiers. A research team might use Sol for a difficult reasoning problem, a product team might run most daily work on Terra, and a high-volume system might use Luna for thousands of short requests. This looks more like an infrastructure strategy than the launch of a single new chatbot.

Availability matters: OpenAI says GPT-5.6 is not available in ChatGPT during the preview. An experience in an API, developer tool, or partner platform should not be treated as the final ChatGPT experience.

Sol is designed for difficult, extended work

Sol is positioned as the strongest GPT-5.6 model for deep reasoning, complex coding, and long multi-step tasks. A software team might ask it to understand a repository, identify the cause of a bug, propose a minimal patch, and write regression tests. Sol's value is not answering a short question quickly. It is maintaining the objective while working through a longer chain of decisions.

OpenAI also highlights stronger cyber capability as reasoning increases. That can be useful for authorized security testing and vulnerability analysis, but it also makes access controls, logging, sandboxing, and human approval more important.

Terra aims for the practical middle

Terra targets the broadest category of work: document analysis, content production, application development, research synthesis, and operational support. If Sol is the specialist called for the hardest problem, Terra is the strong team member expected to work throughout the day without making every request unnecessarily expensive.

A marketing team could use Terra to read market reports, extract insights, build an outline, and draft several content variants. A development team could use it for code review, test generation, and tickets with a clear scope. This tier could become the default if its real-world quality remains consistent.

Luna prioritizes speed and scale

Luna is designed for low latency and lower cost. Classification, conversation summaries, field extraction, drafting, and ticket routing do not always require the strongest model. In these cases, response time and total operating cost matter more than maximum reasoning capability.

Fast does not mean suitable for everything. If a task requires source verification, a long plan, or a code change with a large blast radius, a team should move it to Terra or Sol instead of forcing Luna beyond its intended role.

Claude Fable 5 takes a different route

Anthropic presents Claude Fable 5 as a frontier model for reasoning, software engineering, vision, scientific research, and long-horizon agentic work. Instead of emphasizing three product tiers in one generation, Anthropic's message focuses on the capability of a powerful model working inside the Claude ecosystem.

This difference changes deployment decisions. With GPT-5.6, an engineering team might build a router that sends each request to Sol, Terra, or Luna. With Fable 5, the focus may be on optimizing prompts, tools, context, and reasoning budgets around one primary model. Neither approach is universally better because the answer depends on workload and operational maturity.

A fair comparison: Do not run one prompt and declare a winner. Build a test set covering short tasks, long reasoning, coding, extraction, and recovery from errors. Measure accuracy, latency, the number of human corrections, and the total cost of a completed task.

Coding and agentic work depend on the surrounding tools

Both GPT-5.6 Sol and Claude Fable 5 target complex software work, but the practical experience depends heavily on the system around the model. The ability to read a repository, execute commands, observe results, and correct mistakes can matter as much as a benchmark score. For OpenAI workflows, the Codex page is a useful starting point for understanding how a model participates in coding work.

Fable 5 may be attractive to teams already invested in Claude and long-running agentic workflows. Read our Claude Fable 5 coverage for more context on Anthropic's positioning and the types of work it targets.

What early forum experience tells us

Early discussions on Reddit and developer communities focus on how different Sol, Terra, and Luna feel in real work. Some users describe Sol as the better fit for multi-step tasks, Terra as the practical option for routine work, and Luna as the interesting choice for speed. These observations match OpenAI's positioning, but they do not establish a precise quality gap.

Forum reports are useful because they reveal the questions real users care about. However, they are self-selected evidence. People may use different prompts, access levels, integrations, and preview versions. A result from a developer platform does not guarantee the same result when a model eventually appears in ChatGPT.

Early positives

- The three tiers make it easier to understand which model belongs to which workload.
- Luna creates a clear expectation of low latency for high-volume systems.
- Terra could become a default if it delivers stable quality at a practical cost.
- Sol is expected to be stronger for coding, long reasoning, and tasks with several verification steps.

Open questions

- How large the practical quality gap between Sol and Terra will be on common workloads.
- The total cost after retries, corrections, and human review are included.
- How Luna behaves with long prompts and many constraints.
- Whether performance remains stable as GPT-5.6 expands beyond preview access.

Forum reports are not benchmarks: Community experience should help you choose test cases, not make a production purchasing decision by itself.

Comparing GPT-5.6 and Fable 5 by workload

Writing and document analysis

Terra appears positioned for most document work because it balances capability and cost. Fable 5 may be attractive when documents are long, questions are complex, and the model must maintain an argument across a large context. A useful evaluation should score citation accuracy, structural consistency, and how much editing is required before publication.

Software development and debugging

Sol and Fable 5 are both candidates for difficult coding tasks. A representative test should include reading existing code, identifying the root cause, producing a minimal fix, writing tests, and explaining risk. Asking a model to create an isolated function from scratch does not reflect how well it works in a real repository.

High-volume processing

Luna has the clearest positioning advantage when speed and cost dominate. At thousands of extraction or classification requests per day, a small difference in price and latency can have a large effect. Fable 5 may be unnecessarily expensive for a workload that only needs short, structured outputs.

Research and long reasoning

Sol and Fable 5 should be compared with tasks that have verifiable outcomes rather than open questions that merely sound impressive. Give both models the same research material and ask them to identify assumptions, detect contradictions, propose an experiment, and explain what evidence is missing. The better model is the one that helps users discover errors faster, not the one that writes the longest answer.

Should you choose Sol, Terra, Luna, or Fable 5?

If you want maximum capability inside the OpenAI ecosystem, Sol is the first model to test. If you need a strong model for regular use, Terra has the more practical position. If your workload contains many short and repetitive tasks, Luna could reduce operating cost. Fable 5 remains relevant for teams invested in Claude or focused on long reasoning and agentic work.

Because GPT-5.6 is still in preview, replacing an entire production workload would be premature. Run the models in parallel on real but sanitized data, record failures, and use the same criteria for every candidate.

A test plan you can use now

1. Select 20 tasks that represent real work, including easy and difficult cases.
2. Run each task on Sol, Terra, Luna, and Fable 5 when access allows.
3. Score accuracy, response time, total cost, and required human correction.
4. Track severe failures separately instead of relying only on averages.
5. Choose a model for each workload category rather than forcing one model to do everything.

Is GPT-5.6 worth switching to now?

The most important change in GPT-5.6 may not be Sol's raw capability. It is OpenAI's decision to turn one model generation into three operational tiers. That could help organizations control cost, but only if they can classify workloads and route requests intelligently.

The practical next step is to build a small benchmark from your own data. If Sol wins difficult tasks, Terra is good enough for routine work, and Luna handles high-volume requests reliably, the three-tier architecture has real value. If Fable 5 remains more consistent on long reasoning, a multi-model strategy may still be better than committing to one provider.
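
The test plan above can be tracked with a very small scoring script. The sketch below is one possible way to do it, not a prescribed method: the `TaskResult` fields, the model labels, and the selection rule (highest mean accuracy among models with no severe failures in a category) are all assumptions made for illustration, and the numbers in the usage example are invented placeholders rather than measurements.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    model: str               # label such as "sol" or "fable-5" (not an API ID)
    category: str            # workload category, e.g. "coding"
    accuracy: float          # 0.0 to 1.0, scored by a human or a checker
    latency_s: float         # response time in seconds
    cost_usd: float          # total cost of the completed task
    human_corrections: int   # edits needed before the result was usable
    severe_failure: bool = False


def summarize(results):
    """Average each metric per (category, model) and count severe failures."""
    grouped = defaultdict(list)
    for result in results:
        grouped[(result.category, result.model)].append(result)
    return {
        key: {
            "accuracy": mean(r.accuracy for r in rows),
            "latency_s": mean(r.latency_s for r in rows),
            "cost_usd": mean(r.cost_usd for r in rows),
            "human_corrections": mean(r.human_corrections for r in rows),
            "severe_failures": sum(r.severe_failure for r in rows),
        }
        for key, rows in grouped.items()
    }


def pick_per_category(summary):
    """Assumed rule: best mean accuracy among models without severe failures."""
    best = {}
    for (category, model), stats in summary.items():
        if stats["severe_failures"] > 0:
            continue  # severe failures are tracked separately, not averaged away
        current = best.get(category)
        if current is None or stats["accuracy"] > summary[(category, current)]["accuracy"]:
            best[category] = model
    return best


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    results = [
        TaskResult("sol", "coding", 0.90, 40.0, 0.80, 1),
        TaskResult("fable-5", "coding", 0.95, 60.0, 1.20, 0),
        TaskResult("luna", "extraction", 0.92, 2.0, 0.01, 0),
        TaskResult("sol", "extraction", 0.97, 20.0, 0.50, 0, severe_failure=True),
    ]
    print(pick_per_category(summarize(results)))
```

Keeping severe failures as a hard filter rather than folding them into the average reflects step 4 of the plan; a team could just as reasonably weight cost or latency into the selection rule.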

Liên•

9 Jul, 2026

Comparing Hermes Agent, OpenClaw, and Claude Cowork

Hermes Agent, OpenClaw, and Claude Cowork are all called AI agents because they do more than answer questions. They can break an objective into multiple steps, call tools, read data, and produce a complete result. However, comparing these three products using only a feature table can easily lead to the wrong choice. Hermes Agent is designed as an agent that can learn how you work. OpenClaw is designed as a personal assistant that is always available through messaging channels, while Claude Cowork is intended for users who want to delegate office work in natural language within an environment managed by Anthropic. Therefore, the important question is not which tool is the most powerful, but how much you want to manage yourself and where you want the agent to appear in your daily workflow. Three products with different designs The differences among these three AI agent tools do not lie only in the model that performs the work. They also come from the framework surrounding the model, which manages tools, memory, access permissions, and the execution loop. This concept is explained in detail in our article What is an agent harness?, which helps explain why three products that are all called AI agents can behave so differently. Hermes Agent prioritizes a learning loop and execution environments The notable point about Hermes is that skills are not merely a list of skills that have already been installed. After completing a task, the agent can extract a useful process, save it, and improve it the next time. Our article What is Hermes Agent? explains this self learning mechanism separately. The accumulated value of this mechanism grows over time when users have recurring tasks such as analyzing projects, monitoring information sources, standardizing reports, or operating a chain of internal tools. Hermes also supports several types of sandboxes, including local execution, Docker, SSH, Singularity, and Modal. 
A sandbox is an isolated environment in which the agent executes commands and works with files. This flexibility lets users choose among speed, control, and isolation, but it also requires an understanding of infrastructure, access permissions, and secret management. OpenClaw uses the Gateway as its coordination center In OpenClaw, the Gateway is the control layer between the agent, devices, and communication channels. A message can become a request for the agent to read a calendar, process a file, call a service, or respond in the correct conversation. This approach feels natural for people who want to message an assistant from their phone without needing to remember where the server is running. OpenClaw is most suitable when the agent needs to react as soon as work appears, without requiring the user to open a computer or enter a separate application. Instead of waiting for you to start a work session, it remains available in the messaging channels you already use and begins processing as soon as a message arrives or a configured event is triggered. Claude Cowork provides a managed workspace Cowork reduces the amount of infrastructure that users must manage themselves. In the desktop application, users can grant access to a local folder and ask Claude to read, organize, or create files. With remote sessions, work takes place in an isolated environment on Anthropic servers, which suits long tasks that do not require a personal computer to remain active continuously. In return, the level of customization and control over the execution layer is not as broad as in a self hosted project. Cowork is better suited to people who want quick results within the Claude ecosystem and do not want to maintain a server or design a Gateway themselves. How the memory of the three tools works differently Memory in an agent should not be understood simply as storing every conversation. 
A useful system must know which information is worth retaining, which information matters only in the current session, and when old data should be retrieved. If it stores too little, the agent must ask the same questions repeatedly. If it stores too much, costs will certainly increase and sensitive data can easily be used in the wrong context. Hermes stands out by combining persistent memory with skills that can improve. Memory records preferences and context, while a skill records how to complete a type of task. These two layers make the agent feel as if it increasingly understands the user, but quality still depends on whether the user reviews what has been stored and removes processes that are no longer appropriate. OpenClaw runs across several channels at once, and that is also its most complicated aspect. Remembering conversation content is only one part of the problem. The harder issue is distinguishing who is speaking, which channel they are using, and which scope the work belongs to. A command sent in a company Slack group should not automatically pull in private context previously discussed on Telegram. If session configuration and identity policies should be established clearly from the beginning, even a strong model cannot rescue a system when everything remains ambiguous. Cowork limits context to each work session, reads only the files for which you grant access, and uses only the connections you allow. For people who are not accustomed to building systems, this approach is easier to control because the boundaries of each task are relatively clear. However, clear boundaries do not mean automatic understanding. You still need to explain what you want, what completion should look like, and where the data should come from. Cowork cannot infer your company context unless you actively provide it. Which type of work each tool automates best Hermes includes web tools, terminal access, MCP, scheduled runs, and subagents. 
MCP is a connection standard that helps an agent communicate with external data sources or applications through a consistent interface. By combining MCP with skills, users can turn an experiment into a repeatable process, such as collecting data each morning, analyzing changes, and sending a summary. OpenClaw is strong at workflows that begin with a message or an event. For example, a user can send an invoice to a private channel, after which the agent extracts the information and updates a storage system. Another example is receiving a service alert, gathering additional diagnostic data, and returning a summary directly to the operations group. Its value comes from reducing the gap between the moment a need appears and the moment the agent begins acting. Cowork suits structured office outputs. It can research a topic, synthesize data, create a document, and continue revising it according to feedback. Long running or scheduled tasks help Cowork move beyond short question and answer interactions. Even so, organizations need to inspect each connector and its access permissions before allowing the agent to work with real data stores. When deep integration with private infrastructure is required, Hermes and OpenClaw generally provide more room. When the priority is reducing the time from a request to a finished document, Cowork usually has an advantage. This is the difference between a platform intended for assembly and a product that has already been packaged. How secure are these three AI agents? There is no simple answer to the question of which one is safer because the security risks of each tool come from completely different areas. Hermes Agent: Self hosting does not automatically mean safety. The greatest risk comes from automatically generated skills because, in essence, they are pieces of code that the agent writes and then runs by itself. 
If they are not reviewed before scheduled execution, a skill with terminal access or permission to send data externally can do things without your knowledge. In addition, API keys and sensitive folders should not appear in prompts or be mounted directly into a sandbox when the skill does not actually need them. OpenClaw: The more channels you connect, the wider the attack surface becomes. The point most easily overlooked is sender authentication. If the Gateway trusts only a display name or a channel that has not been properly secured, a compromised messaging account may be enough for someone to issue commands to your agent. The list of people allowed to send commands and the permissions of each bot need to be reviewed whenever you add a new channel. Claude Cowork: The most concerning risk is prompt injection, which occurs when the agent reads a document or webpage containing hidden instructions intended to redirect it away from your original request. Anthropic provides safeguards and asks for confirmation before sensitive actions, but those measures do not replace your own review of the results or the need to avoid granting broader permissions than the task actually requires. Note: With any agent, do not grant permission to delete files, send external messages, or perform sensitive transactions. Start with read only mode, enable complete logging, and retain human approval for actions that require human judgment. Should you choose Hermes Agent, OpenClaw, or Claude Cowork? Every tool has its own strengths and weaknesses, so selecting the most suitable one depends on the user and the work that needs to be done. Choose Hermes Agent when you want the agent to understand how you work increasingly well Hermes suits developers, researchers, and technical teams that want an agent to learn their own processes and run on flexible infrastructure. It is particularly worth considering when tasks recur often enough for skills to create accumulated value. 
You need to be prepared to read logs, review skills, and manage execution environments.

Best suited when:
You want the agent to remember and improve work processes through repeated use.
You can manage sandboxes, select models, and control access permissions yourself.

Choose OpenClaw when work requires continuous communication through messages

OpenClaw is suitable when the assistant needs to be present on Telegram, WhatsApp, Slack, Zalo, or similar channels. It is useful for alerts, rapid collection of requests, and automation that begins with a conversation. In return, you must manage identity, channel permissions, and Gateway stability.

Best suited when:
Requests usually arrive as messages or automated alerts.
You need one coordination point for several different communication channels.

Choose Claude Cowork when you need quick results without building a system

Cowork suits content creators, analysts, and managers who need complete documents, spreadsheets, and slides without wanting to think about servers or Gateways. In return, you should understand the limits of your plan, where data travels, and which connections are enabled before introducing real work.

Best suited when:
You want to describe the required outcome in natural language and receive a complete output.
You prioritize the convenience of a managed service over full control of the infrastructure.
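
Whichever tool you choose, the safeguards from the security note above (read-only by default, complete logging, human approval for sensitive actions) can be sketched in a few lines of Python. This is a minimal illustration and not code from Hermes Agent, OpenClaw, or Claude Cowork: the class `PermissionGate`, the action labels, and the `approve` callback are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action labels; real agents expose their own permission names.
READ_ONLY_ACTIONS = {"read_file", "list_files", "search"}
SENSITIVE_ACTIONS = {"delete_file", "send_external_message", "make_payment"}


@dataclass
class PermissionGate:
    """Allows reads, asks a human before sensitive actions, denies the rest."""

    approve: Callable[[str, str], bool]           # human approval callback
    log: list[str] = field(default_factory=list)  # audit trail of every decision

    def check(self, action: str, target: str) -> bool:
        if action in READ_ONLY_ACTIONS:
            allowed = True
        elif action in SENSITIVE_ACTIONS:
            allowed = self.approve(action, target)
        else:
            allowed = False  # unknown actions are denied by default
        verdict = "allowed" if allowed else "denied"
        self.log.append(f"{action} {target} -> {verdict}")
        return allowed


# Example: a reviewer who rejects every sensitive request.
gate = PermissionGate(approve=lambda action, target: False)
print(gate.check("read_file", "report.md"))    # True
print(gate.check("delete_file", "report.md"))  # False
print(gate.log)
```

In practice the `approve` callback would prompt a person rather than return a fixed value; the point of the sketch is that every decision, including denials, ends up in the log.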

Nam • 14 Jul, 2026

Claude Code self-orchestrates work with Dynamic Workflows

A post by Thariq Shihipar of the Claude Code team at Anthropic has drawn significant attention in the AI user community. He revealed Dynamic Workflows, a feature that allows Claude to design its own workflows instead of just waiting for commands, and it is considered the most important upgrade since Claude Code gained skills and subagents. The feature is built on the concept of a harness.

Three fatal errors that cause AI agents to fail at complex tasks

Before discussing the solution, Thariq points out an uncomfortable reality: most AI agents today face serious problems when handling complex, multi-step tasks within a single context window. He groups these problems into three core failure modes that nearly every agent system encounters.

Agentic laziness: when AI declares done after finishing only half the work

Agentic Laziness is the phenomenon where an agent completes part of the work and then reports itself as finished. A specific example: you ask an agent to review 50 code files, but it only looks through 20 files and concludes that everything is fine. The cause lies in context window limitations: when the amount of information is too large, the agent tends to take shortcuts to finish faster.

Will an agent be biased toward itself?

An agent being biased toward itself is called Self-Preferential Bias, and it occurs when you ask an agent to review its own results. Like a student asked to grade their own exam, the agent tends to favor the results it already produced, which leads to uncritical validation and overlooked errors. This is particularly dangerous in tasks requiring high accuracy.

How to prevent an agent from losing its original intent step by step

Goal Drift is the phenomenon where an agent gradually forgets its original goal after many processing steps or after context compaction.
Specific constraints like "don't do X" or important edge cases can be dropped when memory is summarized, so the final result deviates from the original requirement without the agent ever realizing it.

Dynamic Workflows helps Claude write its own work orchestration framework

Anthropic's solution is not to make the model smarter, but to change how Claude organizes work. Dynamic Workflows transforms Claude from a code-writing agent into an agent that designs operational workflows for complex tasks. The core concept here is self-organization: Claude can analyze goals on its own, choose the appropriate working mode, and create an internal workflow before starting execution.

Custom harness instead of a fixed workflow

Instead of operating within a fixed environment, Claude writes a harness framework in JavaScript designed specifically for each task. This harness acts like a project manager: it breaks down the work, initializes specialized sub-agents for each part, assigns appropriate tools, routes work to different models, and performs adversarial verification to ensure quality.

How does a harness work?

To understand more clearly, imagine the harness as a theatrical script that Claude writes for itself before performing. When given a complex task, Claude does not dive straight in but pauses to write a JavaScript snippet describing the entire workflow: how many sub-agents are needed, what each agent does, what order things happen in, and how results from one agent are passed to the next.
A concrete example: if you ask Claude to audit 1,000 Slack messages to find recurring incidents, the harness might logically look like this:
Agent 1 (classification): reads all messages and assigns labels by topic
Agents 2, 3, 4 (parallel processing): each agent deeply analyzes one topic group
Agent 5 (synthesis): collects results from the three agents above and removes duplicates
Agent 6 (cross-check): re-reads the synthesized results and provides independent critique

The important point is that Claude writes this harness based on the specific characteristics of each task, not according to a rigid template. Different tasks produce different harnesses, and that is exactly why this feature is called "dynamic." The harness is written in JavaScript and runs within the Claude Code environment.

You can activate Dynamic Workflows by saying "use a workflow." However, this phrase is easily confused with regular workflows, so it is recommended to use the keyword "ultracode" in your prompt to clearly distinguish a Dynamic Workflow from a regular workflow and to save tokens.

Context isolation to prevent context degradation

One of the smartest design choices in Dynamic Workflows is the Isolation feature. Each sub-agent is given its own separate context window, completely independent from other agents. This prevents context rot, meaning the quality degradation that occurs when a context window becomes overloaded, while also mitigating both Agentic Laziness and Goal Drift, since each agent focuses only on its assigned piece of work.
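
To make the Slack audit layout and the isolation idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the JavaScript harness that Claude actually generates: the sample messages are made up, and the functions `classify`, `analyze`, `synthesize`, and `cross_check` are hypothetical stand-ins that use plain rules where a real harness would launch sub-agents.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data standing in for exported Slack messages.
MESSAGES = [
    "deploy failed on api-gateway",
    "login timeout reported by support",
    "deploy failed on api-gateway",
    "billing webhook retrying",
    "login timeout again this morning",
]


def classify(message: str) -> str:
    """Agent 1 stand-in: label a message by topic (keyword rules, not a model)."""
    for topic in ("deploy", "login", "billing"):
        if topic in message:
            return topic
    return "other"


def analyze(topic: str, messages: list[str]) -> dict:
    """Agents 2-4 stand-in: each call sees only one topic group (isolated context)."""
    return {"topic": topic, "count": len(messages), "examples": sorted(set(messages))}


def synthesize(reports: list[dict]) -> list[dict]:
    """Agent 5 stand-in: merge the reports, keeping topics seen more than once."""
    return sorted((r for r in reports if r["count"] > 1), key=lambda r: r["topic"])


def cross_check(findings: list[dict], messages: list[str]) -> bool:
    """Agent 6 stand-in: recount independently instead of trusting the reports."""
    return all(
        sum(classify(m) == f["topic"] for m in messages) == f["count"]
        for f in findings
    )


def run_harness(messages: list[str]) -> list[dict]:
    groups: dict[str, list[str]] = defaultdict(list)
    for message in messages:                   # step 1: classification
        groups[classify(message)].append(message)
    with ThreadPoolExecutor() as pool:         # step 2: parallel fan-out
        reports = list(pool.map(lambda kv: analyze(*kv), groups.items()))
    findings = synthesize(reports)             # step 3: synthesis
    if not cross_check(findings, messages):    # step 4: independent cross-check
        raise ValueError("cross-check disagreed with the synthesized findings")
    return findings


if __name__ == "__main__":
    for finding in run_harness(MESSAGES):
        print(finding["topic"], finding["count"])
```

Note that `analyze` never receives the full message list, only its own topic group, which mirrors the way each sub-agent works inside a separate context window.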
Six reusable orchestration patterns

Claude can combine six available orchestration patterns to handle a wide variety of situations:
Classify and act: classifies input then selects the appropriate action
Fan out and synthesize: splits work into multiple parallel branches then synthesizes the results
Cross-check verification: uses a separate agent to cross-check results
Generate and filter: generates multiple options then filters for the best one
Tournament: puts options into direct head-to-head elimination rounds
Loop until done: repeats until a quality threshold is reached

Can you optimize costs when using Dynamic Workflows?

Running multiple sub-agents in parallel might sound expensive, but Dynamic Workflows is actually designed to optimize costs in several specific ways.

Smart routing to the right model

Not every step in a workflow needs the most powerful model. The harness allows Claude to route each task to a model that matches its complexity: simple classification steps can run on smaller, cheaper models, while only steps requiring deep reasoning need a large model. The result is that total costs are often lower than running the entire workflow on a single model.

Context isolation helps reduce token consumption

Because each sub-agent only receives the portion of context it actually needs for its work, total token consumption across the entire workflow is often significantly lower than with the traditional approach, where the full conversation history gets stuffed into a single context window that keeps growing larger.

Avoiding rework through early checkpoints

The harness can install quality checkpoints between steps. If a step produces a result that does not meet requirements, the system stops and reprocesses just that step rather than running the entire workflow to completion before discovering an error at the end. This approach saves significant costs for long multi-step tasks.
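
The routing and checkpoint ideas above can be combined in a short Python sketch. Everything in it is an assumption made for illustration: the tier names `small` and `large`, the relative costs in `MODEL_COST`, and the `Step` structure are hypothetical and do not reflect real pricing or the actual harness API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model tiers with relative cost per call; not real pricing.
MODEL_COST = {"small": 1, "large": 10}


@dataclass
class Step:
    name: str
    complexity: str                        # "simple" or "deep"
    run: Callable[[object, int], object]   # (input, attempt number) -> output
    check: Callable[[object], bool]        # checkpoint: is the output acceptable?


def route(step: Step) -> str:
    """Send simple steps to the cheaper tier and deep reasoning to the larger one."""
    return "small" if step.complexity == "simple" else "large"


def run_workflow(steps: list[Step], data: object, max_attempts: int = 3):
    total_cost = 0
    for step in steps:
        model = route(step)
        for attempt in range(1, max_attempts + 1):
            total_cost += MODEL_COST[model]
            output = step.run(data, attempt)
            if step.check(output):         # checkpoint passed, move to next step
                data = output
                break
        else:
            # Every attempt failed: stop here instead of running later steps.
            raise RuntimeError(f"step {step.name!r} failed its checkpoint")
    return data, total_cost


# Example: the second step fails its checkpoint once, so only that step is re-run.
steps = [
    Step("classify", "simple", lambda x, n: x.lower(), lambda out: out.islower()),
    Step("analyze", "deep", lambda x, n: x.split() if n > 1 else [], lambda out: len(out) > 0),
]
result, cost = run_workflow(steps, "Deploy FAILED Twice")
print(result, cost)  # ['deploy', 'failed', 'twice'] 21
```

The cost of 21 comes from one call on the small tier plus two calls on the large tier; the first step is never repeated when the second one fails its checkpoint.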
If you are concerned about costs, start with moderate-volume tasks to observe actual token consumption before scaling up.

What are the real-world applications of Dynamic Workflows?

What excites Thariq most is not the coding capability, but the way Dynamic Workflows extends Claude Code into non-technical tasks. The feature can be activated with natural language (for example: "use a workflow") or the keyword "ultracode." Real-world applications include:
Auditing thousands of Slack messages to find recurring incidents
Systematically ranking and screening large candidate pools
Running automated live elimination tournaments to choose the best name for a CLI tool
Handling high-precision operational tasks that previously only humans could perform

The design philosophy is architectural constraints rather than raw intelligence

The most notable aspect of Anthropic's approach is the design philosophy: rather than trying to increase the raw intelligence of the model, they build architectural constraints into the workflow. In other words, instead of hoping the model will naturally know how to avoid mistakes, they design the system so that errors are hard to make in the first place, and the harness is the tool that enforces that philosophy.

Dynamic Workflows shows that the next step forward for AI agents does not lie in smarter models but in the ability to design workflows on their own. Just as a good manager divides work among a team rather than doing everything alone, Claude can now organize its own team of sub-agents. This is a clear signal that the future of AI coding is no longer just about writing code faster but about organizing work better.

Nam • 5 Jun, 2026

