Claude Opus 4.8 launches: what is new in Anthropic's strongest model?

Published on 29 May, 2026

Quick Summary

Anthropic introduces Claude Opus 4.8 with a 1 million token context window, fast mode preview, mid-conversation system messages, and upgrades for long-running coding agents.

Anthropic has introduced Claude Opus 4.8, a release the company describes as its strongest generally available model. The update is not only about stronger reasoning for complex work; it also adds practical changes for developers building AI agent, coding assistants, and long-running automation workflows.

The important point is that Claude Opus 4.8 is not just a renamed Opus 4.7. Anthropic is focusing on three practical areas: more stable long-context handling, more reliable tool use, and better cost control in agent loops. With the model ID claude-opus-4-8, it is already available for Claude API and supported cloud platforms.

What is Claude Opus 4.8?

Claude Opus 4.8 is targets multi-step reasoning, long-running agentic coding, and work that requires a higher level of autonomy. According to Anthropic's documentation, the model supports a default 1 million token context window on Claude API, Amazon Bedrock, and Google Vertex AI, while Microsoft Foundry supports 200,000 tokens.

The model also supports up to 128,000 output tokens, adaptive thinking, and the same core tool capabilities as Claude Opus 4.7. This means teams already using Opus 4.7 can likely test the upgrade with limited changes, but they should still review behavior shifts and API constraints before moving production traffic.

Key new features

Claude Opus 4.8 introduces several updates that directly affect prompt design, long conversation management, and API cost optimization. These are especially relevant if you run deep chatbots, coding assistants, or multi-step agents.

System messages during a conversation

One major change is support for adding a message with role: "system" after a user turn in the messages array, as long as Anthropic's placement rules are followed. This lets developers update instructions during a long conversation without resending the entire original system prompt.

In practice, this is useful for agents that run through many steps. Instead of breaking prompt cache efficiency by repeating a large instruction block, an application can add new instructions at the right moment, preserve cache for prior conversation context, and reduce input cost across long workflows.

Fast mode for Claude API

Anthropic is also bringing fast mode to Claude Opus 4.8 as a research preview on Claude API. By setting speed: "fast", users can receive higher output token throughput, with Anthropic describing speedups of up to 2.5 times under supported conditions.

Fast mode is especially useful for products that need lower latency while staying on the same powerful Opus model. However, the documentation also notes that this mode carries premium pricing, so engineering teams should reserve it for high-value paths or workflows where response speed clearly matters.

Prompt caching becomes easier to use

With Claude Opus 4.8, the minimum prompt size for caching drops to 1,024 tokens. This small change has a practical impact: many prompts that were previously too short to create a cache entry on Opus 4.7 can now be cached without code changes.

For products with stable system prompts, long internal documentation, or repeated API calls, prompt caching can reduce cost significantly. Combined with mid-conversation system messages, Claude Opus 4.8 is better suited for agents that need to preserve state across many steps.

Documented refusal stop details

Anthropic has also documented the stop_details object for refusal responses. When the model cannot complete a request, the application can receive not only a refusal stop reason but also more structured information about why the refusal happened.

This helps products handle the user experience more gracefully. Instead of showing a generic error, an application can distinguish different refusal categories and guide users toward a more appropriate next step.

API constraints to watch

Although Anthropic says these constraints carry over from Claude Opus 4.7 and are not breaking changes for code that already works with the previous model, developers should still check them carefully. On the Messages API, Claude Opus 4.8 does not support non-default values for temperature, top_p, or top_k. Passing these sampling parameters will return a 400 error.

Another point is that adaptive thinking is the only supported thinking mode. Older configuration patterns that set a fixed thinking token budget are no longer the right approach for Opus 4.8. Anthropic recommends using thinking: {"type": "adaptive"} and controlling reasoning depth through the effort parameter.

On Claude Opus 4.8, the default effort is high across all surfaces, including Claude API and Claude Code. If an application already sets effort explicitly, the current configuration remains in place; if not, the default behavior may differ from prior expectations and should be tested.

Why it matters for coding agents and long workflows

Anthropic says Claude Opus 4.8 targets improvements in long-running coding agents, including better long-context handling, less frequent compaction, and stronger recovery after compaction. These are hard problems for large models: after many rounds of reading files, editing code, running tests, and summarizing state, agents can lose focus or miss important details.

The new model is also optimized to trigger tools at the right time more reliably. For systems that need to call search, databases, terminals, browsers, or internal APIs, fewer missed tool calls can make a large difference in reliability. This matters more than a single benchmark score because real agent quality depends heavily on knowing when to use the right tool.

Should you upgrade to Claude Opus 4.8?

If you already use Claude Opus 4.7 for complex reasoning, programming, or autonomous agents, Opus 4.8 is worth testing early. Changes such as the 1 million token context window, lower prompt caching threshold, and mid-conversation system messages all target real production problems, not only short prompt quality.

Still, engineering teams should not upgrade blindly. Review sampling parameters, thinking configuration, default effort expectations, and cost implications if you plan to use fast mode. For products handling sensitive data or critical workflows, run an A/B test on representative tasks before moving all traffic to Claude Opus 4.8.

Conclusion

Claude Opus 4.8 shows that Anthropic is putting more weight behind the agent and developer market. The improvements are not only about reasoning quality; they also cover operational details such as caching, mid-conversation system messages, output speed, and refusal classification. For teams building serious AI products, this is a release worth watching because it addresses real deployment issues in long-term AI applications.

Discussion (0)

No comments yet. Be the first!

Comparing Hermes Agent, OpenClaw, and Claude Cowork

Hermes Agent, OpenClaw, and Claude Cowork are all called AI agents because they do more than answer questions. They can break an objective into multiple steps, call tools, read data, and produce a complete result. However, comparing these three products using only a feature table can easily lead to the wrong choice. Hermes Agent is designed as an agent that can learn how you work. OpenClaw is designed as a personal assistant that is always available through messaging channels, while Claude Cowork is intended for users who want to delegate office work in natural language within an environment managed by Anthropic. Therefore, the important question is not which tool is the most powerful, but how much you want to manage yourself and where you want the agent to appear in your daily workflow. Three products with different designs The differences among these three AI agent tools do not lie only in the model that performs the work. They also come from the framework surrounding the model, which manages tools, memory, access permissions, and the execution loop. This concept is explained in detail in our article What is an agent harness?, which helps explain why three products that are all called AI agents can behave so differently. Hermes Agent prioritizes a learning loop and execution environments The notable point about Hermes is that skills are not merely a list of skills that have already been installed. After completing a task, the agent can extract a useful process, save it, and improve it the next time. Our article What is Hermes Agent? explains this self learning mechanism separately. The accumulated value of this mechanism grows over time when users have recurring tasks such as analyzing projects, monitoring information sources, standardizing reports, or operating a chain of internal tools. Hermes also supports several types of sandboxes, including local execution, Docker, SSH, Singularity, and Modal. A sandbox is an isolated environment in which the agent executes commands and works with files. This flexibility lets users choose among speed, control, and isolation, but it also requires an understanding of infrastructure, access permissions, and secret management. OpenClaw uses the Gateway as its coordination center In OpenClaw, the Gateway is the control layer between the agent, devices, and communication channels. A message can become a request for the agent to read a calendar, process a file, call a service, or respond in the correct conversation. This approach feels natural for people who want to message an assistant from their phone without needing to remember where the server is running. OpenClaw is most suitable when the agent needs to react as soon as work appears, without requiring the user to open a computer or enter a separate application. Instead of waiting for you to start a work session, it remains available in the messaging channels you already use and begins processing as soon as a message arrives or a configured event is triggered. Claude Cowork provides a managed workspace Cowork reduces the amount of infrastructure that users must manage themselves. In the desktop application, users can grant access to a local folder and ask Claude to read, organize, or create files. With remote sessions, work takes place in an isolated environment on Anthropic servers, which suits long tasks that do not require a personal computer to remain active continuously. In return, the level of customization and control over the execution layer is not as broad as in a self hosted project. Cowork is better suited to people who want quick results within the Claude ecosystem and do not want to maintain a server or design a Gateway themselves. How the memory of the three tools works differently Memory in an agent should not be understood simply as storing every conversation. A useful system must know which information is worth retaining, which information matters only in the current session, and when old data should be retrieved. If it stores too little, the agent must ask the same questions repeatedly. If it stores too much, costs will certainly increase and sensitive data can easily be used in the wrong context. Hermes stands out by combining persistent memory with skills that can improve. Memory records preferences and context, while a skill records how to complete a type of task. These two layers make the agent feel as if it increasingly understands the user, but quality still depends on whether the user reviews what has been stored and removes processes that are no longer appropriate. OpenClaw runs across several channels at once, and that is also its most complicated aspect. Remembering conversation content is only one part of the problem. The harder issue is distinguishing who is speaking, which channel they are using, and which scope the work belongs to. A command sent in a company Slack group should not automatically pull in private context previously discussed on Telegram. If session configuration and identity policies should be established clearly from the beginning, even a strong model cannot rescue a system when everything remains ambiguous. Cowork limits context to each work session, reads only the files for which you grant access, and uses only the connections you allow. For people who are not accustomed to building systems, this approach is easier to control because the boundaries of each task are relatively clear. However, clear boundaries do not mean automatic understanding. You still need to explain what you want, what completion should look like, and where the data should come from. Cowork cannot infer your company context unless you actively provide it. Which type of work each tool automates best Hermes includes web tools, terminal access, MCP, scheduled runs, and subagents. MCP is a connection standard that helps an agent communicate with external data sources or applications through a consistent interface. By combining MCP with skills, users can turn an experiment into a repeatable process, such as collecting data each morning, analyzing changes, and sending a summary. OpenClaw is strong at workflows that begin with a message or an event. For example, a user can send an invoice to a private channel, after which the agent extracts the information and updates a storage system. Another example is receiving a service alert, gathering additional diagnostic data, and returning a summary directly to the operations group. Its value comes from reducing the gap between the moment a need appears and the moment the agent begins acting. Cowork suits structured office outputs. It can research a topic, synthesize data, create a document, and continue revising it according to feedback. Long running or scheduled tasks help Cowork move beyond short question and answer interactions. Even so, organizations need to inspect each connector and its access permissions before allowing the agent to work with real data stores. When deep integration with private infrastructure is required, Hermes and OpenClaw generally provide more room. When the priority is reducing the time from a request to a finished document, Cowork usually has an advantage. This is the difference between a platform intended for assembly and a product that has already been packaged. How secure are these three AI agents? There is no simple answer to the question of which one is safer because the security risks of each tool come from completely different areas. Hermes Agent: Self hosting does not automatically mean safety. The greatest risk comes from automatically generated skills because, in essence, they are pieces of code that the agent writes and then runs by itself. If they are not reviewed before scheduled execution, a skill with terminal access or permission to send data externally can do things without your knowledge. In addition, API keys and sensitive folders should not appear in prompts or be mounted directly into a sandbox when the skill does not actually need them. OpenClaw: The more channels you connect, the wider the attack surface becomes. The point most easily overlooked is sender authentication. If the Gateway trusts only a display name or a channel that has not been properly secured, a compromised messaging account may be enough for someone to issue commands to your agent. The list of people allowed to send commands and the permissions of each bot need to be reviewed whenever you add a new channel. Claude Cowork: The most concerning risk is prompt injection, which occurs when the agent reads a document or webpage containing hidden instructions intended to redirect it away from your original request. Anthropic provides safeguards and asks for confirmation before sensitive actions, but those measures do not replace your own review of the results or the need to avoid granting broader permissions than the task actually requires. Note: With any agent, do not grant permission to delete files, send external messages, or perform sensitive transactions. Start with read only mode, enable complete logging, and retain human approval for actions that require human judgment. Should you choose Hermes Agent, OpenClaw, or Claude Cowork? Every tool has its own strengths and weaknesses, so selecting the most suitable one depends on the user and the work that needs to be done. Choose Hermes Agent when you want the agent to understand how you work increasingly well Hermes suits developers, researchers, and technical teams that want an agent to learn their own processes and run on flexible infrastructure. It is particularly worth considering when tasks recur often enough for skills to create accumulated value. You need to be prepared to read logs, review skills, and manage execution environments. Best suited when: You want the agent to remember and improve work processes through repeated use. You can manage sandboxes, select models, and control access permissions yourself. Choose OpenClaw when work requires continuous communication through messages OpenClaw is suitable when the assistant needs to be present on Telegram, WhatsApp, Slack, Zalo, or similar channels. It is useful for alerts, rapid collection of requests, and automation that begins with a conversation. In return, you must manage identity, channel permissions, and Gateway stability. Best suited when: Requests usually arrive as messages or automated alerts. You need one coordination point for several different communication channels. Choose Claude Cowork when you need quick results without building a system Cowork suits content creators, analysts, and managers who need complete documents, spreadsheets, and slides without wanting to think about servers or Gateways. In return, you should understand the limits of your plan, where data travels, and which connections are enabled before introducing real work. Best suited when: You want to describe the required outcome in natural language and receive a complete output. You prioritize the convenience of a managed service over full control of the infrastructure.

Nam•

14 Jul, 2026

GPT-5.6 vs Claude Fable 5: What Is New?

Sol, Terra, and Luna make GPT-5.6 look more like a product family than a single model. The naming also signals what OpenAI is trying to change: users no longer have to choose only between an expensive flagship and a much smaller model. Instead, they get three capability tiers designed for different workloads. The important caveat is that GPT-5.6 is currently in limited preview, and OpenAI says it is not available in ChatGPT during this preview period.On the other side, Anthropic positions Claude Fable 5 as a frontier model for reasoning, software engineering, scientific research, and long horizon agentic work. The useful question is therefore not simply which model is smarter. It is which product architecture helps a team complete real work with predictable quality, latency, and cost.What GPT-5.6 actually isAccording to OpenAI's preview announcement, GPT-5.6 consists of Sol, Terra, and Luna. Sol is the flagship and most capable option, Terra is a strong lower cost model, and Luna is the fastest and most cost efficient member of the family.The important change is how OpenAI divides demand into three tiers. A research team might use Sol for a difficult reasoning problem, a product team might run most daily work on Terra, and a high volume system might use Luna for thousands of short requests. This looks more like an infrastructure strategy than the launch of a single new chatbot.Availability matters: OpenAI says GPT-5.6 is not available in ChatGPT during the preview. An experience in an API, developer tool, or partner platform should not be treated as the final ChatGPT experience.Sol is designed for difficult, extended workSol is positioned as the strongest GPT-5.6 model for deep reasoning, complex coding, and long multi step tasks. A software team might ask it to understand a repository, identify the cause of a bug, propose a minimal patch, and write regression tests. Sol's value is not answering a short question quickly. It is maintaining the objective while working through a longer chain of decisions.OpenAI also highlights stronger cyber capability as reasoning increases. That can be useful for authorized security testing and vulnerability analysis, but it also makes access controls, logging, sandboxing, and human approval more important.Terra aims for the practical middleTerra targets the broadest category of work: document analysis, content production, application development, research synthesis, and operational support. If Sol is the specialist called for the hardest problem, Terra is the strong team member expected to work throughout the day without making every request unnecessarily expensive.A marketing team could use Terra to read market reports, extract insights, build an outline, and draft several content variants. A development team could use it for code review, test generation, and tickets with a clear scope. This tier could become the default if its real world quality remains consistent.Luna prioritizes speed and scaleLuna is designed for low latency and lower cost. Classification, conversation summaries, field extraction, drafting, and ticket routing do not always require the strongest model. In these cases, response time and total operating cost matter more than maximum reasoning capability.Fast does not mean suitable for everything. If a task requires source verification, a long plan, or a code change with a large blast radius, a team should move it to Terra or Sol instead of forcing Luna beyond its intended role.Claude Fable 5 takes a different routeAnthropic presents Claude Fable 5 as a frontier model for reasoning, software engineering, vision, scientific research, and long horizon agentic work. Instead of emphasizing three product tiers in one generation, Anthropic's message focuses on the capability of a powerful model working inside the Claude ecosystem.This difference changes deployment decisions. With GPT-5.6, an engineering team might build a router that sends each request to Sol, Terra, or Luna. With Fable 5, the focus may be on optimizing prompts, tools, context, and reasoning budgets around one primary model. Neither approach is universally better because the answer depends on workload and operational maturity.A fair comparison: Do not run one prompt and declare a winner. Build a test set covering short tasks, long reasoning, coding, extraction, and recovery from errors. Measure accuracy, latency, the number of human corrections, and the total cost of a completed task.Coding and agentic work depend on the surrounding toolsBoth GPT-5.6 Sol and Claude Fable 5 target complex software work, but the practical experience depends heavily on the system around the model. The ability to read a repository, execute commands, observe results, and correct mistakes can matter as much as a benchmark score. For OpenAI workflows, the Codex page is a useful starting point for understanding how a model participates in coding work.Fable 5 may be attractive to teams already invested in Claude and long running agentic workflows. Read our Claude Fable 5 coverage for more context on Anthropic's positioning and the types of work it targets.What early forum experience tells usEarly discussions on Reddit and developer communities focus on how different Sol, Terra, and Luna feel in real work. Some users describe Sol as the better fit for multi step tasks, Terra as the practical option for routine work, and Luna as the interesting choice for speed. These observations match OpenAI's positioning, but they do not establish a precise quality gap.Forum reports are useful because they reveal the questions real users care about. However, they are self selected evidence. People may use different prompts, access levels, integrations, and preview versions. A result from a developer platform does not guarantee the same result when a model eventually appears in ChatGPT.Early positivesThe three tiers make it easier to understand which model belongs to which workload.Luna creates a clear expectation of low latency for high volume systems.Terra could become a default if it delivers stable quality at a practical cost.Sol is expected to be stronger for coding, long reasoning, and tasks with several verification steps.Open questionsHow large the practical quality gap between Sol and Terra will be on common workloads.The total cost after retries, corrections, and human review are included.How Luna behaves with long prompts and many constraints.Whether performance remains stable as GPT-5.6 expands beyond preview access.Forum reports are not benchmarks: Community experience should help you choose test cases, not make a production purchasing decision by itself.Comparing GPT-5.6 and Fable 5 by workloadWriting and document analysisTerra appears positioned for most document work because it balances capability and cost. Fable 5 may be attractive when documents are long, questions are complex, and the model must maintain an argument across a large context. A useful evaluation should score citation accuracy, structural consistency, and how much editing is required before publication.Software development and debuggingSol and Fable 5 are both candidates for difficult coding tasks. A representative test should include reading existing code, identifying the root cause, producing a minimal fix, writing tests, and explaining risk. Asking a model to create an isolated function from scratch does not reflect how well it works in a real repository.High volume processingLuna has the clearest positioning advantage when speed and cost dominate. At thousands of extraction or classification requests per day, a small difference in price and latency can have a large effect. Fable 5 may be unnecessarily expensive for a workload that only needs short, structured outputs.Research and long reasoningSol and Fable 5 should be compared with tasks that have verifiable outcomes rather than open questions that merely sound impressive. Give both models the same research material and ask them to identify assumptions, detect contradictions, propose an experiment, and explain what evidence is missing. The better model is the one that helps users discover errors faster, not the one that writes the longest answer.Should you choose Sol, Terra, Luna, or Fable 5?If you want maximum capability inside the OpenAI ecosystem, Sol is the first model to test. If you need a strong model for regular use, Terra has the more practical position. If your workload contains many short and repetitive tasks, Luna could reduce operating cost. Fable 5 remains relevant for teams invested in Claude or focused on long reasoning and agentic work.Because GPT-5.6 is still in preview, replacing an entire production workload would be premature. Run the models in parallel on real but sanitized data, record failures, and use the same criteria for every candidate.A test plan you can use nowSelect 20 tasks that represent real work, including easy and difficult cases.Run each task on Sol, Terra, Luna, and Fable 5 when access allows.Score accuracy, response time, total cost, and required human correction.Track severe failures separately instead of relying only on averages.Choose a model for each workload category rather than forcing one model to do everything.Is GPT-5.6 worth switching to now?The most important change in GPT-5.6 may not be Sol's raw capability. It is OpenAI's decision to turn one model generation into three operational tiers. That could help organizations control cost, but only if they can classify workloads and route requests intelligently.The practical next step is to build a small benchmark from your own data. If Sol wins difficult tasks, Terra is good enough for routine work, and Luna handles high volume requests reliably, the three tier architecture has real value. If Fable 5 remains more consistent on long reasoning, a multi model strategy may still be better than committing to one provider.

Liên•

9 Jul, 2026

Anthropic launches the highly powerful Claude Fable 5 model

Anthropic just dropped what may be its biggest release yet with Claude Fable 5, and it has quickly become the most talked-about model this week. Not just because of its raw power, but because of how Anthropic brought it to the world: this is the first time a Mythos-class model has been made available to general users, after two months under lock and key for safety reasons. What is Fable 5 and why is it different from previous models? At its core, Fable 5 is not a model built from scratch. It is a "safety-hardened" version of Mythos 5, the most powerful model Anthropic has ever built. Back in April 2026, Mythos Preview was only accessible to a very small group of organizations including AWS, Apple, Google, Cisco, and JPMorgan Chase through Project Glasswing, because its ability to detect and exploit software vulnerabilities was simply too powerful to release broadly. Anthropic had also launched Claude Opus 4.8 beforehand as a stepping stone in the development roadmap toward this new model generation. To get Mythos out the door, Anthropic spent two more months building classifiers running in parallel. These are specialized AI systems that analyze requests before the main model processes them, and when a sensitive topic is detected, the system automatically routes to Claude Opus 4.8 at no additional charge. Anthropic says this mechanism only activates in fewer than 5% of sessions, meaning most general users will notice no difference compared to raw Mythos 5. Fable 5 and Mythos 5 share the same pricing: $10 per million input tokens and $50 per million output tokens, which is less than half the cost of Mythos Preview. Users on Pro, Max, Team, and Enterprise plans can use Fable 5 for free through June 22, 2026. Starting June 23, Anthropic will shift to consumption-based billing until infrastructure capacity allows the model to return to fixed subscription plans. How does Fable 5 differ from Mythos 5 on safety? Despite sharing the same underlying model, Fable 5 and Mythos 5 are two distinct products by design. The difference lies entirely in the safety classifiers layered on top of the base model. Three classifiers Fable 5 has that Mythos 5 does not Fable 5 is equipped with three safety classification layers running alongside the main model, covering: Cybersecurity, Biology and Chemistry, and Distillation. When a user submits a request in any of these areas, Fable 5 automatically falls back to Claude Opus 4.8 instead of the main model, and notifies the user accordingly. Mythos 5 has none of these filters. It retains the full software exploitation and biological research capabilities that Anthropic considers too dangerous for wide distribution, which is why Mythos 5 remains restricted to a limited group within Project Glasswing, including vetted cybersecurity professionals, critical infrastructure organizations, and approved biology researchers. How does this affect real-world performance? The classifier difference leads to meaningfully different benchmark results in specialized tasks. On ExploitBench, a benchmark focused on cybersecurity, Mythos 5 scores 78% while Fable 5 lands near the 40% range of Opus 4.8, because the fallback mechanism triggers as soon as it detects attack-related requests. For scientific research, Mythos 5 can design proteins and generate novel hypotheses at roughly 10 times the speed of previous methods, while those same capabilities are restricted in Fable 5 for safety reasons. If you are a researcher or work in legitimate cybersecurity, be aware that Fable 5 may automatically redirect some of your requests to Opus 4.8, even when the context is entirely valid. Anthropic acknowledges this and is actively working to improve classifier accuracy. Real-world performance: what do the numbers say? On SWE-Bench Pro for coding tasks, Fable 5 scores 80.3%, compared to 69.2% for Opus 4.8 and 58.6% for GPT-5.5. But perhaps the more striking number comes from a real deployment: Stripe used Fable 5 to migrate an entire 50-million-line Ruby codebase in a single day, a task that would have taken a full engineering team more than two months to complete manually. On business analytics, Fable 5 is the first model to cross the 90% threshold on Hex's complex analytics benchmark, outperforming Opus 4.8 by 10 percentage points. IMC, a quantitative trading firm, reported that the model scored near-perfect on their internal evaluation covering fact lookup, causal reasoning, and expected value calculations. The biggest shift from previous models is the ability to sustain focus across multi-day tasks without needing human oversight at every step. Rather than executing commands one at a time, Fable 5 can take on a large project, self-plan, run tests, and handle errors in a loop, behaving far more like an engineer than a question-answering tool. Fable 5 is now available on the Claude API under the model ID claude-fable-5, with support on Amazon Bedrock and Google Vertex AI for enterprise consumption-based plans. Notion integrates Fable 5: from scattered notes to a complete action plan Notion is one of the first applications to integrate Fable 5, and the reason is straightforward. The tasks Fable 5 handles best, specifically reading multiple fragmented data sources, synthesizing them, and producing a logical structure, are exactly what Notion users need most in their daily work. Simon Last, co-founder of Notion, described the primary use case as turning messy meeting notes into a task board with assignments and priorities. Instead of users having to re-read entire transcripts, summarize, and manually create tasks, Fable 5 handles the entire chain without needing to be prompted at each step. There has been no official announcement from Notion about Fable 5 pricing after June 22. It remains to be seen whether Notion AI will pass the consumption cost directly to users or absorb it into existing subscription tiers. If the rate ends up lower than going directly through Anthropic, that would be a meaningful advantage for Notion subscribers. A few things to keep in mind before diving in Fable 5 is powerful, but there are two things worth considering before building it into your workflow. First, the $50 per million output tokens price point is high relative to the current market, making it well-suited for complex engineering or analytical tasks but not necessarily for simpler jobs that Sonnet or Haiku can handle at a fraction of the cost. Second, the safety classifiers work well in the vast majority of cases but can trigger incorrectly in some legitimate research contexts, something Anthropic openly acknowledges and is continuing to refine. For individual users on Pro or Max plans, the remaining days before June 22 are a reasonable window to evaluate whether Fable 5 actually generates enough value at that price point before committing to pay-per-use billing.

Nam•

10 Jun, 2026

Hermes Agent and MCP: Automate Real Workflows

An AI agent may plan extremely well, yet it still cannot update Notion, read GitHub issues, or retrieve reports from Google Drive without the right connection. By combining Hermes Agent with MCP, users can turn a conversation into a practical workflow while clearly controlling which tools and permissions the agent may use. If you are not yet familiar with Hermes memory and its ability to create skills, our guide to what Hermes Agent is provides the necessary foundation. This article focuses on how MCP extends Hermes beyond the terminal so it can work with everyday data and services. What does MCP add to Hermes Agent? MCP is a connection standard between an AI application and a server that provides tools or data. It can be understood as an adapter layer: Hermes remains the agent responsible for understanding the goal and choosing the next step, while each MCP server contributes specific actions such as searching Notion, reading a pull request, creating an issue, or querying files. According to the Hermes Agent MCP documentation, Hermes supports local servers over stdio and remote servers over HTTP. At startup or after a configuration reload, Hermes discovers the tools exposed by each server and registers them in its normal tool system. Users therefore do not need to write a native Hermes tool for every service that already has a suitable MCP server. MCP does not automatically make a workflow safe. A server may expose tools that read, write, create, and delete data. Hermes supports filtering per server, allowing users to enable only the operations they need instead of exposing every capability to the model. How to connect MCP without granting excessive access The standard Hermes installation already includes MCP support. Users can open the picker with hermes mcp, view the catalog with hermes mcp catalog, and test a connection with hermes mcp test. Nous Research reviews entries before they enter the Hermes catalog, but its documentation still recommends reading the manifest, source repository, and installation commands before use. For a server outside the catalog, users can add an HTTP connection or a stdio command to config.yaml. After completing OAuth or configuring the required environment variables, reload MCP and ask Hermes to list the available tools. This simple check reveals servers that failed to connect or tools that were accidentally filtered out. Begin with read access The safest setup is to connect one server, enable read only tools, and test with nonsensitive data. Add create or update permissions only after results are stable. Deletion, sharing changes, and outbound publishing should require human approval. Notion initially needs only search and page reading access. GitHub can be limited to reading repositories, issues, and pull requests. Google Drive access should be limited by folder, account, and required OAuth scope. Three practical workflows with Notion, GitHub, and Google Drive Turn Notion into a knowledge center The official Notion MCP allows an agent to search, read, and update workspace content under the authenticated user's permissions. A useful workflow lets Hermes collect meeting notes, find relevant decisions, and prepare a summary on the project page. Hermes can create a draft first so a user can review it before updating status or assigning work. Notion MCP uses user based OAuth, so it does not fit every unattended process. For scheduled automation, verify how the server maintains authentication and avoid designing a workflow around operations that OAuth cannot support in a headless environment. Coordinate development work through GitHub The GitHub MCP Server is provided and maintained by GitHub, allowing AI tools to work with software development data according to account permissions. Hermes can read new issues, compare them with repository changes, and draft a progress report. It can then prepare issue text or release notes while waiting for an owner to approve the write operation. This workflow works best with clear criteria. For example, Hermes can summarize only pull requests merged during the previous seven days, group them by label, and connect each change to its related issue. A second MCP server can then send the result to Notion as a weekly report. Summarize files and reports from Google Drive With a compatible Google Workspace MCP server, Hermes can find Drive files, read permitted content, and feed data into a reporting process. For example, the agent can locate a sales report in a fixed folder, extract selected metrics, and create a summary for Notion or a GitHub issue. Google collects its official MCP projects in the Google MCP repository, including a path for Google Workspace integration. However, several community Drive servers have different maintenance histories. Check the source, update history, and OAuth scopes of the specific server instead of installing one based only on its name. Combine multiple MCP servers into a controlled workflow A complete workflow can begin in GitHub, use Drive as a data source, and finish in Notion. Hermes reads an issue labeled for reporting, finds the corresponding spreadsheet in Drive, produces a summary, and updates the project page. Each stage uses a different MCP tool group, while Hermes plans the sequence and passes results between stages. Do not enable parallel execution merely because a server supports it. Hermes documentation allows servers to declare parallel tool support but warns that operations reading and writing shared state can conflict. Independent read operations may run together, while Notion updates, issue creation, and file changes should remain sequential. Important: An MCP server is software that can run commands and receive credentials. Install only trusted servers, keep tokens out of prompts, filter dangerous tools, and require approval for deleting, sharing, or publishing data. How should you start the first workflow? Do not connect Notion, GitHub, and Google Drive on the same day and immediately assign a critical process. Choose one input, one output, and one completion criterion that is easy to verify. A first workflow could read closed GitHub issues and create a draft report in Notion without deletion or publishing permissions. After several stable runs, you can turn the procedure into a reusable Hermes skill and add a schedule. The real value of MCP is not the number of connected servers. It is the ability to complete a recurring workflow with a small permission surface, verifiable results, and a clear data path.

Nam•

16 Jul, 2026

Quick Summary

What is Claude Opus 4.8?

Key new features

System messages during a conversation

Fast mode for Claude API

Prompt caching becomes easier to use

Documented refusal stop details

API constraints to watch

Why it matters for coding agents and long workflows

Should you upgrade to Claude Opus 4.8?

Conclusion

Discussion (0)

Related Articles

Comparing Hermes Agent, OpenClaw, and Claude Cowork

GPT-5.6 vs Claude Fable 5: What Is New?

Anthropic launches the highly powerful Claude Fable 5 model

Hermes Agent and MCP: Automate Real Workflows