
What Is Hermes Agent? Nous Research's Self-Learning AI

Published on 19 June, 2026

Quick Summary

Hermes Agent is an open-source autonomous AI agent platform developed by Nous Research, marking a shift from passive chatbots to active, self-improving assistants. By introducing persistent memory that learns from projects and automatically creates new skills, it addresses the context limitations of traditional session-based chatbots. The system features cross-platform integration from Telegram to Discord, natural-language scheduling for automated tasks, and task delegation through isolated sub-agent architectures. Its multi-layer sandbox environment is designed for secure local deployment, letting users self-host and retain complete control over their personal data.

Learning more makes you better. That principle, long assumed to apply only to humans, turns out to hold for Hermes Agent too, an open-source AI agent from Nous Research. Each time you work with it, Hermes Agent remembers rather than forgets, understands you more deeply, and improves with each session, thanks to a memory system that can still recall what it has learned about you after the machine has been off for a week.

What Is Hermes Agent?

Hermes Agent is an open-source AI agent developed by Nous Research, the lab behind the Hermes, Nomos, and Psyche model lines, and released under the MIT license. Unlike Antigravity or Codex, which depend on an IDE environment, or ordinary chatbots, which remain thin wrappers around a single API, Hermes Agent is built to run continuously on the user's own infrastructure, from a cheap VPS to a GPU cluster or a serverless platform. In that respect it operates much like OpenClaw.

The core difference in Hermes Agent lies in how it manages long-term memory and converts experience into reusable skills. Instead of merely storing raw information or passively remembering preferences the way assistants like Gemini or Claude do, Hermes runs a closed "learning loop": after every work session, it actively distills the process into new skills it can use the next time. A background "Curator" agent automatically scores, prunes, and merges the accumulated knowledge, while FTS5 full-text search retrieves old memories roughly 4,500 times faster than conventional search without spending any tokens. As a result, Hermes doesn't just respond and forget; it becomes a collaborator that grows more knowledgeable and capable over time.

Four Features That Set Hermes Agent Apart

Nous Research doesn't call Hermes Agent a chatbot or a copilot; it positions it as an agent with a built-in learning loop. The four feature groups below explain why that label isn't just marketing.

Memory That Persists Across Sessions

The biggest weakness of most AI assistants today is that their memory stores only raw chat text, not how the work actually gets done. Hermes Agent addresses this through three combined mechanisms:

  • Fast retrieval: Uses FTS5 full-text search to pull up old memories roughly 4,500 times faster than conventional search, without spending extra tokens the way Gemini or Cowork do.
  • User understanding: Integrates Honcho's dialectical user-modeling approach, helping the agent understand preferences, habits, and personal context in depth across thousands of sessions.
  • Continuity: The agent picks up work exactly where you left off, even if that was a project from weeks earlier.
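
To make the fast-retrieval point concrete, here is a minimal sketch of keyword recall over stored memories using SQLite's FTS5 extension through Python's standard sqlite3 module. It is not Hermes Agent's actual code: the table name `memories`, its columns, and the helper functions are assumptions made for illustration, and it requires a Python build whose SQLite includes FTS5. The lookup is a local index query, which is why no model tokens are spent on it.

```python
import sqlite3


def build_memory_store() -> sqlite3.Connection:
    """Create an in-memory database with a full-text index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE memories USING fts5(session, content)")
    return conn


def remember(conn: sqlite3.Connection, session: str, content: str) -> None:
    """Store one memory entry; FTS5 indexes it on insert."""
    conn.execute(
        "INSERT INTO memories (session, content) VALUES (?, ?)",
        (session, content),
    )


def recall(conn: sqlite3.Connection, query: str, limit: int = 3) -> list:
    """Return the best-matching memories for a keyword query, best match first."""
    rows = conn.execute(
        "SELECT session, content FROM memories "
        "WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = build_memory_store()
remember(conn, "2026-05-01", "User prefers weekly reports as PDF, sent on Mondays")
remember(conn, "2026-05-20", "Deployed the blog backend to a cheap VPS with Docker")
remember(conn, "2026-06-02", "User asked to back up the database every night")

hits = recall(conn, "docker")  # matching is case-insensitive by default
```

Only the matching rows would then need to be placed into the model's context, rather than the whole history.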

Self-Generating and Self-Improving Skills

This is the feature that makes Hermes Agent behave like a collaborator that accumulates experience, rather than just a tool that answers on request:

  • Learning from real use: After completing complex tasks, Hermes Agent distills the process into new skills and stores them in a library to be reused automatically next time.
  • Open agentskills.io standard: These skills follow an open standard, so they can be packaged, shared, and reused across different AI systems without being rewritten from scratch.
  • The Curator mechanism: A background administrative agent periodically scores, prunes, and merges duplicate skills, which keeps the skill library from bloating and becoming disorganized over time.
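
The score, prune, and merge cycle can be sketched as follows. This is an illustrative toy, not the Curator's real logic: the `Skill` fields, the smoothed success-rate score, the name-based duplicate check, and the 0.4 threshold are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    uses: int
    successes: int

    @property
    def score(self) -> float:
        # Smoothed success rate, so a skill with few uses is not over-trusted.
        return (self.successes + 1) / (self.uses + 2)


def curate(skills: list, min_score: float = 0.4) -> list:
    """Merge skills with the same normalized name, then drop low scorers."""
    merged = {}
    for skill in skills:
        key = skill.name.strip().lower()
        if key in merged:
            prev = merged[key]
            merged[key] = Skill(
                prev.name,
                prev.uses + skill.uses,
                prev.successes + skill.successes,
            )
        else:
            merged[key] = Skill(skill.name.strip(), skill.uses, skill.successes)
    kept = [s for s in merged.values() if s.score >= min_score]
    return sorted(kept, key=lambda s: s.score, reverse=True)


library = [
    Skill("Deploy blog", uses=10, successes=9),
    Skill("deploy blog ", uses=4, successes=3),    # duplicate under another spelling
    Skill("Scrape prices", uses=8, successes=1),   # rarely works, gets pruned
]
curated = curate(library)
```

In this toy run the two "deploy blog" entries collapse into one skill with combined usage counts, and the rarely successful skill is dropped.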

Present on More Than 23 Messaging Platforms

Hermes Agent isn't confined to a computer; it integrates directly into the messaging channels people already use on their phones every day:

  • Multiple channels, one brain: You can command Hermes Agent through Telegram, Discord, Slack, WhatsApp, Signal, email, or SMS.
  • Context retained: Whether you message via Telegram in the morning or switch to Discord at night, the agent keeps a single thread of memory, never fragmented by channel.
  • Multimodal interaction: Supports sending voice messages, images, and video, along with the ability to analyze multimodal content.
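
One way to picture "multiple channels, one brain" is an identity map that resolves every channel handle to the same user before anything is written to memory. The sketch below is an assumption about how such a design could look, not a description of Hermes Agent's internals; the class and method names are hypothetical.

```python
from collections import defaultdict


class UnifiedMemory:
    """Keeps one memory thread per user, regardless of the channel used."""

    def __init__(self) -> None:
        self._identities = {}               # (channel, handle) -> user_id
        self._threads = defaultdict(list)   # user_id -> [(channel, text), ...]

    def link(self, channel: str, handle: str, user_id: str) -> None:
        """Register a channel account as belonging to a known user."""
        self._identities[(channel, handle)] = user_id

    def record(self, channel: str, handle: str, text: str) -> None:
        """Append a message to the owner's single thread."""
        user_id = self._identities[(channel, handle)]
        self._threads[user_id].append((channel, text))

    def thread(self, user_id: str) -> list:
        """Return the user's full history across all channels, in order."""
        return list(self._threads[user_id])


memory = UnifiedMemory()
memory.link("telegram", "@alice_tg", "alice")
memory.link("discord", "alice#1234", "alice")

memory.record("telegram", "@alice_tg", "Start the weekly report")
memory.record("discord", "alice#1234", "Add last night's numbers to it")

history = memory.thread("alice")
```

Because both accounts resolve to the same user, the evening Discord message lands in the same thread as the morning Telegram one.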

Flexible Runtime Infrastructure

Hermes Agent supports six backend types for executing commands: local machine, Docker, SSH, Daytona, Singularity, and Modal. With Daytona and Modal, the environment can hibernate when idle, costing almost nothing while it waits and waking only when there is work to process. This is why Nous Research describes Hermes Agent as an always-on agent that doesn't require users to pay for a server running around the clock.

[Image: Overview of the Hermes Agent desktop app interface. Caption: Hermes Agent, Nous Research's self-learning AI agent]

Built-In Toolset and Limitations to Know

40-Plus Built-In Tools, From Web Search to Schedule Automation

Hermes Agent ships with more than 40 built-in tools, including web search, browser actions, file handling, and Python script execution via RPC to run sub-tasks without consuming the main agent's context window. A natural-language scheduling system lets you set recurring tasks like daily reports or data backups, then leaves the agent to run them without being reminded. For tasks that need full isolation, Hermes Agent also supports sub-agents with their own conversation, terminal, and scripts, allowing multiple jobs to run in parallel without diluting the main memory.
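
The sub-agent idea can be illustrated with a small sketch in which each delegated task builds up its own private context and only a short summary returns to the main agent. Everything here is hypothetical: `run_subagent` and `delegate` are made-up names, the "work" is a stand-in for real model and tool calls, and threads stand in for whatever isolation Hermes Agent actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str) -> str:
    """Work on one task with a private context; return only a summary."""
    context = [f"task: {task}"]                 # never shared with the main agent
    context.append(f"step: worked on {task}")   # stand-in for model/tool calls
    return f"done: {task} ({len(context)} private messages)"


def delegate(main_context: list, tasks: list) -> list:
    """Run tasks in parallel and append only their summaries to the main context."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        summaries = list(pool.map(run_subagent, tasks))  # map keeps task order
    main_context.extend(summaries)
    return main_context


main_context = ["user: prepare the release"]
delegate(main_context, ["write changelog", "run tests"])
```

The main context grows by one line per task, however many intermediate steps each sub-agent took, which is the sense in which parallel jobs do not dilute the main memory.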

Challenges and Security Considerations

Despite rapid updates, Hermes Agent still has a few points users should keep in mind before deploying it:

  • Stability of the self-learning mechanism: The ability to self-improve skills boosts success rates, with a Tencent Cloud report recording gains of up to 52% along with token savings of up to 61%. However, since this is a self-evolving mechanism, real-world effectiveness still depends on the underlying model chosen and still requires human oversight rather than full trust.
  • Risk from high-level permissions, with security responsibility falling on the user: Hermes Agent can intervene deeply in a system (excessive agency), so connecting it directly to multiple messaging platforms requires users to manage their own API keys and set up guardrails. Unlike closed AI services, Hermes Agent hands full control over to the user, which means the user also bears greater responsibility for configuring access permissions to avoid information leaks.

Why Is Hermes Agent Growing So Fast?

Hermes Agent's growth could be attributed to Nous Research's marketing, but in our view it comes down to three main factors.

A Frictionless Migration Path From OpenClaw

Recognizing OpenClaw's large user base, Nous Research built a migration tool that lets users carry over their persona, API keys, the entire skill set, and memory to Hermes Agent with a single command, without losing old data and, of course, without having to reconfigure anything from scratch.

Betting on a Closed Learning Loop Instead of a Feature Race

While many other agents compete on the number of tools they offer, Hermes Agent positions itself as a self-evolving entity, one that distills experience into new skills and retains long-term memory to understand users more deeply over time. This approach creates lasting value, and the community has already put it to use for projects such as automating large-scale content production with high consistency across many sessions.

A Role as a Training Data Engine

Beyond serving as a personal assistant, Hermes Agent also functions as a capable research tool. It can generate thousands of parallel tool-calling trajectories and compress them into training data for other AI models. By turning the agent's real-world experience into training data, Hermes becomes a platform that developers building the next generation of autonomous AI can't easily do without.

How Is Hermes Agent Different From an Agent Harness?

People new to the space often confuse Hermes Agent with the concept of an agent harness, which is the framework that decides how a model calls tools, handles the reasoning loop, and coordinates execution steps internally. If a harness is the engine and chassis that determine how a car drives, then Hermes Agent is like a car that already has that engine installed, plus seats, a navigation system, and the driver's own trip memory.

In other words, a harness is the technical architecture layer underneath, while Hermes Agent is a complete end-user product that already packages memory, a skill system, communication channels, and a choice of runtime infrastructure. A developer can build their own harness to control every small detail, but most users don't need to go that deep; they just need an agent that runs right away and gets smarter through use. For a closer look at this underlying architecture layer, see What Is Agent Harness? The Framework That Makes AI Work Efficiently, which explains in detail how this type of framework operates.

Is Hermes Agent Worth Trying Right Now?

Being fully open source, collecting no user data, and supporting complete self-hosting, Hermes Agent is one of the few agents today that lets users keep full control over their own data while still getting a continuous assistant experience with real memory, not the simulated memory that only exists within a single chat. After v0.16.0, the biggest technical barrier for users unfamiliar with terminals has largely been removed, as the native desktop app for Windows, macOS, and Linux has fully replaced the pure CLI approach used before.

What's left to judge about Hermes Agent isn't whether it runs, but what it learns after a few real weeks of use. The fastest way to find out is to install the desktop app or run the CLI on a cheap VPS, connect it to a familiar messaging channel like Telegram, then watch what skills the agent forms on its own from how you use it every day. That's also the groundwork for comparing Hermes Agent with other options on the market, from Agent Harness to OpenClaw and Claude Cowork, in the next part of this series.


Related Articles

Gemini powers Argentina and Messi at World Cup 2026

Gemini has won big in the most literal sense, right as Messi scored his first hat-trick at the 2026 World Cup, leading Argentina to a crushing 3-0 victory over Algeria and equaling Miroslav Klose's record of 16 World Cup goals. That historic moment became the perfect launchpad for Gemini. Back in March 2026, Google and the Argentine Football Association (AFA) made a bold decision: rather than simply printing a logo on training kits, they signed a deal for the AI to actively support tactical preparation and professional decision-making. That bet has now proven to be the right call. From training kit to the tactical meeting room The agreement between AFA and Google was unveiled at Times Square, New York, a venue deliberately chosen to capture global media attention. The Gemini logo appears across all training apparel for Argentina's men's, women's and youth squads, sitting alongside Adidas and American Express in AFA's top sponsorship tier. But the interesting part isn't the jersey. According to Inside World Football, Argentina's coaching staff will use Gemini for three specific purposes: tactical analysis, injury prevention and decision support. In other words, Gemini now has a seat in meetings that previously belonged only to Scaloni and his assistants. Google has not publicly disclosed which specific Gemini tools have been integrated into AFA's workflow. What is clear is that they are using the World Cup to bring Gemini into the reality of professional football, and the results will be graded in public. What is Gemini actually doing in the dressing room? Argentina arrives at the 2026 World Cup as the reigning champion. Every decision Scaloni makes, from the squad list to the starting eleven, is scrutinized more closely than any other team, and that is precisely why Argentina has become the most ideal testing ground Google has ever had for Gemini in professional football, especially at a major tournament. 
Tactical analysis Gemini is used to process match data for both Argentina and their opponents, covering movement statistics, attacking patterns and defensive vulnerabilities. Instead of the coaching staff spending hours reviewing footage, AI synthesizes the data and generates tactical diagrams automatically, saving significant preparation time before each match. Injury prevention This is a problem every major team wants to solve, especially when Messi and several key players are at an age that requires careful management of training loads. Gemini analyzes biometric data and injury history to issue early warnings, helping the coaching staff adjust intensity before problems actually occur. That is part of the reason why, immediately after completing his hat-trick, Scaloni chose to substitute Messi off, prioritizing fitness and safety for the matches ahead. AI in injury prevention is nothing new. Premier League clubs have had Microsoft as a partner for similar purposes. What is different this time is that Gemini is integrated directly into the workflow of a national team competing at a major tournament, not just at club level. For fans: create Messi content, follow scores without unlocking your screen Alongside supporting the coaching staff, Gemini has also rolled out a range of features aimed at fans, and this is the side that hundreds of millions of people will actually experience. Gemini lets you create content about players directly Users can generate images, songs and digital content featuring Argentina players like Messi directly inside the Gemini app. The feature is designed to bring the World Cup experience closer to those who cannot attend matches in person. Real-time scores and automated daily briefings On Google Search, live match scores can be pinned to the lock screen and update in real time, with dedicated animations for goals and red cards, all without needing to unlock the phone. 
For paid Gemini users, the Scheduled Actions feature allows an automated daily football briefing to be set up, covering scores, news and fixtures, delivered at a chosen time without needing to prompt it each day. Match-day infrastructure Google has updated Street View at all 16 host stadiums and optimized routing on Waze for match days. Waze also surfaces live scores when the car is stopped at red lights, so drivers do not need to pick up their phones while on the move. The 2026 World Cup is the real test for AI in sport Google is not sponsoring Argentina alone. Gemini also appears on the kits of France, Morocco, Iraq, Turkey and the United States, while Pixel is the official phone of the French squad, which is also using Gemini for internal communications. This is clearly a comprehensive strategy from Google, not a one-off deal. What makes the 2026 World Cup particularly significant is that it will answer a question no lab environment can: what do users actually do with AI when a World Cup runs for six weeks across 104 matches? Features that run on initial novelty will fade after the group stage. Whatever users keep coming back to all the way through the final is the honest answer to where AI actually fits in everyday life, and Google knows it. Google's communications director for Latin America, Flor Sabatini, stated that the 2026 World Cup will mark a before and after in the history of football because of AI. It sounds like marketing, but the reality is that this is the first time a major AI model has been integrated into the preparation of the reigning world champions, right in the middle of the most-watched sporting event on the planet. The 2026 World Cup is Gemini's real test The most significant part of this entire story is not the Gemini logo on Messi's jersey. It is the fact that Argentina, still the most expected to win and the most scrutinized team, carrying the pressure of defending the title, has committed part of its preparation process to AI. 
If Argentina succeeds, Gemini will have a case study that no advertising budget can buy. If Argentina falls short and the coaching staff attributes any part of it to AI, the narrative will flip entirely. Either way, this is the first time AI has been held accountable on a stage that genuinely matters, not a benchmark, not a demo, but the World Cup. For AI users, what is worth watching is not just whether Argentina wins, but whether Gemini actually changes how a football team operates, or whether it turns out to be nothing more than a logo on a training kit that looks better than previous years.

Nam
17 Jun, 2026
AI Technology at World Cup 2026: A Complete Overview

The Adidas Trionda match ball, three dimensional player models accurate to the millimeter, robot dogs patrolling stadiums, and Google Gemini sitting on the touchline with the Argentina national team. World Cup 2026 is not only the largest tournament in history with 104 matches across 16 cities in the United States, Canada, and Mexico, but also the most extensive deployment of AI ever seen in sports. How the Adidas Trionda smart ball works The official match ball named Adidas Trionda is equipped with an Inertial Measurement Unit IMU sensor operating at 500Hz, which means it collects 500 data points every second on movement, spin, and the exact moment the ball makes contact with a player foot. This is particularly important for offside situations, as the sensor will determine the precise moment the ball leaves the passer foot down to the millisecond. The timestamp from the sensor is synchronized immediately with the player tracking system, helping to lock the position of every player on the pitch at that exact moment instead of relying on the naked eye which can be off by up to half a second. As a result, offside decisions are made faster and more accurately than ever before. This advanced technology immediately rescued the Swedish team by identifying the precise moment of contact from striker Alexander Isak. Before that, the joy of scorer Svanberg was temporarily dampened when the VAR team stepped in to review. In a play that occurred at a breakneck speed, he appeared to be standing behind the Tunisian defense when the ball was delivered into the penalty area, leading many to believe the goal would be disallowed. However, the data from the motion sensor mounted inside the Adidas Trionda ball proved that Svanberg moved back to a valid position in time, bringing a legitimate goal for Sweden to the delight of the fans. 
Semi automated offside technology with 3D player avatars Semi automated offside technology SAOT has been upgraded significantly for World Cup 2026, highlighted by the 3D avatar of each player. Every player participating in the tournament is digitally scanned across the entire body in about one second, creating a 3D model with detailed body dimensions for every part. When a situation requires VAR review, the system overlays these 3D models onto real time tracking data from more than 12 specialized cameras at each stadium. This approach completely resolves the long standing issue of two dimensional offside lines, where a player arm, shoulder, or foot might be obscured from a certain camera angle. The 3D model fills those gaps using realistic anatomical data, and the result is displayed as a complete 3D animation on the pitch and on television, entirely replacing the flat red and green lines that once confused spectators. Football AI Pro: analytics platform for all 48 teams FIFA collaborated with Lenovo to build Football AI Pro, an analytics platform developed on the FIFA Football Language foundation model, which has been trained on hundreds of millions of football data points over decades of competition. This is the first time in World Cup history that all 48 participating teams have access to the same analytics platform, rather than wealthier federations holding an advantage due to better data tools. This platform outputs results in multiple formats, including text summaries, video clips, interactive charts, and 3D tactical visualizations. Teams can use it before and after matches to analyze opponent tactics, detect set piece patterns, track player workload intensity, and analyze head to head history. However, FIFA bans its use during match time, and coaching staff can only access it during halftime and after the match. Referee chest cameras with AI image stabilization For the first time in history, referees in all 104 World Cup matches wear chest cameras. 
The raw images from the camera when the referee runs at high speeds are shaky and cannot be used for broadcasting, but FIFA runs an AI image stabilization model in real time on every frame, creating broadcast quality video. The result is the Referee View perspective that offers a subjective experience from the pitch, quickly becoming one of the most popular broadcasting innovations. This viewpoint not only serves entertainment but also provides analysts with a new data source, which is the exact vision that the referee had when making decisions. Google Gemini on the touchline and fan experience In March 2026, the Argentine Football Association announced Google as an official global sponsor, with the Gemini logo appearing on training jerseys for the men, women, and youth teams. However, this partnership goes far beyond brand advertising, because the Argentina technical staff uses Gemini directly for tactical analysis from match videos, tracking player workload and injury recovery, querying historical data on specific matchup scenarios, and creating individual opponent briefings for each player. Notably, Argentina players and coaches use Gemini through the standard application rather than any customized interface, reflecting the maturity of general purpose AI tools in professional sports applications. Additionally, Google also deployed a series of features for fans, including live scores pinned to the Android lock screen, AI match summaries on the Gemini app, on demand tactical diagrams, jersey templates on Google Photos, stadium navigation via Google Maps, and match statistics on Google Search. Robot dogs, facial recognition, and AI security At the host venues, FIFA deployed Boston Dynamics Spot robot dogs for outer perimeter security patrols and facility inspections. 
These robots perform automated patrols in restricted areas, with onboard cameras connected to the stadium security AI system, which is particularly effective in spaces that are difficult to monitor continuously, such as tunnels, underground technical corridors, and stadium perimeters at night. The biometric layer is equally notable, as some stadiums use facial recognition for entry, where your face is your ticket, processed against the database in less than one second. However, the widespread presence of AI surveillance also raises questions about privacy in large scale sporting events. AI predictions for the champion: every model has a different answer Before the tournament kicked off, many AI systems simulated all 104 matches to predict the champion, and the results were completely inconsistent. ChatGPT predicted Spain, the FanDuel research model chose France to defeat Argentina 3 to 2 in the final, while Yahoo Sports and DataCamp both bet on Brazil. This disagreement is worth reflecting on, as every model was provided with the same public data sources including FIFA rankings, ELO scores, qualifying form, and injury reports, but different weighting methods created entirely different results. And of course, no model can calculate Messi left foot shot in the 89th minute of a knockout match. That is still football. AI is no longer an experiment but infrastructure What makes World Cup 2026 different from previous tournaments does not lie in any single technology, but in the fact that AI has transitioned from the experimental phase to operational infrastructure. The smart ball, the 3D offside system, the referee cameras, and the analytics platform are not pilot projects. They are the basic operational foundation for every match. The 500Hz sensor inside the ball does not understand football, as it only measures spin. 
However, the decision it enables, accurate to the millimeter, displayed in 3D, and returning results in seconds, with the Swedish team situation being a prime example, will change how football is operated. That is the true shape of AI when running at a large scale.

Nam
16 Jun, 2026
Anthropic launches the highly powerful Claude Fable 5 model

Anthropic just dropped what may be its biggest release yet with Claude Fable 5, and it has quickly become the most talked-about model this week. Not just because of its raw power, but because of how Anthropic brought it to the world: this is the first time a Mythos-class model has been made available to general users, after two months under lock and key for safety reasons. What is Fable 5 and why is it different from previous models? At its core, Fable 5 is not a model built from scratch. It is a "safety-hardened" version of Mythos 5, the most powerful model Anthropic has ever built. Back in April 2026, Mythos Preview was only accessible to a very small group of organizations including AWS, Apple, Google, Cisco, and JPMorgan Chase through Project Glasswing, because its ability to detect and exploit software vulnerabilities was simply too powerful to release broadly. Anthropic had also launched Claude Opus 4.8 beforehand as a stepping stone in the development roadmap toward this new model generation. To get Mythos out the door, Anthropic spent two more months building classifiers running in parallel. These are specialized AI systems that analyze requests before the main model processes them, and when a sensitive topic is detected, the system automatically routes to Claude Opus 4.8 at no additional charge. Anthropic says this mechanism only activates in fewer than 5% of sessions, meaning most general users will notice no difference compared to raw Mythos 5. Fable 5 and Mythos 5 share the same pricing: $10 per million input tokens and $50 per million output tokens, which is less than half the cost of Mythos Preview. Users on Pro, Max, Team, and Enterprise plans can use Fable 5 for free through June 22, 2026. Starting June 23, Anthropic will shift to consumption-based billing until infrastructure capacity allows the model to return to fixed subscription plans. How does Fable 5 differ from Mythos 5 on safety? 
Despite sharing the same underlying model, Fable 5 and Mythos 5 are two distinct products by design. The difference lies entirely in the safety classifiers layered on top of the base model. Three classifiers Fable 5 has that Mythos 5 does not Fable 5 is equipped with three safety classification layers running alongside the main model, covering: Cybersecurity, Biology and Chemistry, and Distillation. When a user submits a request in any of these areas, Fable 5 automatically falls back to Claude Opus 4.8 instead of the main model, and notifies the user accordingly. Mythos 5 has none of these filters. It retains the full software exploitation and biological research capabilities that Anthropic considers too dangerous for wide distribution, which is why Mythos 5 remains restricted to a limited group within Project Glasswing, including vetted cybersecurity professionals, critical infrastructure organizations, and approved biology researchers. How does this affect real-world performance? The classifier difference leads to meaningfully different benchmark results in specialized tasks. On ExploitBench, a benchmark focused on cybersecurity, Mythos 5 scores 78% while Fable 5 lands near the 40% range of Opus 4.8, because the fallback mechanism triggers as soon as it detects attack-related requests. For scientific research, Mythos 5 can design proteins and generate novel hypotheses at roughly 10 times the speed of previous methods, while those same capabilities are restricted in Fable 5 for safety reasons. If you are a researcher or work in legitimate cybersecurity, be aware that Fable 5 may automatically redirect some of your requests to Opus 4.8, even when the context is entirely valid. Anthropic acknowledges this and is actively working to improve classifier accuracy. Real-world performance: what do the numbers say? On SWE-Bench Pro for coding tasks, Fable 5 scores 80.3%, compared to 69.2% for Opus 4.8 and 58.6% for GPT-5.5. 
But perhaps the more striking number comes from a real deployment: Stripe used Fable 5 to migrate an entire 50-million-line Ruby codebase in a single day, a task that would have taken a full engineering team more than two months to complete manually. On business analytics, Fable 5 is the first model to cross the 90% threshold on Hex's complex analytics benchmark, outperforming Opus 4.8 by 10 percentage points. IMC, a quantitative trading firm, reported that the model scored near-perfect on their internal evaluation covering fact lookup, causal reasoning, and expected value calculations. The biggest shift from previous models is the ability to sustain focus across multi-day tasks without needing human oversight at every step. Rather than executing commands one at a time, Fable 5 can take on a large project, self-plan, run tests, and handle errors in a loop, behaving far more like an engineer than a question-answering tool. Fable 5 is now available on the Claude API under the model ID claude-fable-5, with support on Amazon Bedrock and Google Vertex AI for enterprise consumption-based plans. Notion integrates Fable 5: from scattered notes to a complete action plan Notion is one of the first applications to integrate Fable 5, and the reason is straightforward. The tasks Fable 5 handles best, specifically reading multiple fragmented data sources, synthesizing them, and producing a logical structure, are exactly what Notion users need most in their daily work. Simon Last, co-founder of Notion, described the primary use case as turning messy meeting notes into a task board with assignments and priorities. Instead of users having to re-read entire transcripts, summarize, and manually create tasks, Fable 5 handles the entire chain without needing to be prompted at each step. There has been no official announcement from Notion about Fable 5 pricing after June 22. 
It remains to be seen whether Notion AI will pass the consumption cost directly to users or absorb it into existing subscription tiers. If the rate ends up lower than going directly through Anthropic, that would be a meaningful advantage for Notion subscribers. A few things to keep in mind before diving in Fable 5 is powerful, but there are two things worth considering before building it into your workflow. First, the $50 per million output tokens price point is high relative to the current market, making it well-suited for complex engineering or analytical tasks but not necessarily for simpler jobs that Sonnet or Haiku can handle at a fraction of the cost. Second, the safety classifiers work well in the vast majority of cases but can trigger incorrectly in some legitimate research contexts, something Anthropic openly acknowledges and is continuing to refine. For individual users on Pro or Max plans, the remaining days before June 22 are a reasonable window to evaluate whether Fable 5 actually generates enough value at that price point before committing to pay-per-use billing.

Nam
10 Jun, 2026
Microsoft launches 7 new AI models to challenge OpenAI

Microsoft just dropped seven new AI models at Build 2026, with MAI-Thinking-1 boasting 35 billion active parameters and trained entirely on clean data. For the first time, the software giant is openly challenging the position of its own strategic partner, OpenAI, on the AI model battlefield.

MAI-Thinking-1 and Microsoft's reasoning ambitions

The centerpiece of Build 2026 was MAI-Thinking-1, Microsoft's first reasoning AI model developed entirely in-house. With approximately 35 billion active parameters, the model is designed to handle multi-step reasoning tasks, work with long contexts, and support complex coding, all at a lower cost than many large-scale AI models currently available.

The most notable claim is that Microsoft trained MAI-Thinking-1 on clean data without using distillation from third-party AI models. In other words, this is a clear statement that Microsoft has the independent AI research capability to build competitive models without "borrowing" knowledge from GPT or any other model.

According to Microsoft's published evaluations, MAI-Thinking-1 achieves competitive performance on coding benchmarks and is rated on par with many leading AI models in blind evaluation tests. The 35-billion parameter count also signals that Microsoft is prioritizing efficiency over raw scale, as many competitor models have significantly more parameters but may not necessarily deliver better output quality.

From coding to voice: a complete AI ecosystem

Beyond reasoning, Microsoft introduced six additional AI models to build a complete AI ecosystem serving both individual users and enterprises. From coding and image generation to voice synthesis, every piece of the puzzle now has a dedicated model.

Smarter coding with MAI-Code-1-Flash

For developers, MAI-Code-1-Flash is significant news. This model specializes in code generation and software development support, optimized for real-world programming tasks. More importantly, it will be integrated directly into GitHub Copilot and Visual Studio Code, two tools used daily by millions of developers. This means code suggestions and automated coding experiences will be significantly upgraded within familiar development environments.

Images and voice: the missing pieces

In the creative content space, Microsoft announced MAI-Image-2.5 alongside MAI-Image-2.5-Flash. These are next-generation image creation and editing models, with the Flash version optimized for fast response times, making it suitable for real-time applications like live photo editing or on-demand illustration generation.

In the audio domain, Microsoft introduced two important models: MAI-Voice-2, with more natural voice synthesis capabilities and support for additional languages, and MAI-Transcribe-1.5, for speech-to-text conversion with significantly faster processing speeds than the previous generation.

Additionally, Microsoft has developed optimized variants specifically for the Microsoft Foundry platform, helping enterprises easily build and deploy their own AI applications.

The strategy to reduce OpenAI dependence

Where Microsoft was previously seen mainly as an infrastructure partner and deployment platform for OpenAI, Build 2026 shows the company is steadily acquiring all the essential components of a full AI ecosystem. Microsoft now has its own reasoning model, coding model, image generation model, voice synthesis model, and speech recognition model, all connected directly to the Azure, Copilot, and Microsoft Foundry ecosystem.

This strategy gives Microsoft greater autonomy in developing core technology while reducing risk from dependence on external partners. More specifically, owning proprietary AI models allows Microsoft to control its product roadmap, optimize operational costs, and customize models for specific service needs without waiting for or negotiating with third parties.

Where does the AI model race go from here?

The simultaneous launch of seven new AI models shows Microsoft is investing heavily in foundational technologies to compete directly with major players like OpenAI, Google, and Anthropic. When OpenAI's largest partner decides to build its own AI models, that is the clearest signal that the AI race has entered a new phase where no one wants to place the future of their technology in someone else's hands.

For developers and enterprises, now is the time to closely watch Microsoft Foundry and the Azure AI ecosystem, as tools that were previously only available through OpenAI will soon appear within Microsoft's familiar ecosystem. Build 2026 may well be remembered as the moment Microsoft officially declared its vision for an independent, comprehensive AI ecosystem with its own distinctive identity.

Nam
4 Jun, 2026