Google launches Gemini 3: the world's most intelligent AI model and Google's next step toward AGI

Published on 19 November, 2025

Quick Summary

On November 19, 2025, Google launched Gemini 3, its most advanced AI model. CEO Sundar Pichai called it the best model in the world for multimodal understanding and a step toward AGI. Gemini 3 significantly outperforms Gemini 2.5 and rivals such as Claude Sonnet 4.5 and GPT-5.1 on leaderboards and benchmarks, especially in PhD-level reasoning and multimodal capability. The model can be used for learning, creative work and sports analysis, includes an enhanced reasoning mode called Deep Think, and is rolling out widely across Google's ecosystem, including the Gemini chat app and Google Search.

On November 19, 2025, Google officially introduced Gemini 3, its most advanced and intelligent AI model, designed to help users bring any idea to life.

CEO Sundar Pichai declared Gemini 3 "the best model in the world for multimodal understanding." The model marks a step forward on the path toward artificial general intelligence (AGI).

How does it improve on Gemini 2.5?

Eight months after launching Gemini 2.5, Google is back with Gemini 3 Pro. It improves reasoning and contextual understanding and brings together the capabilities of all previous Gemini generations.

Topping the leaderboards

Gemini 3 Pro arrived fairly quietly. It is not a giant leap, but it still carries real weight: it has already topped many LLM leaderboards, such as LMArena.

Compared with Gemini 2.5, Gemini 3 is clearly ahead on every AI benchmark, for example in identifying the context and intent behind a user's request, so users get the results they want with less prompting.
Beating the previous Gemini generation is expected, but Gemini 3's scores also surpass Claude Sonnet 4.5 and GPT-5.1. For example, Gemini 3 demonstrates PhD-level reasoning with 37.5% on Humanity's Last Exam without tools, well ahead of Claude Sonnet 4.5 (13.7%) and GPT-5.1 (26.5%), and 91.9% on GPQA Diamond, again ahead of Claude Sonnet 4.5 (83.4%) and GPT-5.1 (88.1%).

Chart: Comparison of PhD-level reasoning performance (source: Google data)

Multimodality

Like Gemini 2.5, Gemini 3 seamlessly synthesizes information across multiple modalities, including text, images, video, audio and code. Naturally, it beats Gemini 2.5 on the benchmarks, scoring 81% on MMMU-Pro (Gemini 2.5: 68%) and 87.6% on Video-MMMU (Gemini 2.5: 83.6%, according to Google).
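To make the size of these leads concrete, the short Python sketch below recomputes the margins in percentage points from the Google-reported scores quoted in this article. It uses no data beyond those figures; the `leads` helper name is purely illustrative.

```python
# Google-reported scores (percent) as quoted in this article.
SCORES = {
    "Humanity's Last Exam (no tools)": {"Gemini 3": 37.5, "Claude Sonnet 4.5": 13.7, "GPT-5.1": 26.5},
    "GPQA Diamond": {"Gemini 3": 91.9, "Claude Sonnet 4.5": 83.4, "GPT-5.1": 88.1},
    "MMMU-Pro": {"Gemini 3": 81.0, "Gemini 2.5": 68.0},
    "Video-MMMU": {"Gemini 3": 87.6, "Gemini 2.5": 83.6},
}


def leads(scores, leader="Gemini 3"):
    """Return the leader's margin over every other model, in percentage points."""
    return {
        bench: {
            rival: round(by_model[leader] - score, 1)  # round away float noise
            for rival, score in by_model.items()
            if rival != leader
        }
        for bench, by_model in scores.items()
    }


if __name__ == "__main__":
    for bench, margins in leads(SCORES).items():
        for rival, points in margins.items():
            print(f"{bench}: Gemini 3 leads {rival} by {points} points")
```

Note that these are absolute percentage-point gaps, not relative improvements; on Humanity's Last Exam, for instance, the gap to Claude Sonnet 4.5 is 23.8 points.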

Chart: Gemini 3 benchmark scores and performance statistics

What does real-world use look like?

Learning and research: Gemini 3 can analyze academic papers or long video lectures and generate code for interactive visualizations or flashcards. However, when I tried it with a 4-hour video, Gemini 3 in Fast mode could not retain everything and got details wrong or left them out. For now, don't fully trust the information Gemini 3 gives you in this scenario; use NotebookLM for that job instead.
Creativity and planning: Gemini 3 can translate handwritten recipes written in different languages and turn them into a cookbook that is easy to share. According to Google, it can even write a poem that captures the physics of nuclear fusion, or write code to visualize plasma flow in a tokamak.
Sports video analysis: Gemini 3 can analyze videos of sports matches (such as pickleball or tennis), identify skills that need improvement and create a training plan.

Does Gemini 3 have an advanced thinking mode (Deep Think)?

Google also introduced Deep Think mode, an enhanced reasoning mode for tackling more complex problems, as with Gemini 2.5, though in practice it takes a very long time to return results.

Deep Think mode is still being tested and is expected to reach Google AI Ultra subscribers in the coming weeks, so I haven't had a chance to try it yet. For most users, though, Thinking mode is quite adequate.

Developer capabilities and rollout speed

How good is Gemini 3 at coding?

Gemini 3 performs very well at generating code and handling complex prompts to build richer, more interactive web interfaces. For coding, though, I still trust Claude Sonnet 4.5 more. When Gemini 3 runs into a problem in the code, it doesn't stay focused on that problem, and the more it fixes, the more it breaks, which is not the case with Claude Sonnet 4.5. That makes things harder for people who don't know much about code.

In terms of speed, Gemini 3 is noticeably faster for coding than Claude Sonnet 4.5 and GPT-5.1, and about twice as fast as Gemini 2.5 on small and medium tasks.
To support agent development, Google also released Google Antigravity, a new agentic development platform that uses Gemini 3's reasoning and tool-use capabilities to turn AI into an agent that can operate independently and proactively.

When can you use Gemini 3?

Gemini 3 is rolling out across Google's entire ecosystem starting November 19.

In the Gemini chat interface, Google now lets you choose between Fast, Thinking and Pro modes instead of picking a specific model as with Gemini 2.5. This suggests Google is automating model selection for tasks ranging from simple to complex, much as OpenAI has done with GPT-5.1.
Gemini 3 is also integrated into Google Search for the first time, through AI Mode. AI Mode uses Gemini 3 to power new generative UI experiences, such as immersive visual layouts and interactive tools generated on the fly from the user's query. In my view, this move is meant to compete with OpenAI's ChatGPT Atlas and Perplexity's Comet.

Gemini powers Argentina and Messi at World Cup 2026

Gemini has won big, in the most literal sense: Messi has just scored his first hat-trick of the 2026 World Cup, leading Argentina to a crushing 3-0 victory over Algeria and equaling Miroslav Klose's record of 16 World Cup goals. That historic moment became the perfect launchpad for Gemini. Back in March 2026, Google and the Argentine Football Association (AFA) made a bold decision: rather than simply printing a logo on training kits, they signed a deal for the AI to actively support tactical preparation and professional decision-making. That bet has now paid off.

From training kit to the tactical meeting room

The agreement between AFA and Google was unveiled in Times Square, New York, a venue deliberately chosen to capture global media attention. The Gemini logo appears on all training apparel for Argentina's men's, women's and youth squads, alongside Adidas and American Express in AFA's top sponsorship tier.

But the interesting part isn't the jersey. According to Inside World Football, Argentina's coaching staff will use Gemini for three specific purposes: tactical analysis, injury prevention and decision support. In other words, Gemini now has a seat in meetings that previously belonged only to Scaloni and his assistants.

Google has not publicly disclosed which specific Gemini tools have been integrated into AFA's workflow. What is clear is that Google is using the World Cup to bring Gemini into the reality of professional football, and the results will be graded in public.

What is Gemini actually doing in the dressing room?

Argentina arrives at the 2026 World Cup as the reigning champion. Every decision Scaloni makes, from the squad list to the starting eleven, is scrutinized more closely than for any other team, which is precisely why Argentina is the best testing ground Google has ever had for Gemini in professional football, especially at a major tournament.
Tactical analysis

Gemini processes match data for both Argentina and their opponents, covering movement statistics, attacking patterns and defensive vulnerabilities. Instead of the coaching staff spending hours reviewing footage, the AI synthesizes the data and generates tactical diagrams automatically, saving significant preparation time before each match.

Injury prevention

This is a problem every major team wants to solve, especially when Messi and several key players are at an age that requires careful management of training loads. Gemini analyzes biometric data and injury history to issue early warnings, helping the coaching staff adjust intensity before problems actually occur. That is part of the reason why Scaloni substituted Messi immediately after he completed his hat-trick, prioritizing his fitness and safety for the matches ahead.

AI in injury prevention is nothing new: Premier League clubs have partnered with Microsoft for similar purposes. What is different this time is that Gemini is integrated directly into the workflow of a national team competing at a major tournament, not just at club level.

For fans: create Messi content, follow scores without unlocking your screen

Alongside supporting the coaching staff, Gemini has also rolled out a range of features aimed at fans, and this is the side that hundreds of millions of people will actually experience.

Gemini lets you create content about players directly

Users can generate images, songs and digital content featuring Argentina players like Messi directly inside the Gemini app. The feature is designed to bring the World Cup experience closer to those who cannot attend matches in person.

Real-time scores and automated daily briefings

On Google Search, live match scores can be pinned to the lock screen and update in real time, with dedicated animations for goals and red cards, all without needing to unlock the phone.
For paid Gemini users, the Scheduled Actions feature can set up an automated daily football briefing covering scores, news and fixtures, delivered at a chosen time without needing a prompt each day.

Match-day infrastructure

Google has updated Street View for all 16 host stadiums and optimized Waze routing for match days. Waze also shows live scores when the car is stopped at a red light, so drivers don't need to pick up their phones while on the move.

The 2026 World Cup is the real test for AI in sport

Google is not sponsoring Argentina alone. Gemini also appears on the kits of France, Morocco, Iraq, Turkey and the United States, while Pixel is the official phone of the French squad, which also uses Gemini for internal communications. This is clearly a comprehensive strategy from Google, not a one-off deal.

What makes the 2026 World Cup particularly significant is that it will answer a question no lab environment can: what do users actually do with AI when a World Cup runs for six weeks across 104 matches? Features that run on novelty alone will fade after the group stage. Whatever users keep coming back to all the way through the final is the honest answer to where AI actually fits in everyday life, and Google knows it.

Google's communications director for Latin America, Flor Sabatini, stated that AI will make the 2026 World Cup a before-and-after moment in the history of football. It sounds like marketing, but this is the first time a major AI model has been integrated into the preparation of the reigning world champions, right in the middle of the most-watched sporting event on the planet.

The 2026 World Cup is Gemini's real test

The most significant part of this entire story is not the Gemini logo on Messi's jersey. It is the fact that Argentina, still the favorites and the most scrutinized team, carrying the pressure of defending the title, has committed part of its preparation process to AI.
If Argentina succeeds, Gemini will have a case study that no advertising budget can buy. If Argentina falls short and the coaching staff attributes any part of it to AI, the narrative will flip entirely. Either way, this is the first time AI has been held accountable on a stage that genuinely matters: not a benchmark, not a demo, but the World Cup.

For AI users, what is worth watching is not just whether Argentina wins, but whether Gemini actually changes how a football team operates, or whether it turns out to be nothing more than a logo on a training kit that looks better than in previous years.

Nam

17 Jun, 2026

YC CEO's 6 forcing questions before starting any project

I'd heard a lot about the gstack repo from the CEO of Y Combinator, so I got curious and installed it to try. What surprised me most wasn't the polished workflows; it was the genuinely different mindset behind them. That mindset shows up in the very first command, /office-hours: six questions that don't ask about code at all, only about the things most people haven't thought through before they start building.

What is gstack and why did Garry Tan build it

gstack is an open-source toolkit by Garry Tan, CEO of Y Combinator, built primarily for Claude Code. The core idea: instead of using AI as a plain code writer, Garry Tan wanted to turn Claude into a small team of AI agents, each handling a different role, from product direction and security review to testing and release.

The entire workflow runs as an ordered loop: Think → Plan → Build → Review → Test → Ship → Reflect. More specifically, gstack splits Claude Code into 23 specialized roles, and the output of each step is automatically passed to the next, with no manual handoff needed. Some of the standout commands:

/office-hours: 6 questions that force you to rethink your feature before writing a single line of code
/plan-ceo-review: checks whether you're overbuilding or underbuilding relative to what's actually needed
/review: catches serious bugs that standard automated checks miss
/qa: opens a real browser, performs real interactions, finds real bugs
/cso: runs an automated security audit against international standards
/ship: syncs, tests, pushes code and opens a pull request in a single command

How effective is gstack?

Garry Tan says his working speed in 2026 is roughly 810 times faster than in 2013, measured in lines of completed code per day (11,417 vs 14). In 60 days, he shipped 3 production services and over 40 features, all while running Y Combinator full-time.
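The Think → Plan → Build → Review → Test → Ship → Reflect loop described above, where each step's output is handed to the next, can be modeled as a simple pipeline. The Python sketch below is a conceptual illustration under that assumption only: gstack itself is a set of Claude Code skills, and `make_stage` and `run_loop` are hypothetical names, not part of gstack.

```python
from typing import Callable, Dict, Sequence

# A hypothetical artifact: named notes handed from one stage to the next.
Artifact = Dict[str, str]
Stage = Callable[[Artifact], Artifact]

LOOP = ("Think", "Plan", "Build", "Review", "Test", "Ship", "Reflect")


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that records its result on a copy of its input."""
    def stage(artifact: Artifact) -> Artifact:
        out = dict(artifact)  # copy, so earlier stages' output stays untouched
        out[name] = f"{name} done (saw {len(artifact)} earlier notes)"
        return out
    return stage


def run_loop(idea: str, stages: Sequence[str] = LOOP) -> Artifact:
    """Run the stages in order; each stage receives everything produced before it."""
    artifact: Artifact = {"idea": idea}
    for name in stages:
        artifact = make_stage(name)(artifact)  # output of one step feeds the next
    return artifact


if __name__ == "__main__":
    for step, note in run_loop("calendar summary app").items():
        print(f"{step}: {note}")
```

The design point is the automatic handoff: no stage needs to be told what came before, because it receives the accumulated artifact directly.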
Andrej Karpathy, co-founder of OpenAI, has confirmed a similar trend, saying he hasn't typed a single line of code himself since December 2025 thanks to AI agents.

But among all those commands, /office-hours stands out for the opposite reason: it doesn't help you work faster, it helps you avoid building the wrong thing from the start.

Why Garry Tan puts /office-hours first

Garry Tan placed /office-hours at the top of the workflow based on a simple observation: most products fail not because of poor code, but because they build the wrong thing. Teams spend weeks on a feature nobody needs, build the right feature for the wrong audience, or solve a problem users already handle better another way.

The command has two modes: Startup mode, for founders and people building real products with real users, and Builder mode, for side projects, hackathons and open source. This article focuses on Startup mode, where the 6 questions apply most directly.

6 questions that stop you from building the wrong thing

These aren't questions to answer quickly and move on. They're designed to make you think honestly, because the more truthful your answers, the more accurately Claude can match what you actually need, which saves you a significant amount of time later. You can read the full original prompts at office-hours/SKILL.md.tmpl.

1. Demand reality: Is there a real need?

Original question: "Who specifically has this problem? How are they solving it today?"

Not "users in general" or "the marketing team": the goal is to name one real person, ideally by name, who is actively struggling with a specific problem. If you can't name someone like that, you don't yet understand what they actually need.

Concrete example: Instead of "users want better task management," it should be: "Minh, a project manager at a 20-person company, copy-pastes between Notion and Google Sheets every Monday morning because the two tools don't sync."
Apply this to your own situation accordingly.

2. Status quo: What are they using instead?

Original question: "What is their current workaround? How much better do you need to be for them to switch?"

Everyone is already solving their problem somehow, whether with Excel, sticky notes or a WhatsApp group. If their current solution is good enough, they have no reason to migrate their data and learn an entirely new platform. Your solution needs to be meaningfully better before they'll even consider switching.

3. Desperate specificity: Who needs this badly enough?

Original question: "Who needs a solution badly enough to use your ugly beta version today?"

This is the question that separates nice-to-have from must-have. If you can't find anyone willing to use an incomplete, rough, buggy version right now, the problem you're solving isn't urgent enough. Real early users are people who need a solution badly enough to tolerate an unpolished product, as long as it's moving in the right direction.

4. Narrowest wedge: What is the smallest possible piece?

Original question: "What is the smallest thing you could launch tomorrow? Not the full vision — the smallest piece."

Not the first full-featured version, but something even smaller than that. This question typically cuts 80% of the scope people add because they think "might as well do it while I'm here." It's a trap many builders fall into, including me. Launch the smallest meaningful piece first, listen to real users, then decide whether to expand.

Common mistake: Many people confuse "smallest piece" with "first full-featured version." The narrowest wedge really means one small thing that solves one specific problem for one specific group of users, nothing more.

5. Observation and surprise: Have you watched real people use it?

Original question: "Have you watched real people use your product? Did they use it in ways you didn't expect?"

This question is best saved for the second iteration onward, once you have something to test.
Rather than asking for feedback through messages or surveys, sit and watch directly, or review screen recordings. The most valuable insights usually come not from what users say, but from what they do that you didn't design for, or what they skip that you thought was important.

Note: If you're in your first iteration and don't have a product yet, you can skip this question and come back to it after launching the smallest piece from question 4.

6. Future-fit: The 2 to 3 year view

Original question: "In 2-3 years, will what you're building still be relevant — or is the trend moving against you?"

This isn't about predicting the future precisely. It's about avoiding building something that's already fading. If the trend is making your problem less urgent over the next two years, that's a clear signal to reconsider from the start. That said, if your goal is to move fast and capture the market before big tech ships something similar, this question can reasonably be set aside.

A real example: a simple idea completely flipped

In the gstack documentation, Garry Tan walks through a practical example. You open /office-hours and say: "I want to build an app that summarizes my daily work calendar." Claude doesn't just agree and start executing. Instead, it pushes back: what you just described isn't a calendar summary app, it's actually a full personal AI chief of staff. These are entirely different in scope, technical complexity and user expectations.

From that single opening description, /office-hours helps you see:

5 features you were describing without realizing it
4 assumptions that need to be validated before building
3 different implementation directions with varying levels of complexity
1 recommendation: launch the smallest piece first and treat the rest as a long-term roadmap

All of this happens before you write a single line of code. The output is saved as a document that subsequent steps in the workflow automatically pick up and continue from.
These 6 questions work even without gstack

The 6 questions from /office-hours don't require Claude Code or a gstack installation. They're a way of thinking, the same framework YC partners use to evaluate startups, and you can apply them right now with any AI tool you already have.

The difference when using them through gstack is that Claude won't let you give vague answers. It pushes for specifics and won't move forward until your response is grounded enough to be useful. That's why /office-hours tends to be the most uncomfortable command in the entire toolkit: not because it's difficult to use, but because it asks exactly what you've been avoiding.

Try it today: Before starting your next project, paste these 6 questions into Claude, Gemini or ChatGPT along with your idea. Ask it to go through each question one at a time and not let you skip any. The results are often more surprising than you'd expect, even for ideas you've already thought through carefully.

gstack currently has over 117k stars on GitHub and is still growing. For me, the most valuable part isn't the technical commands like /review or /ship; it's /office-hours, because it's the only command in the entire toolkit that forces you to stop and think before doing anything else.
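The "Try it today" suggestion can be turned into a reusable, paste-ready prompt. The Python sketch below assembles one from the six questions as quoted in this article (lightly trimmed); the instruction wording and the `build_prompt` name are illustrative assumptions, not gstack's actual prompt, which lives in office-hours/SKILL.md.tmpl.

```python
# The six /office-hours questions as quoted in this article (lightly trimmed).
QUESTIONS = [
    ("Demand reality",
     "Who specifically has this problem? How are they solving it today?"),
    ("Status quo",
     "What is their current workaround? How much better do you need to be for them to switch?"),
    ("Desperate specificity",
     "Who needs a solution badly enough to use your ugly beta version today?"),
    ("Narrowest wedge",
     "What is the smallest thing you could launch tomorrow? Not the full vision, the smallest piece."),
    ("Observation and surprise",
     "Have you watched real people use your product? Did they use it in ways you didn't expect?"),
    ("Future-fit",
     "In 2-3 years, will what you're building still be relevant, or is the trend moving against you?"),
]


def build_prompt(idea: str) -> str:
    """Assemble a prompt asking the model to run the questions one at a time."""
    lines = [
        f"My idea: {idea}",
        "Ask me the questions below one at a time and wait for my answer before moving on.",
        "If an answer is vague, push for specifics. Do not let me skip any question.",
        "",
    ]
    for number, (title, question) in enumerate(QUESTIONS, start=1):
        lines.append(f"{number}. {title}: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("an app that summarizes my daily work calendar"))
```

Keeping the questions in a list rather than a hard-coded string makes it easy to drop question 5 on a first iteration, as the article suggests.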

Nam

27 Jun, 2026

How to control Codex from your phone with ChatGPT app

You're out and suddenly remember a small detail in your project that needs fixing. You don't have to open your laptop or remote into your desktop: with the right connection set up, the ChatGPT app on your phone can become a control panel for Codex, while your computer at home or in the office keeps running the actual code.

ChatGPT app doesn't run Codex on your phone

The easiest thing to misunderstand is thinking Codex runs directly on your phone. In reality, your phone only sends prompts, replies, approvals and follow-up messages, while the actual working environment lives on the Mac or Windows machine running Codex. In other words, the ChatGPT app is the remote control, and the host machine is where your repo, terminal, credentials, plugins, MCP servers and other tools actually live.

This makes sense because codebases typically live on your development machine, not your phone. When you send a request such as fixing a TypeScript error, running tests or checking a diff, Codex processes it inside the selected project on the host and sends the results back for you to review. If you want to understand the basics before using remote access, check out What is Codex and how to use Codex to get a clear picture of where this tool fits in your workflow.

What do you need before connecting ChatGPT app to Codex?

According to OpenAI's latest Codex documentation, the ChatGPT app can control Codex on both macOS and Windows; Linux is not supported yet. Notably, this feature works with all ChatGPT account types, including Free and Go, so no paid plan is required. You only need to be signed into the same account or workspace on both devices: ChatGPT mobile (latest version on iOS or Android) and Codex (latest version on your host machine, online and running).

Your host machine must stay on and Codex must keep running for the entire time you're controlling it remotely.
If the machine goes to sleep, loses its connection or Codex is closed, the connection from your phone drops immediately and any tasks in progress may be interrupted.

The entire setup process starts from the Codex app on the host machine and is surprisingly simple: just scan a QR code and you're done. In the Codex app, select the mobile setup option in the sidebar, scan the QR code with your phone, then complete the confirmation in the ChatGPT app. For enterprise workspaces, an admin may need to enable Remote Control permissions before you can connect.

This QR code grants control over your computer, so keep it private and never share it with anyone, to avoid unauthorized access to your machine.

To summarize, connecting the ChatGPT app to Codex is straightforward:

The host machine must be online and running Codex
The ChatGPT app and Codex must be signed into the same account or workspace
Generate the QR code in Codex on the host and complete setup on your phone
MFA, SSO or passkey requirements may still apply depending on your workspace

What can you do once connected?

Once the host appears in Codex on your phone, you can start a new thread inside a project on the host or pick up an existing one. This is where the experience becomes genuinely useful: you can send follow-ups, answer Codex's questions, approve commands, view output, check diffs, review test results and even receive notifications when a task finishes or needs your attention.

A real example: you're at a coffee shop and remember the login form has a validation bug. You open the ChatGPT app, select the connected host, and ask Codex to check the auth flow, fix the email validation error and run the related tests. Codex works directly on the repo sitting on your host machine, while you review the results, approve actions when needed and decide whether to request further changes.
This is also why people are starting to think of Codex and other AI-powered IDEs as a colleague working inside a real environment rather than just a code suggestion tool. Their strength lies in reading files, running commands, editing code and maintaining context across multiple rounds of back-and-forth.

Limitations to keep in mind when using Codex from your phone

Remote control depends entirely on the host machine. If your computer goes to sleep, loses its connection, closes Codex or gets signed out of the workspace, your phone loses its working environment immediately. On the other hand, if only your phone drops the connection while Codex is mid-task, the task keeps running on the host and Codex notifies you once your phone reconnects, so a sudden loss of signal on your phone is less of a worry.

One more thing to note: on Windows, tasks using Computer Use require an appropriate foreground session, so this setup is not a complete replacement for sitting in front of your machine.

It also helps to draw a clear line between handing off a focused task and reviewing large changes. Your phone works well for small bugs, running tests, quick questions about a specific file, reviewing short tasks or checking task status. Anything that requires close attention should still be reviewed on a larger screen so you don't miss details.

How to use it effectively in practice

The most effective approach is to hand off tasks with a clear scope and specific expected outcomes. Instead of saying "fix the login", describe exactly where the error occurs, what the expected behavior should be after the fix, which tests to run and which parts of the codebase to leave untouched. Codex performs better when it knows the boundaries of a task, especially since, when you work remotely from your phone, each feedback loop takes longer than when you're sitting at your machine.
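One way to make sure no element of a well-scoped request is forgotten is to fill in the same four fields every time: where the problem is, what should happen after the fix, which tests to run and what to leave alone. The Python sketch below is a hypothetical helper that turns those fields into text you could paste into the ChatGPT app; it does not call any Codex API, and the class name, file paths and wording are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskSpec:
    """A hypothetical structure for a tightly scoped Codex request."""
    where: str                                             # where the problem occurs
    expected: str                                          # behavior expected after the fix
    tests: List[str] = field(default_factory=list)         # tests Codex should run
    do_not_touch: List[str] = field(default_factory=list)  # paths outside the scope

    def to_prompt(self) -> str:
        parts = [
            f"Problem location: {self.where}",
            f"Expected behavior after the fix: {self.expected}",
        ]
        if self.tests:
            parts.append("Run these tests and report the results: " + ", ".join(self.tests))
        if self.do_not_touch:
            parts.append("Do not modify: " + ", ".join(self.do_not_touch))
        # Ask Codex to read first and propose before applying, so you only approve once.
        parts.append("Read the relevant files first, propose a fix, and wait for my approval before applying it.")
        return "\n".join(parts)


if __name__ == "__main__":
    spec = TaskSpec(
        where="email validation in src/auth/LoginForm.tsx",  # hypothetical path
        expected="addresses without an @ are rejected with an inline error",
        tests=["src/auth/LoginForm.test.tsx"],                # hypothetical test file
        do_not_touch=["src/payments/"],
    )
    print(spec.to_prompt())
```

Because the optional fields are simply omitted when empty, the same helper works for a quick question as well as for a multi-file fix.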
A clean working rhythm might look like this: describe the task in detail, whether it's small or medium-sized; ask Codex to read the relevant files; let it propose a solution; approve only when necessary; and wait for the result report. Once you get used to this rhythm, you'll find that idle time away from your desk can handle real work, while the final decision stays firmly in your hands.

Compared to Claude Code Remote and Telegram bot

There are many ways to control an AI coding agent from your phone, and the three most common approaches each serve a different need.

| Criteria | ChatGPT app + Codex | Claude Code Remote | Telegram + Codex |
|---|---|---|---|
| Natural conversation | ✅ Excellent | ✅ Good | ❌ Requires exact syntax |
| Granular control | Moderate | Highest | Low |
| Connection stability | Stable | Stable | Frequent drops |
| Mobile UI | Well optimized | Not fully optimized | Uses existing Telegram app |
| Initial setup | Easy, scan QR | Easy | Requires manual bot configuration |
| Computer must stay on | ✅ Required | ✅ Required | ✅ Required |

Claude Code Remote Control offers the strongest level of control: you get direct terminal output, can intervene mid-task and generally feel much closer to what the agent is doing. That said, its UI isn't fully optimized for small phone screens yet, and some interactions are still difficult to perform without a physical keyboard.

A Telegram bot has the advantage of not requiring a separate app and is easy to get started with, but the real-world experience has clear limits. It's prone to slowdowns and occasional silent disconnections mid-task, and because it lacks genuine AI context, anything slightly more complex than a simple command quickly falls apart, forcing you to type precise instructions rather than describe what you need naturally.

ChatGPT app + Codex sits at the best balance point for most users: smooth enough, smart enough, quick to set up with a QR scan and no new syntax to learn before you can get to work.
Connecting ChatGPT app to Codex doesn't turn your phone into a development machine — it turns your phone into a control surface for a development machine that's already ready to work. As long as the host stays on, permissions are configured correctly and the task is scoped tightly enough, this is the most practical way to handle real coding work when you're away from your laptop.

Nam

22 Jun, 2026

What Is Hermes Agent? Nous Research's Self-Learning AI

Learning more makes you better. That principle, long assumed to apply only to humans, turns out to hold true for Hermes Agent too, an open-source AI agent from Nous Research. Hermes Agent doesn't forget what you do together: it remembers, understands you more deeply and gets better with each session, thanks to a memory system that can recall everything about you even after the machine has been off for a week.

What Is Hermes Agent?

Hermes Agent is an open-source AI agent developed by Nous Research, the lab behind the Hermes, Nomos and Psyche model lines, and released under the MIT license. Unlike Antigravity or Codex, which depend on an IDE environment, or ordinary chatbots that are ultimately a thin wrapper around a single API, Hermes Agent is built to run continuously on the user's own infrastructure, from a cheap VPS to a GPU cluster or serverless infrastructure. In that respect it works much like OpenClaw.

The core difference in Hermes Agent lies in how it manages long-term memory and converts experience into real skills. Instead of merely storing raw information or passively remembering preferences the way Gemini or Claude do, Hermes runs a closed "learning loop": after every work session, it actively distills the process into new tools it can use the next time. This system is run by a background "Curator" agent that automatically scores, prunes and merges accumulated knowledge, combined with FTS5 full-text search, which retrieves old memories roughly 4,500 times faster without spending any tokens. As a result, Hermes doesn't just respond and forget; it becomes a collaborator that grows more knowledgeable and capable over time.

Four Features That Set Hermes Agent Apart

Nous Research doesn't call Hermes Agent a chatbot or a copilot; it positions it as an agent with a built-in learning loop. The four feature groups below explain why that label isn't just marketing.
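The FTS5 retrieval mentioned above is SQLite's full-text search extension. The Python sketch below shows what keyword recall over stored notes looks like with FTS5 through the standard `sqlite3` module. The table layout, the sample notes and the `build_memory`/`recall` helpers are illustrative assumptions, not Hermes Agent's actual schema or code, and the sketch makes no attempt to reproduce the 4,500x speed figure. It requires a SQLite build with FTS5 enabled, which most Python distributions include.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical memory notes: (day, note). Not Hermes Agent's real schema.
SAMPLE = [
    ("2026-06-01", "Deployed the blog to a cheap VPS with Docker and nginx"),
    ("2026-06-03", "User prefers concise weekly reports every Monday"),
    ("2026-06-10", "Fixed nginx TLS renewal with certbot on the VPS"),
]


def build_memory(entries: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Store notes in an in-memory FTS5 table (requires SQLite built with FTS5)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE memory USING fts5(day, note)")
    conn.executemany("INSERT INTO memory (day, note) VALUES (?, ?)", entries)
    return conn


def recall(conn: sqlite3.Connection, query: str, limit: int = 3) -> List[str]:
    """Return the days of the best-matching notes; FTS5's rank column orders by BM25."""
    rows = conn.execute(
        "SELECT day FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [day for (day,) in rows]


if __name__ == "__main__":
    conn = build_memory(SAMPLE)
    print(recall(conn, "nginx"))
    print(recall(conn, "weekly"))
```

The lookup runs entirely inside the local SQLite index, which illustrates why this kind of recall does not consume model tokens: only the matching notes would need to be passed to the model afterwards.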
Memory That Persists Across Sessions

The biggest weakness of most AI today is that memory only stores raw chat text rather than how work actually gets done. Hermes Agent addresses this through three combined mechanisms:

Fast retrieval: Uses FTS5 full-text search to pull up old memories roughly 4,500 times faster than conventional search, without spending extra tokens the way Gemini or Cowork do.
User understanding: Integrates Honcho's dialectical user-modeling approach, helping the agent understand preferences, habits and personal context in depth across thousands of sessions.
Continuity: The agent picks up work exactly where you left off, even on a project from weeks earlier.

Self-Generating and Self-Improving Skills

This is the feature that makes Hermes Agent behave like a collaborator that accumulates experience, rather than just a tool that answers on request:

Learning from real use: After completing complex tasks, Hermes Agent distills the process into new skills and stores them in a library to be reused automatically next time.
Open agentskills.io standard: These skills follow an open standard, so they can be packaged, shared and reused across different AI systems without being rewritten from scratch.
The Curator mechanism: A background administrative agent periodically scores, prunes and merges duplicate skills, which keeps the skill library from bloating and becoming disorganized over time.

Present on More Than 23 Messaging Platforms

Hermes Agent isn't confined to a computer; it integrates directly into the messaging channels people already use on their phones every day:

Multiple channels, one brain: You can command Hermes Agent through Telegram, Discord, Slack, WhatsApp, Signal, email or SMS.
Context retained: Whether you message via Telegram in the morning or switch to Discord at night, the agent keeps a single thread of memory that is never fragmented by channel.
Multimodal interaction: Supports sending voice messages, images, and video, along with the ability to analyze multimodal content. Flexible Runtime Infrastructure Hermes Agent supports six backend types for executing commands: local machine, Docker, SSH, Daytona, Singularity, and Modal. With Daytona and Modal, the environment can hibernate when idle and cost almost nothing while waiting, waking up only when there's work to process. This is why Nous Research describes Hermes Agent as an always-on agent that doesn't require users to keep a server running 24/7 at high cost year-round. Hermes Agent can be installed with a single curl command, supporting Linux, macOS, and Windows via WSL2, or, as of June 5, 2026 with version v0.16.0 "The Surface Release," through an official Native Desktop app for Windows, macOS, and Linux with a fully polished GUI, making it accessible to everyday users without needing a terminal. Built-In Toolset and Limitations to Know 40-Plus Built-In Tools, From Web Search to Schedule Automation Hermes Agent ships with more than 40 built-in tools, including web search, browser actions, file handling, and Python script execution via RPC to run sub-tasks without consuming the main agent's context window. A natural-language scheduling system lets you set recurring tasks like daily reports or data backups, then leaves the agent to run them without being reminded. For tasks that need full isolation, Hermes Agent also supports sub-agents with their own conversation, terminal, and scripts, allowing multiple jobs to run in parallel without diluting the main memory. Challenges and Security Considerations Despite rapid updates, Hermes Agent still has a few points users should keep in mind before deploying it: Stability of the self-learning mechanism: The ability to self-improve skills boosts success rates, with a Tencent Cloud report recording gains of up to 52% along with token savings of up to 61%. 
However, since this is a self-evolving mechanism, real-world effectiveness still depends on the underlying model chosen and still requires human oversight rather than full trust. Risk from high-level permissions, with security responsibility falling on the user: Hermes Agent can intervene deeply in a system (excessive agency), so connecting it directly to multiple messaging platforms requires users to manage their own API keys and set up guardrails. Unlike closed AI services, Hermes Agent hands full control over to the user, which means the user also bears greater responsibility for configuring access permissions to avoid information leaks. Why Is Hermes Agent Growing So Fast? Hermes Agent's growth could be attributed to Nous Research's marketing, but in our view it comes down to three main factors. A Frictionless Migration Path From OpenClaw Recognizing OpenClaw's large user base, Nous Research built a migration tool that lets users carry over their persona, API keys, the entire skill set, and memory to Hermes Agent with a single command, without losing old data and, of course, without having to reconfigure anything from scratch. If you're currently using OpenClaw and want to try Hermes Agent without losing your old data, look for the hermes claw migrate migration tool built into Hermes Agent before considering a fresh install. Betting on a Closed Learning Loop Instead of a Feature Race While many other agents compete on the number of tools they offer, Hermes Agent positions itself as a self-evolving entity, one that distills experience into new skills and retains long-term memory to understand users more deeply over time. This approach creates lasting value, and the community has already put it to use for projects such as automating large-scale content production with high consistency across many sessions. A Role as a Training Data Engine Beyond serving as a personal assistant, Hermes Agent also functions as a capable research tool. 
It can generate thousands of parallel tool-calling trajectories and compress them into training data for other AI models. By turning the agent's real-world experience into training data, Hermes becomes a platform that developers building the next generation of autonomous AI can't easily do without. How Is Hermes Agent Different From an Agent Harness? People new to the space often confuse Hermes Agent with the concept of an agent harness, which is the framework that decides how a model calls tools, handles the reasoning loop, and coordinates execution steps internally. If a harness is the engine and chassis that determine how a car drives, then Hermes Agent is like a car that already has that engine installed, plus seats, a navigation system, and the driver's own trip memory. In other words, a harness is the technical architecture layer underneath, while Hermes Agent is a complete end-user product that already packages memory, a skill system, communication channels, and a choice of runtime infrastructure. A developer can build their own harness to control every small detail, but most users don't need to go that deep, they just need an agent that runs right away and gets smarter through use. For a closer look at this underlying architecture layer, read more at What Is Agent Harness? The Framework That Makes AI Work Efficiently, which explains in detail how this type of framework operates. Is Hermes Agent Worth Trying Right Now? Being fully open source, collecting no user data, and supporting complete self-hosting, Hermes Agent is one of the few agents today that lets users keep full control over their own data while still getting a continuous assistant experience with real memory, not the simulated memory that only exists within a single chat. After v0.16.0, the biggest technical barrier for users unfamiliar with terminals has largely been removed, as the native desktop app for Windows, macOS, and Linux has fully replaced the pure CLI approach used before. 
What's left to judge about Hermes Agent isn't whether it runs, but what it learns after a few real weeks of use. The fastest way to find out is to install the desktop app or run the CLI on a cheap VPS, connect it to a familiar messaging channel like Telegram, then watch what skills the agent forms on its own from how you use it every day. That's also the groundwork for comparing Hermes Agent with other options on the market, from Agent Harness to OpenClaw and Claude Cowork, in the next part of this series.

Nam

19 Jun, 2026
