Google unveils plan to scale its AI infrastructure 1,000x to keep its lead

Published on 26 November, 2025

Quick Summary

Google aims to increase its AI capacity 1,000-fold over the next four to five years by doubling its AI serving capacity every six months, to meet surging demand in the "inference era". To do this at roughly the same cost and power, Google is focusing on co-designing software with its in-house hardware, such as the Ironwood TPU and the Axion CPU. The company also faces major challenges in cooling and power (adopting liquid cooling, 48V and 400V DC power distribution, and investing in nuclear energy), as well as the risk of an AI bubble. Even so, Google believes underinvesting is the biggest risk, and it is challenging Nvidia's dominance with specialized, more efficient AI hardware, with Meta Platforms reportedly considering its TPUs.

Google has set an ambitious internal goal of increasing its AI capacity 1,000-fold within the next four to five years. The move comes as the global AI race heats up and pushes tech companies to invest enormous sums in compute infrastructure, despite concerns about an AI bubble.

Amin Vahdat, Google Cloud's vice president of AI infrastructure, presented the roadmap at an all-hands meeting in early November, stressing that Google must double its AI serving capacity every six months to keep up with demand.
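The two figures in the roadmap are linked by simple compounding: doubling every six months means ten doublings in five years, and 2^10 = 1,024. The minimal Python sketch below makes that arithmetic explicit. It assumes a perfectly steady doubling cadence, which is a simplification (real capacity build-outs arrive in lumps), and the function names are illustrative only.

```python
import math

DOUBLING_PERIOD_MONTHS = 6  # "double AI serving capacity every six months"


def growth_factor(months: float, doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Capacity multiplier after `months`, assuming a perfectly steady doubling cadence."""
    return 2 ** (months / doubling_period)


def months_to_reach(target: float, doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Months of steady doubling needed to reach a `target` multiplier."""
    return doubling_period * math.log2(target)


for years in (4, 4.5, 5):
    print(f"{years} years: {growth_factor(years * 12):,.0f}x")
print(f"1,000x takes about {months_to_reach(1000):.1f} months")
```

Under this assumption, four years of doubling yields only 256x, while five years yields 1,024x, so the 1,000x mark lands at the five-year end of the stated window (just under 60 months).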

What is the inference era, and why does it matter for AI models?

Demand for AI compute is growing at an unprecedented pace because the industry has shifted from mainly training AI models to a new phase known as the inference era: the period in which the dominant workload is running trained models in production to answer users' requests, reason through problems and take actions.

Previously, the most expensive and resource-hungry stage was training, when the model learns. That is changing: the newest frontier models, such as Gemini 3 Pro, need a huge and continuous supply of compute to handle tasks like thinking, reasoning and writing code.

Vahdat warned bluntly that the race to build AI infrastructure is now the most critical and the most expensive part of the entire AI competition.

Google's advantages come with challenges

To reach 1,000x growth in AI infrastructure without letting costs spiral out of control, Google continues to bet on performance and energy efficiency.

Optimizing performance and cost

Google has set itself a very demanding target: deliver 1,000 times today's compute, storage and networking capacity while keeping cost and power consumption at roughly the same level.

To get there, Google applies a co-design philosophy much like Nvidia's: tightly integrating software and algorithms (developed in-house by DeepMind) with Google's homegrown hardware, namely the Ironwood TPU and the Axion CPU.

What roles do the Ironwood TPU and the Axion CPU play?

The seventh-generation Ironwood TPU (unveiled in April 2025) is at the center of this scaling strategy.

Ironwood is designed for large language model (LLM) inference.
Google says Ironwood delivers 10x the peak performance of TPU v5p (launched in 2023) and 2x the performance per watt of the previous-generation Trillium.
Each liquid-cooled TPU v7 Ironwood chip reaches 4.6 petaFLOPS (FP8 dense). For comparison, Nvidia's latest Blackwell B200 reaches about 4.5 petaFLOPS (FP8 dense).

Google also uses its own Arm-based Axion CPU. General-purpose workloads are being moved onto these more efficient processors, freeing up power and thermal headroom for the power-hungry TPUs that handle specialized AI tasks.

Google develops its own TPUs and CPUs, from A to Z

Infrastructure and energy challenges

Scaling compute to this level inevitably means overcoming major physical barriers in power and cooling.

AI chips have become so powerful that they act like "tiny space heaters": although the chips are small, the heat they generate is enormous.

To address this, Google is pursuing two main solutions:

Liquid cooling: Google has moved to water or specialized coolants that cool the chips directly, which is far more effective than air cooling with fans.
48V power distribution: Google is deploying 48V power distribution, which delivers power more efficiently and reduces the energy lost as heat.

Looking further ahead, as racks grow to need hundreds of kilowatts, Google is researching a bigger leap: moving to 400V DC power. This would let it unlock the full capacity of very large machine learning systems without running into power-delivery limits.
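The efficiency gain from higher distribution voltage comes down to resistive (I²R) loss: for a fixed power P, the current is I = P / V, so the heat lost in cables and busbars, I²R, falls with the square of the voltage. The minimal Python sketch below compares the legacy 12 V rack standard with 48 V and 400 V. The 10 kW load and 0.1 milliohm path resistance are hypothetical values chosen only for illustration, and the model ignores conversion-stage losses, which real designs must also account for.

```python
def conduction_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Resistive loss in the distribution path for a load drawing `power_w` at `voltage_v`."""
    current_a = power_w / voltage_v          # I = P / V
    return current_a ** 2 * resistance_ohm   # P_loss = I^2 * R


LOAD_W = 10_000      # hypothetical 10 kW load
PATH_OHM = 0.0001    # hypothetical 0.1 milliohm of cable and busbar

for volts in (12, 48, 400):
    loss = conduction_loss_w(LOAD_W, volts, PATH_OHM)
    print(f"{volts:>3} V: {LOAD_W / volts:7.1f} A, {loss:8.3f} W lost as heat")
```

Going from 12 V to 48 V cuts the current by 4x and the resistive loss by 16x. The same reasoning, applied to racks drawing hundreds of kilowatts, motivates the move to 400 V DC.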

Environmental commitments and the energy crunch

Alphabet (Google's parent company) has set a goal of reaching net-zero emissions by 2030, an earlier deadline than Vietnam's national net-zero target of 2050. However, AI's energy demand is widely expected to be very large and could jeopardize Alphabet's climate goals.

To cope with energy shortages worldwide, Google is looking for reliable, clean and low-cost on-site power sources. It has announced a nuclear energy deal with Kairos Power for small modular reactors (SMRs) totaling 500 megawatts.

How will the AI bubble and its financial risks play out?

Even as Google pours money into AI, financial markets are increasingly worried about an AI bubble that could burst.

Sundar Pichai (CEO of Alphabet) himself acknowledged: "There are elements of irrationality in how the market is valuing AI companies today." At the same time, Alphabet has raised its 2025 capital expenditure (CapEx) forecast to as much as $93 billion.

Pichai's counterargument is firm, however: "The biggest risk is not investing too much, but not investing enough."

He gave an example: Google Cloud is growing impressively, but its revenue would have been even higher if Google had enough compute capacity to serve customers. In other words, Google is accepting the risk of heavy investment so that it does not miss out on major revenue in the future.

Is Google challenging Nvidia's dominance?

Google is accelerating investment in its TPU systems (its own AI chips) and pursuing an end-to-end, do-it-yourself strategy, from chip design to its own data centers. This is creating a very promising alternative to GPUs from Nvidia, the reigning leader in AI infrastructure.

Google's TPU is an application-specific integrated circuit (ASIC) built for one job: AI computation. That makes it quite different from Nvidia's GPUs.

Nvidia's GPU is like an all-round athlete: very flexible and able to handle many kinds of work.
Google's TPU is like a specialist athlete: for certain large-scale AI training and inference workloads, it can be more efficient and use less power than its rival.

Google's bet on TPUs is starting to pay off: Meta Platforms is reportedly in talks to use Google's TPUs to diversify its suppliers and reduce its reliance on Nvidia. Meta could start renting TPU capacity in 2026 and buy chips in volume from 2027.

In short, Google's plan to scale its AI infrastructure 1,000x is not just a numerical target but a shift in how systems are designed.

Google is turning the data center into a single, highly efficient machine. It is focusing on co-design, so that hardware and software work together to process tasks while saving power, and on in-house chips, much as Apple has done, to secure its lead in a race that is moving at breakneck speed.


Gemini powers Argentina and Messi at World Cup 2026

Gemini has won big in the most literal sense. Messi scored his first hat-trick of the 2026 World Cup, leading Argentina to a crushing 3-0 victory over Algeria and equaling Miroslav Klose's record of 16 World Cup goals, and that historic moment became the perfect launchpad for Gemini. Back in March 2026, Google and the Argentine Football Association (AFA) made a bold decision: rather than simply printing a logo on training kits, they signed a deal for the AI to actively support tactical preparation and professional decision-making. That bet now looks like the right call.

From training kit to the tactical meeting room

The agreement between AFA and Google was unveiled in Times Square, New York, a venue deliberately chosen to capture global media attention. The Gemini logo appears on all training apparel for Argentina's men's, women's and youth squads, alongside Adidas and American Express in AFA's top sponsorship tier.

But the interesting part isn't the jersey. According to Inside World Football, Argentina's coaching staff will use Gemini for three specific purposes: tactical analysis, injury prevention and decision support. In other words, Gemini now has a seat in meetings that previously belonged only to Scaloni and his assistants.

Google has not publicly disclosed which specific Gemini tools have been integrated into AFA's workflow. What is clear is that it is using the World Cup to bring Gemini into professional football, and the results will be graded in public.

What is Gemini actually doing in the dressing room?

Argentina arrives at the 2026 World Cup as the reigning champion. Every decision Scaloni makes, from the squad list to the starting eleven, is scrutinized more closely than any other team's, and that is precisely why Argentina is the best testing ground Google has ever had for Gemini in professional football, especially at a major tournament.
Tactical analysis

Gemini processes match data for both Argentina and its opponents, covering movement statistics, attacking patterns and defensive vulnerabilities. Instead of the coaching staff spending hours reviewing footage, the AI synthesizes the data and generates tactical diagrams automatically, saving significant preparation time before each match.

Injury prevention

This is a problem every major team wants to solve, especially when Messi and several other key players are at an age that requires careful management of training loads. Gemini analyzes biometric data and injury history to issue early warnings, helping the coaching staff adjust intensity before problems actually occur. That is part of the reason why Scaloni substituted Messi immediately after his hat-trick, prioritizing fitness and safety for the matches ahead.

AI in injury prevention is nothing new: Premier League clubs have partnered with Microsoft for similar purposes. What is different this time is that Gemini is integrated directly into the workflow of a national team competing at a major tournament, not just at club level.

For fans: create Messi content, follow scores without unlocking your screen

Alongside supporting the coaching staff, Gemini has rolled out a range of features aimed at fans, and this is the side that hundreds of millions of people will actually experience.

Create player content directly in Gemini

Users can generate images, songs and digital content featuring Argentina players like Messi directly inside the Gemini app. The feature is designed to bring the World Cup experience closer to people who cannot attend matches in person.

Real-time scores and automated daily briefings

On Google Search, live match scores can be pinned to the lock screen and update in real time, with dedicated animations for goals and red cards, all without unlocking the phone.
For paid Gemini users, the Scheduled Actions feature can set up an automated daily football briefing covering scores, news and fixtures, delivered at a chosen time without having to prompt it each day.

Match-day infrastructure

Google has updated Street View at all 16 host stadiums and optimized Waze routing for match days. Waze also shows live scores when the car is stopped at a red light, so drivers don't need to pick up their phones while on the move.

The 2026 World Cup is the real test for AI in sport

Google is not sponsoring Argentina alone. Gemini also appears on the kits of France, Morocco, Iraq, Turkey and the United States, while Pixel is the official phone of the French squad, which also uses Gemini for internal communications. This is clearly a comprehensive strategy from Google, not a one-off deal.

What makes the 2026 World Cup particularly significant is that it will answer a question no lab environment can: what do users actually do with AI when a World Cup runs for six weeks across 104 matches? Features that run on novelty will fade after the group stage. Whatever users keep coming back to all the way through the final is the honest answer to where AI actually fits in everyday life, and Google knows it.

Flor Sabatini, Google's communications director for Latin America, said the 2026 World Cup will mark a before and after in the history of football because of AI. It sounds like marketing, but this is the first time a major AI model has been integrated into the preparation of the reigning world champions, in the middle of the most-watched sporting event on the planet.

Why this is Gemini's real test

The most significant part of this story is not the Gemini logo on Messi's jersey. It is that Argentina, still the favorite and the most scrutinized team, carrying the pressure of defending the title, has committed part of its preparation process to AI.
If Argentina succeeds, Gemini will have a case study that no advertising budget can buy. If Argentina falls short and the coaching staff attributes any part of that to AI, the narrative will flip entirely. Either way, this is the first time AI has been held accountable on a stage that genuinely matters: not a benchmark, not a demo, but the World Cup.

For AI users, what is worth watching is not just whether Argentina wins, but whether Gemini actually changes how a football team operates, or turns out to be nothing more than a logo on a training kit that looks better than in previous years.

Nam•

17 Jun, 2026

YC CEO's 6 forcing questions before starting any project

I'd heard a lot about the gstack repo from Y Combinator's CEO, so I got curious and installed it. What surprised me most wasn't the polished workflows; it was the genuinely different mindset behind them. That mindset shows up in the very first command, /office-hours: six questions that don't ask about code at all, only about the things most people haven't thought through before they start building.

What is gstack and why did Garry Tan build it

gstack is an open-source toolkit by Garry Tan, CEO of Y Combinator, built primarily for Claude Code. The core idea: instead of using AI as a plain code writer, Garry Tan wanted to turn Claude into a small team of AI agents, each handling a different role, from product direction and security review to testing and release.

The entire workflow runs in an ordered loop: Think → Plan → Build → Review → Test → Ship → Reflect. More specifically, gstack splits Claude Code into 23 specialized roles, and the output of each step is passed automatically to the next, with no manual handoff needed.

Some of the standout commands:
/office-hours: 6 questions that force you to rethink your feature before writing a single line of code
/plan-ceo-review: checks whether you're overbuilding or underbuilding relative to what's actually needed
/review: catches serious bugs that standard automated checks miss
/qa: opens a real browser, performs real interactions, finds real bugs
/cso: runs an automated security audit against international standards
/ship: syncs, tests, pushes code and opens a pull request in a single command

How effective is gstack?

Garry Tan says he works roughly 810 times faster in 2026 than in 2013, measured in lines of completed code per day (11,417 vs 14). In 60 days, he shipped 3 production services and more than 40 features, all while running Y Combinator full-time.
Andrej Karpathy, a co-founder of OpenAI, described a similar trend, saying he hasn't typed a single line of code himself since December 2025 thanks to AI agents.

Among all those commands, though, /office-hours stands out for the opposite reason: it doesn't help you work faster, it helps you avoid building the wrong thing in the first place.

Why Garry Tan puts /office-hours first

Garry Tan placed /office-hours at the top of the workflow based on a simple observation: most products fail not because of poor code, but because they build the wrong thing. Teams spend weeks on a feature nobody needs, build the right feature for the wrong audience, or solve a problem users already handle better another way.

The command has two modes: Startup mode, for founders and people building real products with real users, and Builder mode, for side projects, hackathons and open source. This article focuses on Startup mode, where the 6 questions apply most directly.

6 questions that stop you from building the wrong thing

These aren't questions to answer quickly and move on. They're designed to make you think honestly: the more truthful your answers, the more accurately Claude can match what you actually need, which saves you a lot of time later. You can read the full original prompts at office-hours/SKILL.md.tmpl.

Demand reality: Is there a real need?

Original question: "Who specifically has this problem? How are they solving it today?"

Not "users in general" or "the marketing team": the goal is to name one real person, ideally by name, who is actively struggling with a specific problem. If you can't name someone like that, you don't yet understand what they actually need.

Concrete example: Instead of "users want better task management," it should be: "Minh, a project manager at a 20-person company, copy-pastes between Notion and Google Sheets every Monday morning because the two tools don't sync."
Apply the same approach to your own situation.

Status quo: What are they using instead?

Original question: "What is their current workaround? How much better do you need to be for them to switch?"

Everyone is already solving their problem somehow, whether with Excel, sticky notes or a WhatsApp group. If their current solution is good enough, they have no reason to migrate their data and learn an entirely new platform. Your solution needs to be meaningfully better before they'll even consider switching.

Desperate specificity: Who needs this badly enough?

Original question: "Who needs a solution badly enough to use your ugly beta version today?"

This is the question that separates nice-to-have from must-have. If you can't find anyone willing to use a rough, incomplete, buggy version right now, the problem you're solving isn't urgent enough. Real early users are people who need a solution badly enough to tolerate an unpolished product, as long as it's moving in the right direction.

Narrowest wedge: What is the smallest possible piece?

Original question: "What is the smallest thing you could launch tomorrow? Not the full vision. The smallest piece."

Not the first full-featured version, but something even smaller than that. This question typically cuts 80% of the scope people add because they think "might as well do it while I'm here." It's a trap many builders fall into, myself included. Launch the smallest meaningful piece first, listen to real users, then decide whether to expand.

Common mistake: Many people confuse "smallest piece" with "first full-featured version." The narrowest wedge really means one small thing that solves one specific problem for one specific group of users, nothing more.

Observation and surprise: Have you watched real people use it?

Original question: "Have you watched real people use your product? Did they use it in ways you didn't expect?"

This question is best saved for the second iteration onward, once you have something to test.
Rather than asking for feedback through messages or surveys, sit and watch directly, or review screen recordings. The most valuable insights usually come not from what users say, but from what they do that you didn't design for, or what they skip that you thought was important.

Note: If you're in your first iteration and don't have a product yet, you can skip this question and come back after launching your narrowest wedge.

Future-fit: The 2 to 3 year view

Original question: "In 2-3 years, will what you're building still be relevant, or is the trend moving against you?"

This isn't about predicting the future precisely. It's about avoiding building something that's already fading. If the trend is making your problem less urgent over the next two to three years, that's a clear signal to reconsider from the start. That said, if your goal is to move fast and capture the market before big tech ships something similar, this question can reasonably be set aside.

A real example: a simple idea completely flipped

In the gstack documentation, Garry Tan walks through a practical example. You open /office-hours and say: "I want to build an app that summarizes my daily work calendar."

Claude doesn't simply agree and start executing. Instead, it pushes back: what you just described isn't a calendar summary app, it's actually a full personal AI chief of staff. These are entirely different in scope, technical complexity and user expectations.

From that single opening description, /office-hours helps you see:
5 features you were describing without realizing it
4 assumptions that need to be validated before building
3 different implementation directions with varying levels of complexity
1 recommendation: launch the smallest piece first and treat the rest as a long-term roadmap

All of this happens before you write a single line of code. The output is saved as a document that subsequent steps in the workflow pick up automatically and continue from.
These 6 questions work even without gstack

The 6 questions from /office-hours don't require Claude Code or a gstack installation. They're a way of thinking, the same framework YC partners use to evaluate startups, and you can apply them right now with any AI tool you already have.

The difference when using them through gstack is that Claude won't let you give vague answers. It pushes for specifics and won't move forward until your response is grounded enough to be useful. That's why /office-hours tends to be the most uncomfortable command in the entire toolkit: not because it's difficult to use, but because it asks exactly what you've been avoiding.

Try it today: Before starting your next project, paste these 6 questions into Claude, Gemini or ChatGPT along with your idea. Ask it to go through each question one at a time and not let you skip any. The results are often more surprising than you'd expect, even for ideas you've already thought through carefully.

gstack currently has more than 117k stars on GitHub and is still growing. For me, the most valuable part isn't the technical commands like /review or /ship; it's /office-hours, because it's the only command in the entire toolkit that forces you to stop and think before doing anything else.
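For the "Try it today" exercise, the minimal Python sketch below assembles the six questions into a single prompt you can paste into any chat assistant. The function name, the question labels as data, and the instruction wording are illustrative assumptions, not gstack's actual template (the real prompts live in office-hours/SKILL.md.tmpl); the question texts are the ones quoted in this article.

```python
# The six /office-hours questions (Startup mode), as quoted in this article.
OFFICE_HOURS_QUESTIONS = [
    ("Demand reality", "Who specifically has this problem? How are they solving it today?"),
    ("Status quo", "What is their current workaround? How much better do you need to be for them to switch?"),
    ("Desperate specificity", "Who needs a solution badly enough to use your ugly beta version today?"),
    ("Narrowest wedge", "What is the smallest thing you could launch tomorrow? Not the full vision. The smallest piece."),
    ("Observation and surprise", "Have you watched real people use your product? Did they use it in ways you didn't expect?"),
    ("Future-fit", "In 2-3 years, will what you're building still be relevant, or is the trend moving against you?"),
]


def build_office_hours_prompt(idea: str, has_product: bool = True) -> str:
    """Assemble one prompt asking a chat assistant to walk through the questions one at a time."""
    questions = OFFICE_HOURS_QUESTIONS
    if not has_product:
        # The article suggests skipping the observation question on a first iteration.
        questions = [q for q in questions if q[0] != "Observation and surprise"]
    lines = [
        f"My idea: {idea}",
        "",
        "Ask me the following questions one at a time. Wait for my answer before moving on,",
        "push back if my answer is vague, and do not let me skip any question.",
        "",
    ]
    for number, (name, question) in enumerate(questions, start=1):
        lines.append(f"{number}. {name}: {question}")
    return "\n".join(lines)


print(build_office_hours_prompt("An app that summarizes my daily work calendar", has_product=False))
```

Keeping the questions as data rather than one hard-coded string makes it easy to drop the observation question before you have users, as the article suggests, and to add it back from the second iteration onward.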

Nam•

27 Jun, 2026

How to control Codex from your phone with ChatGPT app

You're out and suddenly remember a small detail in your project that needs fixing. You don't have to open your laptop or remote in to your desktop: with the right connection set up, the ChatGPT app on your phone can become a control panel for Codex, while your computer at home or at the office keeps running the actual code.

The ChatGPT app doesn't run Codex on your phone

The easiest mistake is to assume Codex runs directly on your phone. In reality, your phone only sends prompts, replies, approvals and follow-up messages, while the actual working environment lives on the Mac or Windows machine running Codex. In other words, the ChatGPT app is the remote control, and the host machine is where your repo, terminal, credentials, plugins, MCP servers and other tools actually live.

This makes sense because codebases typically live on your development machine, not your phone. When you send a request such as fixing a TypeScript error, running tests or checking a diff, Codex processes it inside the selected project on the host and sends the results back for you to review. If you want to understand the basics before using remote access, see What is Codex and how to use Codex for a clear picture of where this tool fits in your workflow.

What do you need before connecting the ChatGPT app to Codex?

According to OpenAI's latest Codex documentation, the ChatGPT app can control Codex on both macOS and Windows; Linux is not supported yet. Notably, this feature works with all ChatGPT account types, including Free and Go, so no paid plan is required. You only need to be signed into the same account or workspace on both devices: ChatGPT mobile (latest version on iOS or Android) and Codex (latest version on your host machine, online and running).

Your host machine must stay on and Codex must keep running for the entire time you're controlling it remotely.
If the machine goes to sleep, loses its connection or Codex is closed, your phone's connection drops immediately and any tasks in progress may be interrupted.

The entire setup process starts from the Codex app on the host machine and is surprisingly simple: scan a QR code and you're done. In the Codex app, select the mobile setup option in the sidebar, scan the QR code with your phone, then complete the confirmation in the ChatGPT app. For enterprise workspaces, an admin may need to enable Remote Control permissions before you can connect.

This QR code grants control over your computer, so keep it private and never share it with anyone, to avoid unauthorized access to your machine.

To summarize, connecting the ChatGPT app to Codex is straightforward:
The host machine must be online and running Codex
The ChatGPT app and Codex must be signed into the same account or workspace
Generate the QR code in Codex on the host and complete setup on your phone
MFA, SSO or passkey requirements may still apply depending on your workspace

What can you do once connected?

Once the host appears in Codex on your phone, you can start a new thread inside a project on the host or pick up an existing one. This is where the experience becomes genuinely useful: you can send follow-ups, answer Codex's questions, approve commands, view output, check diffs, review test results and even receive notifications when a task finishes or needs your attention.

For example, you're at a coffee shop and remember the login form has a validation bug. You open the ChatGPT app, select the connected host, and ask Codex to check the auth flow, fix the email validation error and run the related tests. Codex works directly on the repo on your host machine, while you review the results, approve actions when needed and decide whether to request further changes.
This is also why people are starting to think of Codex and other AI-powered IDEs as a colleague working inside a real environment rather than just a code suggestion tool. Its strength lies in reading files, running commands, editing code and maintaining context across multiple rounds of back-and-forth.

Limitations to keep in mind when using Codex from your phone

Remote control depends entirely on the host machine: if your computer goes to sleep, loses its connection, closes Codex or is signed out of the workspace, your phone loses its working environment immediately. The phone side is more forgiving: if your phone loses signal while Codex is mid-task, the task keeps running on the host and you are notified once your phone reconnects.

One more thing to note: on Windows, tasks that use Computer Use require an appropriate foreground session, so this setup is not a complete replacement for sitting in front of your machine.

It also helps to draw a clear line between handing off a focused task and reviewing large changes. Your phone works well for small bugs, running tests, quick questions about a specific file, reviewing short tasks or checking task status. Anything that needs close attention should still be reviewed on a larger screen so you don't miss details.

How to use it effectively in practice

The most effective approach is to hand off tasks with a clear scope and specific expected outcomes. Instead of saying "fix the login", describe exactly where the error occurs, what the expected behavior should be after the fix, which tests to run and which parts of the codebase to leave untouched. Codex performs better when it knows the boundaries of a task, especially since working remotely from a phone makes each feedback loop slower than when you're sitting at your machine.
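The four-part scoping advice (where the error is, expected behavior, tests to run, what to leave alone) can be kept as a reusable checklist. The minimal Python sketch below is one way to do that. The `ScopedTask` class, its field names, the file paths and the `npm test` command are all hypothetical; Codex does not require this format, and the output is just text you paste or dictate into the ChatGPT app.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedTask:
    """Hypothetical structure for a well-scoped request sent to a coding agent."""
    problem: str                 # where the error occurs
    expected_behavior: str       # what "fixed" looks like
    tests_to_run: list[str]      # commands the agent should run to verify the fix
    do_not_touch: list[str] = field(default_factory=list)  # paths to leave alone

    def to_prompt(self) -> str:
        if not self.tests_to_run:
            raise ValueError("name at least one test so the result can be verified")
        parts = [
            f"Problem: {self.problem}",
            f"Expected behavior after the fix: {self.expected_behavior}",
            "Verify by running: " + "; ".join(self.tests_to_run),
        ]
        if self.do_not_touch:
            parts.append("Do not modify: " + ", ".join(self.do_not_touch))
        parts.append("Read the relevant files and propose a plan before editing.")
        return "\n".join(parts)


task = ScopedTask(
    problem="Login form in src/auth/LoginForm.tsx accepts emails without a domain",
    expected_behavior="Submitting 'user@' shows 'Enter a valid email' and sends no request",
    tests_to_run=["npm test -- LoginForm"],
    do_not_touch=["src/billing/", "package.json"],
)
print(task.to_prompt())
```

Refusing to build a prompt without at least one test mirrors the article's point: on a phone each round trip is slow, so every request should carry its own way to check the result.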
A clean working rhythm might look like this: describe the task in detail, whether it's small or medium-sized; ask Codex to read the relevant files; let it propose a solution; approve only when necessary; and wait for the result report. Once you get used to this rhythm, you'll find that idle time away from your desk can handle real work, while the final decision stays firmly in your hands.

Compared to Claude Code Remote and Telegram bot

There are many ways to control an AI coding agent from your phone, and the three most common approaches each serve a different need.

| Criteria | ChatGPT app + Codex | Claude Code Remote | Telegram + Codex |
| Natural conversation | ✅ Excellent | ✅ Good | ❌ Requires exact syntax |
| Granular control | Moderate | Highest | Low |
| Connection stability | Stable | Stable | Frequent drops |
| Mobile UI | Well optimized | Not fully optimized | Uses existing Telegram app |
| Initial setup | Easy, scan QR | Easy | Requires manual bot configuration |
| Computer must stay on | ✅ Required | ✅ Required | ✅ Required |

Claude Code Remote Control offers the strongest level of control: you get direct terminal output, can intervene mid-task and generally feel much closer to what the agent is doing. That said, its UI isn't fully optimized for small phone screens yet, and some interactions are still hard to perform without a physical keyboard.

A Telegram bot has the advantage of not requiring a separate app and is easy to get started with, but the real-world experience has clear limits: it's prone to slowdowns and occasional silent disconnections mid-task, and because it lacks genuine AI context, anything slightly more complex than a simple command quickly falls apart, forcing you to type precise instructions rather than describe what you need naturally.

ChatGPT app + Codex sits at the best balance point for most users: smooth enough, smart enough, quick to set up with a QR scan, and no new syntax to learn before you can get to work.
Connecting the ChatGPT app to Codex doesn't turn your phone into a development machine; it turns your phone into a control surface for a development machine that's already ready to work. As long as the host stays on, permissions are configured correctly and the task is tightly scoped, this is the most practical way to handle real coding work when you're away from your laptop.

Nam•

22 Jun, 2026

What Is Hermes Agent? Nous Research's Self-Learning AI

Learning more makes you better. That principle, long assumed to apply only to humans, turns out to hold true for Hermes Agent too, an open-source AI agent from Nous Research. Every time you work with it, Hermes Agent doesn't forget: it remembers, understands you more deeply and gets better with each session, thanks to a memory system that can recall what it knows about you even after the machine has been off for a week.

What Is Hermes Agent?

Hermes Agent is an open-source AI agent developed and released under the MIT license by Nous Research, the lab behind the Hermes, Nomos and Psyche model lines. Unlike Antigravity or Codex, which depend on an IDE environment, or ordinary chatbots that are ultimately a thin wrapper around a single API, Hermes Agent is built to run continuously on the user's own infrastructure, from a cheap VPS to a GPU cluster or serverless infrastructure, and it works in a way fairly similar to OpenClaw.

The core difference in Hermes Agent lies in how it manages long-term memory and converts experience into real skills. Instead of merely storing raw information or passively remembering preferences the way assistants like Gemini or Claude do, Hermes runs a closed "learning loop": after every work session, it actively distills what it did into new skills it can use the next time. This loop is run by a background "Curator" agent that automatically scores, prunes and merges accumulated knowledge, combined with FTS5 full-text search that retrieves old memories roughly 4,500 times faster without spending any tokens. As a result, Hermes doesn't just respond and forget; it becomes a collaborator that grows more knowledgeable and capable over time.

Four Features That Set Hermes Agent Apart

Nous Research doesn't call Hermes Agent a chatbot or a copilot; it positions it as an agent with a built-in learning loop. The four feature groups below explain why that label isn't just marketing.
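FTS5, the retrieval layer named above, is SQLite's full-text search extension, and because the lookup runs locally in the database, it spends no model tokens. The minimal Python sketch below illustrates keyword recall over stored session notes. It is an illustration only: the `memories` table, its columns and the `recall` helper are hypothetical, Hermes Agent's real schema and ranking are not described in this article, and the code assumes a Python build whose bundled SQLite has FTS5 enabled (true of most modern builds).

```python
import sqlite3

# In-memory database for the demo; a real agent would persist this to disk.
conn = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is tokenized and indexed for full-text search.
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(session, note)")
conn.executemany(
    "INSERT INTO memories (session, note) VALUES (?, ?)",
    [
        ("2026-06-01", "User prefers TypeScript and strict ESLint rules"),
        ("2026-06-03", "Deployed the billing service to the staging VPS"),
        ("2026-06-10", "User wants weekly reports every Monday morning"),
    ],
)


def recall(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Return (session, note) rows matching `query`, best BM25 match first."""
    return conn.execute(
        "SELECT session, note FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


print(recall("TypeScript"))
print(recall("weekly reports"))
```

FTS5's default tokenizer is case-insensitive, and a multi-word query such as "weekly reports" matches only notes containing both terms.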
Memory That Persists Across Sessions

The biggest weakness of most AI assistants today is that their memory stores only raw chat text, not how the work actually got done. Hermes Agent addresses this with three combined mechanisms:
Fast retrieval: FTS5 full-text search pulls up old memories roughly 4,500 times faster than conventional search, without spending extra tokens the way Gemini or Claude Cowork do.
User understanding: Hermes integrates Honcho's dialectical user-modeling approach. This helps the agent build an in-depth picture of your preferences, habits, and personal context across thousands of sessions.
Continuity: The agent picks up exactly where you left off, even on a project from weeks earlier.

Self-Generating and Self-Improving Skills

This is the feature that makes Hermes Agent behave like a collaborator that accumulates experience, rather than a tool that only answers on request:
Learning from real use: After completing a complex task, Hermes Agent distills the process into a new skill. It stores that skill in a library and reuses it automatically next time.
Open agentskills.io standard: Skills follow an open standard, so they can be packaged, shared, and reused across different AI systems without being rewritten from scratch.
The Curator mechanism: A background administrative agent periodically scores, prunes, and merges duplicate skills. This keeps the skill library from bloating and becoming disorganized over time.

Present on More Than 23 Messaging Platforms

Hermes Agent isn't confined to a computer; it integrates directly into the messaging channels people already use on their phones every day:
Multiple channels, one brain: You can command Hermes Agent through Telegram, Discord, Slack, WhatsApp, Signal, email, or SMS.
Context retained: You might message it on Telegram in the morning and switch to Discord at night. Either way, the agent keeps a single memory thread that is never fragmented by channel.
Multimodal interaction: You can send voice messages, images, and video, and the agent can analyze that multimodal content.

Flexible Runtime Infrastructure

Hermes Agent supports six backend types for executing commands: local machine, Docker, SSH, Daytona, Singularity, and Modal. With Daytona and Modal, the environment can hibernate when idle and cost almost nothing while it waits. It wakes up only when there is work to process. This is why Nous Research describes Hermes Agent as an always-on agent that doesn't require users to pay for a server running 24/7.

Hermes Agent can be installed with a single curl command on Linux, macOS, and Windows (via WSL2). Since June 5, 2026, version v0.16.0, "The Surface Release," also offers an official native desktop app for Windows, macOS, and Linux. Its polished GUI makes Hermes accessible to everyday users who don't want to use a terminal.

Built-In Toolset and Limitations to Know

40-Plus Built-In Tools, From Web Search to Schedule Automation

Hermes Agent ships with more than 40 built-in tools. They include web search, browser actions, file handling, and Python script execution via RPC, which runs sub-tasks without consuming the main agent's context window.

A natural-language scheduling system lets you set up recurring tasks such as daily reports or data backups. The agent then runs them without needing a reminder.

For tasks that need full isolation, Hermes Agent also supports sub-agents with their own conversation, terminal, and scripts. Multiple jobs can then run in parallel without diluting the main memory.

Challenges and Security Considerations

Despite rapid updates, Hermes Agent still has a few points users should keep in mind before deploying it:
Stability of the self-learning mechanism: Self-improving skills raise task success rates. A Tencent Cloud report recorded gains of up to 52%, along with token savings of up to 61%.
However, because the mechanism evolves on its own, its real-world effectiveness still depends on the underlying model. It calls for human oversight rather than full trust.
Risk from high-level permissions, with security responsibility on the user: Hermes Agent can act deeply inside a system, a risk known as excessive agency. Connecting it directly to multiple messaging platforms therefore requires users to manage their own API keys and set up guardrails. Unlike closed AI services, Hermes Agent hands full control to the user. That also means the user bears more responsibility for configuring access permissions to prevent information leaks.

Why Is Hermes Agent Growing So Fast?

Some of Hermes Agent's growth could be attributed to Nous Research's marketing, but in our view it comes down to three main factors.

A Frictionless Migration Path From OpenClaw

Recognizing OpenClaw's large user base, Nous Research built a migration tool. With a single command, it carries over your persona, API keys, entire skill set, and memory to Hermes Agent. You lose no old data and don't have to reconfigure anything from scratch.

If you currently use OpenClaw and want to try Hermes Agent without losing your data, look for the built-in "hermes claw migrate" command before considering a fresh install.

Betting on a Closed Learning Loop Instead of a Feature Race

Many other agents compete on the number of tools they offer. Hermes Agent instead positions itself as a self-evolving entity: it distills experience into new skills and keeps long-term memory to understand users more deeply over time. This approach creates lasting value. The community has already applied it to projects such as automating large-scale content production with high consistency across many sessions.

A Role as a Training Data Engine

Beyond serving as a personal assistant, Hermes Agent also works as a capable research tool.
It can generate thousands of parallel tool-calling trajectories and compress them into training data for other AI models. By turning the agent's real-world experience into training data, Hermes becomes a platform that developers building the next generation of autonomous AI will find hard to do without.

How Is Hermes Agent Different From an Agent Harness?

People new to the space often confuse Hermes Agent with an agent harness. A harness is the framework that decides how a model calls tools, runs the reasoning loop, and coordinates execution steps internally. If a harness is the engine and chassis that determine how a car drives, Hermes Agent is a car with that engine already installed. It also comes with seats, a navigation system, and the driver's own trip log.

In other words, a harness is the underlying technical architecture layer. Hermes Agent is a complete end-user product that already packages memory, a skill system, communication channels, and a choice of runtime infrastructure. A developer can build their own harness to control every detail. Most users don't need to go that deep; they just need an agent that works out of the box and gets smarter with use.

For a closer look at that underlying layer, see "What Is Agent Harness? The Framework That Makes AI Work Efficiently," which explains in detail how this type of framework operates.

Is Hermes Agent Worth Trying Right Now?

Hermes Agent is fully open source, collects no user data, and supports complete self-hosting. That makes it one of the few agents today that lets users keep full control over their data while still getting a continuous assistant experience. Its memory is real, not simulated memory that exists only within a single chat.

Since v0.16.0, the biggest technical barrier for users unfamiliar with the terminal has largely disappeared. The native desktop app for Windows, macOS, and Linux now offers an alternative to the CLI-only workflow used before.
What's left to judge is not whether Hermes Agent runs, but what it learns after a few real weeks of use. The fastest way to find out is to install the desktop app or run the CLI on a cheap VPS. Connect it to a familiar messaging channel such as Telegram, and watch which skills the agent forms on its own from your everyday use. That also lays the groundwork for comparing Hermes Agent with other options on the market in the next part of this series, from agent harnesses to OpenClaw and Claude Cowork.

Nam • 19 Jun, 2026

Quick Summary

What is the inference era, and why does it matter for AI models?

Google's advantage comes with challenges

Optimizing performance and cost

What roles do the Ironwood TPU and the Axion CPU play?

Infrastructure and energy challenges

Environmental commitments and the energy crisis

How will the AI bubble and its financial risks play out?

Is Google challenging Nvidia's dominance?
