4AIVN - Discover AI Models and Tools Rankings

YC CEO's 6 forcing questions before starting any project

I'd heard a lot about the gstack repo from the CEO of Y Combinator, so I got curious and installed it to try. What surprised me most wasn't the polished workflows; it was the genuinely different mindset behind them. That mindset shows up in the very first command: /office-hours, with six questions that don't ask about code at all, only the things most people haven't thought through before they start building.

What is gstack and why did Garry Tan build it

gstack is an open-source toolkit by Garry Tan, CEO of Y Combinator, built primarily for Claude Code. The core idea: instead of using AI as a plain code writer, Garry Tan wanted to turn Claude into a small AI agent team, where each member handles a different role, from product direction and security review to testing and release. The entire workflow runs in an ordered loop: Think → Plan → Build → Review → Test → Ship → Reflect.

More specifically, gstack splits Claude Code into 23 specialized roles, and the output of each step is automatically passed to the next, with no manual handoff needed. Some of the standout commands:

/office-hours: 6 questions that force you to rethink your feature before writing a single line of code
/plan-ceo-review: checks whether you're overbuilding or underbuilding relative to what's actually needed
/review: catches serious bugs that standard automated checks miss
/qa: opens a real browser, performs real interactions, finds real bugs
/cso: runs an automated security audit against international standards
/ship: syncs, tests, pushes code and opens a pull request in a single command

How effective is gstack?

Garry Tan says his working speed in 2026 is roughly 810 times faster than in 2013, measured by lines of completed code per day (11,417 vs 14). In 60 days, he shipped 3 production services and over 40 features, all while running Y Combinator full-time.
Andrej Karpathy, co-founder of OpenAI, confirmed a similar trend, sharing that he hasn't typed a single line of code himself since December 2025 thanks to AI agents.

But among all those commands, /office-hours stands out for the opposite reason from the rest: it doesn't help you work faster; it helps you avoid building the wrong thing from the start.

Why Garry Tan puts /office-hours first

Garry Tan placed /office-hours at the top of the workflow based on a simple observation: most products fail not because of poor code, but because they build the wrong thing. Teams spend weeks on a feature nobody needs, or build the right feature for the wrong audience, or solve a problem users already handle better another way.

The command has two modes: Startup mode for founders and people building real products with real users, and Builder mode for side projects, hackathons, and open source. This article focuses on Startup mode, where the 6 questions are most directly applicable.

6 questions that stop you from building the wrong thing

These aren't 6 questions to answer quickly and move on. They're designed to make you think honestly, because the more truthful your answers, the more accurately Claude can match what you actually need, which saves you a significant amount of time later. You can read the full original prompts at office-hours/SKILL.md.tmpl.

Demand reality: Is there a real need?

Original question: "Who specifically has this problem? How are they solving it today?"

The answer is not "users in general" or "the marketing team". The goal is to name one real person, ideally by name, who is actively struggling with a specific problem. If you can't name someone like that, you don't yet understand what they actually need.

Concrete example: Instead of "users want better task management," it should be: "Minh, a project manager at a 20-person company, copy-pastes between Notion and Google Sheets every Monday morning because the two tools don't sync."
Apply this to your own situation accordingly. Status quo: What are they using instead? Original question: "What is their current workaround? How much better do you need to be for them to switch?" Everyone is already solving their problem somehow — whether with Excel, sticky notes, or a WhatsApp group. If their current solution is good enough, they have no reason to migrate their data and learn an entirely new platform. Your solution needs to be meaningfully better before they'll even consider switching. Desperate specificity: Who needs this badly enough? Original question: "Who needs a solution badly enough to use your ugly beta version today?" This is the question that separates nice-to-have from must-have. If you can't find anyone willing to use an incomplete, rough, buggy version right now, the problem you're solving isn't urgent enough. Real early users are people who need a solution badly enough to tolerate an unpolished product — as long as it's moving in the right direction. Narrowest wedge: What is the smallest possible piece? Original question: "What is the smallest thing you could launch tomorrow? Not the full vision — the smallest piece." Not the first full-featured version — something even smaller than that. This question typically cuts 80% of the scope people add because they think "might as well do it while I'm here." It's a trap many builders fall into, including myself. Launch the smallest meaningful piece first, listen to real users, then decide whether to expand. Common mistake: Many people confuse "smallest piece" with "first full-featured version." The narrowest wedge truly means one small thing that solves one specific problem for one specific group of users — nothing more. Observation and surprise: Have you watched real people use it? Original question: "Have you watched real people use your product? Did they use it in ways you didn't expect?" This question is best saved for the second iteration onward, once you have something to test. 
Rather than asking for feedback through messages or surveys, sit and watch directly — or review screen recordings. The most valuable insights usually don't come from what users say, but from what they do that you didn't design for, or what they skip that you thought was important. Note: If you're in your first iteration and don't have a product yet, you can skip this question and come back after launching the smallest piece in step 4. Future-fit: The 2 to 3 year view Original question: "In 2-3 years, will what you're building still be relevant — or is the trend moving against you?" This isn't about predicting the future precisely. It's about avoiding building something that's already fading. If the trend is making your problem less urgent over the next two years, that's a clear signal to reconsider from the start. That said, if your goal is to move fast and capture the market before big tech ships something similar, this question can reasonably be set aside. A real example: a simple idea completely flipped In the gstack documentation, Garry Tan walks through a practical example. You open /office-hours and say: "I want to build an app that summarizes my daily work calendar." Claude doesn't agree and start executing. Instead, it pushes back: what you just described isn't a calendar summary app — it's actually a full personal AI chief of staff. These are entirely different in scope, technical complexity, and user expectations. From that single opening description, /office-hours helps you see: 5 features you were describing without realizing it 4 assumptions that need to be validated before building 3 different implementation directions with varying levels of complexity 1 recommendation: launch the smallest piece first, treat the rest as a long-term roadmap All of this happens before you write a single line of code. The output is saved as a document that subsequent steps in the workflow automatically pick up and continue from. 
These 6 questions work even without gstack The 6 questions from /office-hours don't require Claude Code or a gstack installation. They're a way of thinking — the same framework YC partners use to evaluate startups — and you can apply them right now with any AI tool you already have. The difference when using them through gstack is that Claude won't let you give vague answers. It pushes for specifics and won't move forward until your response is grounded enough to be useful. That's why /office-hours tends to be the most uncomfortable command in the entire toolkit — not because it's difficult to use, but because it asks exactly what you've been avoiding. Try it today: Before starting your next project, paste these 6 questions into Claude, Gemini, or ChatGPT along with your idea. Ask it to go through each question one at a time and not let you skip any. The results are often more surprising than you'd expect — even for ideas you've already thought through carefully. gstack currently has over 117k stars on GitHub and is still growing. For me, the most valuable part isn't the technical commands like /review or /ship — it's /office-hours, because it's the only command in the entire toolkit that forces you to stop and think before doing anything else.
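For readers who want to try the "paste these 6 questions" suggestion above, here is a minimal sketch of how the prompt could be assembled. The question text is taken from this article, with punctuation lightly normalized. The function name build_office_hours_prompt and the instruction wording at the top of the prompt are assumptions made for illustration, and are not taken from gstack's own SKILL.md.tmpl.

```python
# The six /office-hours questions as quoted in this article (Startup mode).
QUESTIONS = [
    ("Demand reality",
     "Who specifically has this problem? How are they solving it today?"),
    ("Status quo",
     "What is their current workaround? "
     "How much better do you need to be for them to switch?"),
    ("Desperate specificity",
     "Who needs a solution badly enough to use your ugly beta version today?"),
    ("Narrowest wedge",
     "What is the smallest thing you could launch tomorrow? "
     "Not the full vision, the smallest piece."),
    ("Observation and surprise",
     "Have you watched real people use your product? "
     "Did they use it in ways you didn't expect?"),
    ("Future-fit",
     "In 2-3 years, will what you're building still be relevant, "
     "or is the trend moving against you?"),
]


def build_office_hours_prompt(idea: str) -> str:
    """Assemble one prompt asking a model to walk the six questions in order.

    The framing sentences are hypothetical wording, not gstack's prompt.
    """
    lines = [
        "Act as a demanding startup advisor.",
        "Go through the questions below one at a time, in order.",
        "Do not move on until my answer is specific, and do not let me skip any.",
        "",
        f"My idea: {idea.strip()}",
        "",
    ]
    for number, (label, question) in enumerate(QUESTIONS, start=1):
        lines.append(f"{number}. {label}: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_office_hours_prompt("An app that summarizes my daily work calendar."))
```

The output can be pasted into Claude, Gemini, or ChatGPT as a single message. Keeping the questions in one list makes it easy to drop question 5 on a first iteration, as the article suggests.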

Nam•

27 Jun, 2026

How to control Codex from your phone with ChatGPT app

You're out and suddenly remember a small detail in your project that needs fixing — you don't have to open your laptop or remote desktop in. With the right connection set up, ChatGPT app on your phone can become a control panel for Codex, while your computer at home or the office keeps running the actual code. ChatGPT app doesn't run Codex on your phone The easiest thing to misunderstand is thinking Codex is running directly on your phone. In reality, your phone only sends prompts, replies, approvals and follow-up messages, while the actual working environment lives on your Mac or Windows machine running Codex. In other words, ChatGPT app is the remote controller, and the host machine is where your repo, terminal, credentials, plugins, MCP servers and other tools actually live. This makes complete sense because codebases typically live on your development machine, not your phone. When you send a request like fixing a TypeScript error, running tests or checking a diff, Codex processes it inside the selected project on the host and sends results back for you to review. If you want to understand the foundation before using remote access, check out What is Codex and how to use Codex to get a clear picture of where this tool fits in your workflow. What do you need before connecting ChatGPT app to Codex? According to the latest Codex documentation from OpenAI, ChatGPT app supports controlling Codex on both macOS and Windows, though Linux is not supported yet. Notably, this feature works with all ChatGPT account types, including Free and Go — no paid plan required. You only need to make sure you're signed into the same account or workspace on both devices: ChatGPT mobile (latest version on iOS or Android) and Codex (latest version on your host machine, online and running). Your host machine must stay on and Codex must keep running for the entire time you're controlling it remotely. 
If the machine goes to sleep, loses its connection or Codex is closed, the connection from your phone drops immediately and any tasks in progress may be interrupted. What's worth noting is that the entire setup process starts from Codex App on the host machine and is surprisingly simple — just scan a QR code and you're done. Inside Codex App, select the mobile setup option in the sidebar, scan the QR code with your phone, then complete the confirmation in ChatGPT app. For enterprise workspaces, an admin may need to enable Remote Control permissions before you can connect. This QR code grants control over your computer, so keep it private and never share it with anyone to avoid unauthorized access to your machine. To summarize, connecting ChatGPT app to Codex is straightforward: Host machine must be online and running Codex ChatGPT app and Codex must be signed into the same account or workspace Generate the QR code in Codex on the host and complete setup on your phone MFA, SSO or passkey requirements may still apply depending on your workspace What can you do once connected? Once the host appears in Codex on your phone, you can start a new thread inside a project on the host or pick up an existing one. This is where the experience becomes genuinely useful: you can send follow-ups, answer Codex's questions, approve commands, view output, check diffs, review test results and even receive notifications when a task finishes or needs your attention. A real example: you're at a coffee shop and remember the login form has a validation bug. You open ChatGPT app, select the connected host, and ask Codex to check the auth flow, fix the email validation error and run the related tests. Codex works directly on the repo sitting on your host machine, while you review the results, approve actions when needed and decide whether to request further changes. 
This is also why people are starting to think of Codex and other AI-powered IDEs as a colleague working inside a real environment, not just a code suggestion tool anymore. Its strength lies in reading files, running commands, editing code and maintaining context across multiple rounds of back-and-forth. Limitations to keep in mind when using Codex from your phone Remote control depends entirely on the host machine — if your computer goes to sleep, loses its connection, closes Codex or gets signed out of the workspace, your phone loses its working environment immediately. That said, if Codex is mid-task when the connection drops, it will continue running on the host and notify you once your phone reconnects, so there's less to worry about if your phone suddenly loses signal during a running task. One more thing to note: on Windows, tasks using Computer Use require an appropriate foreground session, so this setup is not a complete replacement for sitting directly in front of your machine. It also helps to draw a clear line between handing off a focused task and reviewing large changes. Your phone works well for small bugs, running tests, quick questions about a specific file, reviewing short tasks or checking task status. However, anything requiring a high level of attention should still be reviewed on a larger screen to avoid missing details. How to use it effectively in practice The most effective approach is to hand off tasks with a clear scope and specific expected outcomes. Instead of saying "fix the login", describe exactly where the error occurs, what the expected behavior should be after the fix, which tests to run and which parts of the codebase to leave untouched. Codex performs better when it knows the boundaries of a task, especially since remote mobile means each feedback loop takes longer than when you're sitting right at your machine. 
A clean working rhythm might look like this: describe the task in detail, whether it is small or medium-sized, ask Codex to read the relevant files, let it propose a solution, approve only when necessary and wait for the result report. Once you get used to this rhythm, you'll find that idle time away from your desk can go toward real work, while the final decision stays firmly in your hands.

Compared to Claude Code Remote and Telegram bot

There are many ways to control an AI coding agent from your phone, though the three most common approaches each serve a different need.

| Criteria | ChatGPT app + Codex | Claude Code Remote | Telegram + Codex |
|---|---|---|---|
| Natural conversation | ✅ Excellent | ✅ Good | ❌ Requires exact syntax |
| Granular control | Moderate | Highest | Low |
| Connection stability | Stable | Stable | Frequent drops |
| Mobile UI | Well optimized | Not fully optimized | Uses existing Telegram app |
| Initial setup | Easy, scan QR | Easy | Requires manual bot configuration |
| Computer must stay on | ✅ Required | ✅ Required | ✅ Required |

Claude Code Remote Control offers the strongest level of control: you get direct terminal output, can intervene mid-task and generally feel much closer to what the agent is doing. That said, the UI on small phone screens isn't fully optimized yet, and some interactions are still difficult to perform without a physical keyboard.

Telegram bot has the advantage of not requiring a separate app and is easy to get started with, but the real-world experience has clear limits. It's prone to slowdowns and occasional silent disconnections mid-task, and because it lacks genuine AI context, anything slightly more complex than a simple command quickly falls apart, forcing you to type precise instructions rather than describe what you need naturally.

ChatGPT app + Codex sits at the best balance point for most users: smooth enough, smart enough, quick to set up with a QR scan and no new syntax to learn before you can get to work.
Connecting ChatGPT app to Codex doesn't turn your phone into a development machine — it turns your phone into a control surface for a development machine that's already ready to work. As long as the host stays on, permissions are configured correctly and the task is scoped tightly enough, this is the most practical way to handle real coding work when you're away from your laptop.

Nam•

22 Jun, 2026

What Is Hermes Agent? Nous Research's Self-Learning AI

Learning more makes you better. That principle, long assumed to apply only to humans, turns out to hold true for Hermes Agent too, an open-source AI agent from Nous Research. Every time you work with it, Hermes Agent doesn't forget: it remembers, understands you more deeply, and gets better with each session, thanks to a memory system that can recall everything about you even after the machine has been off for a week.

What Is Hermes Agent?

Hermes Agent is an open-source AI agent developed and released under the MIT license by Nous Research, the lab behind the Hermes, Nomos, and Psyche model lines. Unlike Antigravity or Codex, which depend on an IDE environment, or ordinary chatbots that ultimately remain a thin wrapper calling a single API, Hermes Agent is built to run continuously on a user's own infrastructure, from a cheap VPS to a GPU cluster or serverless infrastructure, and it operates in a way fairly similar to OpenClaw.

The core difference in Hermes Agent lies in how it manages long-term memory and converts experience into real skills. Instead of merely storing raw information or passively remembering preferences the way AI like Gemini or Claude do, Hermes runs a closed "learning loop": after every work session, it actively distills the process into new tools it can use the next time. This system is run by a background "Curator" agent that automatically scores, prunes, and merges accumulated knowledge, combined with FTS5 full-text search that retrieves old memories roughly 4,500 times faster than conventional search without spending any tokens. As a result, Hermes doesn't just respond and forget; it genuinely becomes a collaborator that grows more knowledgeable and capable over time.

Four Features That Set Hermes Agent Apart

Nous Research doesn't call Hermes Agent a chatbot or a copilot; it positions it as an agent with a built-in learning loop. The four feature groups below explain why that label isn't just marketing.
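To make the FTS5 retrieval mentioned above more concrete, here is a minimal sketch of token-free memory lookup using SQLite's FTS5 extension through Python's standard sqlite3 module. It assumes the local SQLite build includes FTS5, which is common but not guaranteed. The table name memory, its columns, and the sample notes are hypothetical and do not reflect Hermes Agent's actual schema.

```python
import sqlite3


def build_memory_index(notes):
    """Create an in-memory FTS5 index from (session, note) pairs.

    The schema here is an illustrative assumption, not Hermes Agent's own.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE memory USING fts5(session, note)")
    conn.executemany("INSERT INTO memory (session, note) VALUES (?, ?)", notes)
    return conn


def recall(conn, query, limit=3):
    """Return the best-matching notes for a full-text query, best match first."""
    rows = conn.execute(
        "SELECT session, note FROM memory WHERE memory MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_memory_index([
        ("2026-06-01", "User prefers Docker for every deployment"),
        ("2026-06-03", "Weekly report goes out on Monday morning"),
        ("2026-06-10", "Backups run nightly on the cheap VPS"),
    ])
    for session, note in recall(conn, "docker"):
        print(session, note)
```

The lookup is an ordinary database query, so no model call is involved. That is the sense in which this kind of retrieval costs no tokens.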
Memory That Persists Across Sessions The biggest weakness of most AI today is that memory only stores raw chat text rather than how work actually gets done. Hermes Agent addresses this through three combined mechanisms: Fast retrieval: Uses FTS5 full-text search to pull up old memories roughly 4,500 times faster than conventional search, without spending extra tokens the way Gemini or Cowork do. User understanding: Integrates Honcho's dialectical user-modeling approach, helping the agent understand preferences, habits, and personal context in depth across thousands of sessions. Continuity: The agent picks up work exactly where you left off, even if that was a project from weeks earlier. Self-Generating and Self-Improving Skills This is the feature that makes Hermes Agent behave like a collaborator that accumulates experience, rather than just a tool that answers on request: Learning from real use: After completing complex tasks, Hermes Agent distills the process into new skills and stores them in a library to be reused automatically next time. Open agentskills.io standard: These skills follow an open standard, so they can be packaged, shared, and reused across different AI systems without being rewritten from scratch. The Curator mechanism: A background administrative agent periodically scores, prunes, and merges duplicate skills, which keeps the skill library from bloating and becoming disorganized over time. Present on More Than 23 Messaging Platforms Hermes Agent isn't confined to a computer, it integrates directly into the messaging channels people already use on their phones every day: Multiple channels, one brain: You can command Hermes Agent through Telegram, Discord, Slack, WhatsApp, Signal, email, or SMS. Context retained: Whether you message via Telegram in the morning or switch to Discord at night, the agent keeps a single thread of memory, never fragmented by channel. 
Multimodal interaction: Supports sending voice messages, images, and video, along with the ability to analyze multimodal content. Flexible Runtime Infrastructure Hermes Agent supports six backend types for executing commands: local machine, Docker, SSH, Daytona, Singularity, and Modal. With Daytona and Modal, the environment can hibernate when idle and cost almost nothing while waiting, waking up only when there's work to process. This is why Nous Research describes Hermes Agent as an always-on agent that doesn't require users to keep a server running 24/7 at high cost year-round. Hermes Agent can be installed with a single curl command, supporting Linux, macOS, and Windows via WSL2, or, as of June 5, 2026 with version v0.16.0 "The Surface Release," through an official Native Desktop app for Windows, macOS, and Linux with a fully polished GUI, making it accessible to everyday users without needing a terminal. Built-In Toolset and Limitations to Know 40-Plus Built-In Tools, From Web Search to Schedule Automation Hermes Agent ships with more than 40 built-in tools, including web search, browser actions, file handling, and Python script execution via RPC to run sub-tasks without consuming the main agent's context window. A natural-language scheduling system lets you set recurring tasks like daily reports or data backups, then leaves the agent to run them without being reminded. For tasks that need full isolation, Hermes Agent also supports sub-agents with their own conversation, terminal, and scripts, allowing multiple jobs to run in parallel without diluting the main memory. Challenges and Security Considerations Despite rapid updates, Hermes Agent still has a few points users should keep in mind before deploying it: Stability of the self-learning mechanism: The ability to self-improve skills boosts success rates, with a Tencent Cloud report recording gains of up to 52% along with token savings of up to 61%. 
However, since this is a self-evolving mechanism, real-world effectiveness still depends on the underlying model chosen and still requires human oversight rather than full trust. Risk from high-level permissions, with security responsibility falling on the user: Hermes Agent can intervene deeply in a system (excessive agency), so connecting it directly to multiple messaging platforms requires users to manage their own API keys and set up guardrails. Unlike closed AI services, Hermes Agent hands full control over to the user, which means the user also bears greater responsibility for configuring access permissions to avoid information leaks. Why Is Hermes Agent Growing So Fast? Hermes Agent's growth could be attributed to Nous Research's marketing, but in our view it comes down to three main factors. A Frictionless Migration Path From OpenClaw Recognizing OpenClaw's large user base, Nous Research built a migration tool that lets users carry over their persona, API keys, the entire skill set, and memory to Hermes Agent with a single command, without losing old data and, of course, without having to reconfigure anything from scratch. If you're currently using OpenClaw and want to try Hermes Agent without losing your old data, look for the hermes claw migrate migration tool built into Hermes Agent before considering a fresh install. Betting on a Closed Learning Loop Instead of a Feature Race While many other agents compete on the number of tools they offer, Hermes Agent positions itself as a self-evolving entity, one that distills experience into new skills and retains long-term memory to understand users more deeply over time. This approach creates lasting value, and the community has already put it to use for projects such as automating large-scale content production with high consistency across many sessions. A Role as a Training Data Engine Beyond serving as a personal assistant, Hermes Agent also functions as a capable research tool. 
It can generate thousands of parallel tool-calling trajectories and compress them into training data for other AI models. By turning the agent's real-world experience into training data, Hermes becomes a platform that developers building the next generation of autonomous AI can't easily do without. How Is Hermes Agent Different From an Agent Harness? People new to the space often confuse Hermes Agent with the concept of an agent harness, which is the framework that decides how a model calls tools, handles the reasoning loop, and coordinates execution steps internally. If a harness is the engine and chassis that determine how a car drives, then Hermes Agent is like a car that already has that engine installed, plus seats, a navigation system, and the driver's own trip memory. In other words, a harness is the technical architecture layer underneath, while Hermes Agent is a complete end-user product that already packages memory, a skill system, communication channels, and a choice of runtime infrastructure. A developer can build their own harness to control every small detail, but most users don't need to go that deep, they just need an agent that runs right away and gets smarter through use. For a closer look at this underlying architecture layer, read more at What Is Agent Harness? The Framework That Makes AI Work Efficiently, which explains in detail how this type of framework operates. Is Hermes Agent Worth Trying Right Now? Being fully open source, collecting no user data, and supporting complete self-hosting, Hermes Agent is one of the few agents today that lets users keep full control over their own data while still getting a continuous assistant experience with real memory, not the simulated memory that only exists within a single chat. After v0.16.0, the biggest technical barrier for users unfamiliar with terminals has largely been removed, as the native desktop app for Windows, macOS, and Linux has fully replaced the pure CLI approach used before. 
What's left to judge about Hermes Agent isn't whether it runs, but what it learns after a few real weeks of use. The fastest way to find out is to install the desktop app or run the CLI on a cheap VPS, connect it to a familiar messaging channel like Telegram, then watch what skills the agent forms on its own from how you use it every day. That's also the groundwork for comparing Hermes Agent with other options on the market, from Agent Harness to OpenClaw and Claude Cowork, in the next part of this series.

Nam•

19 Jun, 2026

Gemini powers Argentina and Messi at World Cup 2026

Gemini has won big in the most literal sense, right as Messi scored his first hat-trick at the 2026 World Cup, leading Argentina to a crushing 3-0 victory over Algeria and equaling Miroslav Klose's record of 16 World Cup goals. That historic moment became the perfect launchpad for Gemini. Back in March 2026, Google and the Argentine Football Association (AFA) made a bold decision: rather than simply printing a logo on training kits, they signed a deal for the AI to actively support tactical preparation and professional decision-making. That bet has now proven to be the right call. From training kit to the tactical meeting room The agreement between AFA and Google was unveiled at Times Square, New York, a venue deliberately chosen to capture global media attention. The Gemini logo appears across all training apparel for Argentina's men's, women's and youth squads, sitting alongside Adidas and American Express in AFA's top sponsorship tier. But the interesting part isn't the jersey. According to Inside World Football, Argentina's coaching staff will use Gemini for three specific purposes: tactical analysis, injury prevention and decision support. In other words, Gemini now has a seat in meetings that previously belonged only to Scaloni and his assistants. Google has not publicly disclosed which specific Gemini tools have been integrated into AFA's workflow. What is clear is that they are using the World Cup to bring Gemini into the reality of professional football, and the results will be graded in public. What is Gemini actually doing in the dressing room? Argentina arrives at the 2026 World Cup as the reigning champion. Every decision Scaloni makes, from the squad list to the starting eleven, is scrutinized more closely than any other team, and that is precisely why Argentina has become the most ideal testing ground Google has ever had for Gemini in professional football, especially at a major tournament. 
Tactical analysis

Gemini is used to process match data for both Argentina and their opponents, covering movement statistics, attacking patterns and defensive vulnerabilities. Instead of the coaching staff spending hours reviewing footage, AI synthesizes the data and generates tactical diagrams automatically, saving significant preparation time before each match.

Injury prevention

This is a problem every major team wants to solve, especially when Messi and several key players are at an age that requires careful management of training loads. Gemini analyzes biometric data and injury history to issue early warnings, helping the coaching staff adjust intensity before problems actually occur. That is part of the reason why, immediately after Messi completed his hat-trick, Scaloni chose to substitute him, prioritizing fitness and safety for the matches ahead.

AI in injury prevention is nothing new. Premier League clubs have had Microsoft as a partner for similar purposes. What is different this time is that Gemini is integrated directly into the workflow of a national team competing at a major tournament, not just at club level.

For fans: create Messi content, follow scores without unlocking your screen

Alongside supporting the coaching staff, Gemini has also rolled out a range of features aimed at fans, and this is the side that hundreds of millions of people will actually experience.

Gemini lets you create content about players directly

Users can generate images, songs and digital content featuring Argentina players like Messi directly inside the Gemini app. The feature is designed to bring the World Cup experience closer to those who cannot attend matches in person.

Real-time scores and automated daily briefings

On Google Search, live match scores can be pinned to the lock screen and update in real time, with dedicated animations for goals and red cards, all without needing to unlock the phone.
For paid Gemini users, the Scheduled Actions feature allows an automated daily football briefing to be set up, covering scores, news and fixtures, delivered at a chosen time without needing to prompt it each day.

Match-day infrastructure

Google has updated Street View at all 16 host stadiums and optimized routing on Waze for match days. Waze also surfaces live scores when the car is stopped at red lights, so drivers do not need to pick up their phones while on the move.

The 2026 World Cup is the real test for AI in sport

Google is not sponsoring Argentina alone. Gemini also appears on the kits of France, Morocco, Iraq, Turkey and the United States, while Pixel is the official phone of the French squad, which is also using Gemini for internal communications. This is clearly a comprehensive strategy from Google, not a one-off deal.

What makes the 2026 World Cup particularly significant is that it will answer a question no lab environment can: what do users actually do with AI when a World Cup runs for six weeks across 104 matches? Features that run on initial novelty will fade after the group stage. Whatever users keep coming back to all the way through the final is the honest answer to where AI actually fits in everyday life, and Google knows it.

Google's communications director for Latin America, Flor Sabatini, stated that the 2026 World Cup will mark a before and after in the history of football because of AI. It sounds like marketing, but this is the first time a major AI model has been integrated into the preparation of the reigning world champions, right in the middle of the most-watched sporting event on the planet.

The 2026 World Cup is Gemini's real test

The most significant part of this entire story is not the Gemini logo on Messi's jersey. It is the fact that Argentina, still the favorite and the most scrutinized team, carrying the pressure of defending the title, has committed part of its preparation process to AI.
If Argentina succeeds, Gemini will have a case study that no advertising budget can buy. If Argentina falls short and the coaching staff attributes any part of it to AI, the narrative will flip entirely. Either way, this is the first time AI has been held accountable on a stage that genuinely matters, not a benchmark, not a demo, but the World Cup. For AI users, what is worth watching is not just whether Argentina wins, but whether Gemini actually changes how a football team operates, or whether it turns out to be nothing more than a logo on a training kit that looks better than previous years.

Nam•

17 Jun, 2026

AI Technology at World Cup 2026: A Complete Overview

The Adidas Trionda match ball, three-dimensional player models accurate to the millimeter, robot dogs patrolling stadiums, and Google Gemini sitting on the touchline with the Argentina national team: World Cup 2026 is not only the largest tournament in history, with 104 matches across 16 cities in the United States, Canada, and Mexico, but also the most extensive deployment of AI ever seen in sports.

How the Adidas Trionda smart ball works

The official match ball, the Adidas Trionda, is equipped with an inertial measurement unit (IMU) sensor operating at 500Hz, which means it collects 500 data points every second on movement, spin, and the exact moment the ball makes contact with a player's foot. This is particularly important for offside situations, as the sensor determines the precise moment the ball leaves the passer's foot, down to the millisecond. The timestamp from the sensor is synchronized immediately with the player tracking system, locking the position of every player on the pitch at that exact moment instead of relying on the naked eye, which can be off by up to half a second. As a result, offside decisions are made faster and more accurately than ever before.

The technology has already rescued Sweden by identifying the precise moment striker Alexander Isak made contact with the ball. Scorer Svanberg's celebration was cut short when the VAR team stepped in to review. In a play that unfolded at breakneck speed, he appeared to be standing behind the Tunisian defense when the ball was delivered into the penalty area, leading many to believe the goal would be disallowed. However, data from the motion sensor inside the Adidas Trionda ball proved that Svanberg had moved back into an onside position in time, and the goal stood, to the delight of the Swedish fans.
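To put the timing figures above in perspective, here is a rough back-of-the-envelope sketch. The 500Hz rate and the half-second human error come from the text; the sprint speed of 8 m/s is an assumed, illustrative value, and the function names are hypothetical. It only shows how far a player could move within one timing error, not how the real system computes offside.

```python
# Rough arithmetic on the figures quoted in the article.
SAMPLE_RATE_HZ = 500            # sensor rate, from the article
HUMAN_TIMING_ERROR_S = 0.5      # "up to half a second", from the article
ASSUMED_SPRINT_SPEED_MPS = 8.0  # hypothetical player speed, for illustration only


def sample_interval_s(rate_hz: float) -> float:
    """Time between two consecutive sensor samples, in seconds."""
    return 1.0 / rate_hz


def position_uncertainty_m(timing_error_s: float, speed_mps: float) -> float:
    """Distance a player covers during a given timing error."""
    return timing_error_s * speed_mps


sensor_dt = sample_interval_s(SAMPLE_RATE_HZ)
sensor_drift = position_uncertainty_m(sensor_dt, ASSUMED_SPRINT_SPEED_MPS)
eye_drift = position_uncertainty_m(HUMAN_TIMING_ERROR_S, ASSUMED_SPRINT_SPEED_MPS)

print(f"Sample interval: {sensor_dt * 1000:.1f} ms")
print(f"Player movement within one sample: {sensor_drift * 100:.1f} cm")
print(f"Player movement within the human error: {eye_drift:.1f} m")
```

Under these assumptions, consecutive samples are 2 ms apart, so a sprinting player moves a couple of centimeters between samples, compared with several meters over half a second. Any timing finer than the 2 ms spacing would need interpolation between samples, which the article does not describe.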
Semi-automated offside technology with 3D player avatars

Semi-automated offside technology (SAOT) has been upgraded significantly for World Cup 2026, highlighted by a 3D avatar of each player. Every player participating in the tournament is digitally scanned across the entire body in about one second, creating a 3D model with detailed dimensions for every body part. When a situation requires VAR review, the system overlays these 3D models onto real-time tracking data from more than 12 specialized cameras at each stadium.

This approach resolves the long-standing issue of two-dimensional offside lines, where a player's arm, shoulder, or foot might be obscured from a certain camera angle. The 3D model fills those gaps using realistic anatomical data, and the result is displayed as a complete 3D animation in the stadium and on television, entirely replacing the flat red and green lines that once confused spectators.

Football AI Pro: analytics platform for all 48 teams

FIFA collaborated with Lenovo to build Football AI Pro, an analytics platform developed on the FIFA Football Language foundation model, which has been trained on hundreds of millions of football data points over decades of competition. This is the first time in World Cup history that all 48 participating teams have access to the same analytics platform, rather than wealthier federations holding an advantage due to better data tools.

The platform outputs results in multiple formats, including text summaries, video clips, interactive charts, and 3D tactical visualizations. Teams can use it before and after matches to analyze opponent tactics, detect set-piece patterns, track player workload intensity, and analyze head-to-head history. However, FIFA bans its use during play, so on match day coaching staff can access it only at halftime and after the final whistle.

Referee chest cameras with AI image stabilization

For the first time in history, referees in all 104 World Cup matches wear chest cameras.
The raw footage from the camera is shaky when the referee runs at high speed and cannot be used for broadcasting, but FIFA runs an AI image stabilization model in real time on every frame, creating broadcast-quality video. The result is the Referee View perspective, which offers a first-person view from the pitch and has quickly become one of the most popular broadcasting innovations. This viewpoint not only serves entertainment but also provides analysts with a new data source: the exact view the referee had when making decisions.

Google Gemini on the touchline and fan experience

In March 2026, the Argentine Football Association announced Google as an official global sponsor, with the Gemini logo appearing on training jerseys for the men's, women's, and youth teams. However, this partnership goes far beyond brand advertising, because the Argentina technical staff uses Gemini directly for tactical analysis from match videos, tracking player workload and injury recovery, querying historical data on specific matchup scenarios, and creating individual opponent briefings for each player. Notably, Argentina players and coaches use Gemini through the standard application rather than any customized interface, reflecting the maturity of general-purpose AI tools in professional sports applications.

Google has also deployed a series of features for fans, including live scores pinned to the Android lock screen, AI match summaries on the Gemini app, on-demand tactical diagrams, jersey templates on Google Photos, stadium navigation via Google Maps, and match statistics on Google Search.

Robot dogs, facial recognition, and AI security

At the host venues, FIFA deployed Boston Dynamics Spot robot dogs for outer perimeter security patrols and facility inspections.
These robots perform automated patrols in restricted areas, with onboard cameras connected to the stadium security AI system, which is particularly effective in spaces that are difficult to monitor continuously, such as tunnels, underground technical corridors, and stadium perimeters at night. The biometric layer is equally notable, as some stadiums use facial recognition for entry, where your face is your ticket, processed against the database in less than one second. However, the widespread presence of AI surveillance also raises questions about privacy at large-scale sporting events.

AI predictions for the champion: every model has a different answer

Before the tournament kicked off, many AI systems simulated all 104 matches to predict the champion, and the results were completely inconsistent. ChatGPT predicted Spain, the FanDuel research model chose France to defeat Argentina 3 to 2 in the final, while Yahoo Sports and DataCamp both bet on Brazil. This disagreement is worth reflecting on, as every model was provided with the same public data sources, including FIFA rankings, ELO scores, qualifying form, and injury reports, but different weighting methods produced entirely different results. And of course, no model can calculate Messi's left-foot shot in the 89th minute of a knockout match. That is still football.

AI is no longer an experiment but infrastructure

What makes World Cup 2026 different from previous tournaments does not lie in any single technology, but in the fact that AI has moved from the experimental phase to operational infrastructure. The smart ball, the 3D offside system, the referee cameras, and the analytics platform are not pilot projects. They are the basic operational foundation for every match. The 500Hz sensor inside the ball does not understand football; it only measures movement, spin, and contact. However, the decisions it enables, accurate to the millimeter, displayed in 3D, and returned in seconds, as in the Sweden goal, will change how football is run. That is the true shape of AI running at large scale.

Nam•

16 Jun, 2026

Anthropic launches the highly powerful Claude Fable 5 model

Anthropic just dropped what may be its biggest release yet with Claude Fable 5, and it has quickly become the most talked-about model this week. Not just because of its raw power, but because of how Anthropic brought it to the world: this is the first time a Mythos-class model has been made available to general users, after two months under lock and key for safety reasons. What is Fable 5 and why is it different from previous models? At its core, Fable 5 is not a model built from scratch. It is a "safety-hardened" version of Mythos 5, the most powerful model Anthropic has ever built. Back in April 2026, Mythos Preview was only accessible to a very small group of organizations including AWS, Apple, Google, Cisco, and JPMorgan Chase through Project Glasswing, because its ability to detect and exploit software vulnerabilities was simply too powerful to release broadly. Anthropic had also launched Claude Opus 4.8 beforehand as a stepping stone in the development roadmap toward this new model generation. To get Mythos out the door, Anthropic spent two more months building classifiers running in parallel. These are specialized AI systems that analyze requests before the main model processes them, and when a sensitive topic is detected, the system automatically routes to Claude Opus 4.8 at no additional charge. Anthropic says this mechanism only activates in fewer than 5% of sessions, meaning most general users will notice no difference compared to raw Mythos 5. Fable 5 and Mythos 5 share the same pricing: $10 per million input tokens and $50 per million output tokens, which is less than half the cost of Mythos Preview. Users on Pro, Max, Team, and Enterprise plans can use Fable 5 for free through June 22, 2026. Starting June 23, Anthropic will shift to consumption-based billing until infrastructure capacity allows the model to return to fixed subscription plans. How does Fable 5 differ from Mythos 5 on safety? 
Despite sharing the same underlying model, Fable 5 and Mythos 5 are two distinct products by design. The difference lies entirely in the safety classifiers layered on top of the base model. Three classifiers Fable 5 has that Mythos 5 does not Fable 5 is equipped with three safety classification layers running alongside the main model, covering: Cybersecurity, Biology and Chemistry, and Distillation. When a user submits a request in any of these areas, Fable 5 automatically falls back to Claude Opus 4.8 instead of the main model, and notifies the user accordingly. Mythos 5 has none of these filters. It retains the full software exploitation and biological research capabilities that Anthropic considers too dangerous for wide distribution, which is why Mythos 5 remains restricted to a limited group within Project Glasswing, including vetted cybersecurity professionals, critical infrastructure organizations, and approved biology researchers. How does this affect real-world performance? The classifier difference leads to meaningfully different benchmark results in specialized tasks. On ExploitBench, a benchmark focused on cybersecurity, Mythos 5 scores 78% while Fable 5 lands near the 40% range of Opus 4.8, because the fallback mechanism triggers as soon as it detects attack-related requests. For scientific research, Mythos 5 can design proteins and generate novel hypotheses at roughly 10 times the speed of previous methods, while those same capabilities are restricted in Fable 5 for safety reasons. If you are a researcher or work in legitimate cybersecurity, be aware that Fable 5 may automatically redirect some of your requests to Opus 4.8, even when the context is entirely valid. Anthropic acknowledges this and is actively working to improve classifier accuracy. Real-world performance: what do the numbers say? On SWE-Bench Pro for coding tasks, Fable 5 scores 80.3%, compared to 69.2% for Opus 4.8 and 58.6% for GPT-5.5. 
But perhaps the more striking number comes from a real deployment: Stripe used Fable 5 to migrate an entire 50-million-line Ruby codebase in a single day, a task that would have taken a full engineering team more than two months to complete manually. On business analytics, Fable 5 is the first model to cross the 90% threshold on Hex's complex analytics benchmark, outperforming Opus 4.8 by 10 percentage points. IMC, a quantitative trading firm, reported that the model scored near-perfect on their internal evaluation covering fact lookup, causal reasoning, and expected value calculations. The biggest shift from previous models is the ability to sustain focus across multi-day tasks without needing human oversight at every step. Rather than executing commands one at a time, Fable 5 can take on a large project, self-plan, run tests, and handle errors in a loop, behaving far more like an engineer than a question-answering tool. Fable 5 is now available on the Claude API under the model ID claude-fable-5, with support on Amazon Bedrock and Google Vertex AI for enterprise consumption-based plans. Notion integrates Fable 5: from scattered notes to a complete action plan Notion is one of the first applications to integrate Fable 5, and the reason is straightforward. The tasks Fable 5 handles best, specifically reading multiple fragmented data sources, synthesizing them, and producing a logical structure, are exactly what Notion users need most in their daily work. Simon Last, co-founder of Notion, described the primary use case as turning messy meeting notes into a task board with assignments and priorities. Instead of users having to re-read entire transcripts, summarize, and manually create tasks, Fable 5 handles the entire chain without needing to be prompted at each step. There has been no official announcement from Notion about Fable 5 pricing after June 22. 
It remains to be seen whether Notion AI will pass the consumption cost directly to users or absorb it into existing subscription tiers. If the rate ends up lower than going directly through Anthropic, that would be a meaningful advantage for Notion subscribers. A few things to keep in mind before diving in Fable 5 is powerful, but there are two things worth considering before building it into your workflow. First, the $50 per million output tokens price point is high relative to the current market, making it well-suited for complex engineering or analytical tasks but not necessarily for simpler jobs that Sonnet or Haiku can handle at a fraction of the cost. Second, the safety classifiers work well in the vast majority of cases but can trigger incorrectly in some legitimate research contexts, something Anthropic openly acknowledges and is continuing to refine. For individual users on Pro or Max plans, the remaining days before June 22 are a reasonable window to evaluate whether Fable 5 actually generates enough value at that price point before committing to pay-per-use billing.
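To make the pay-per-use pricing concrete, here is a minimal cost estimate using the two rates quoted in the article ($10 per million input tokens, $50 per million output tokens). The function name and the example token counts are hypothetical; actual billing may include factors the article does not mention.

```python
# Rates as quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: a long coding session with a large context.
example_cost = estimate_cost_usd(input_tokens=200_000, output_tokens=20_000)
print(f"Estimated cost: ${example_cost:.2f}")
```

Because output tokens cost five times as much as input tokens, tasks that generate long responses dominate the bill, which is one way to judge whether a job belongs on Fable 5 or on a cheaper model.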

Nam•

10 Jun, 2026

YC CEO's 6 forcing questions before starting any project

I'd heard a lot about the gstack repo from the CEO of Y Combinator, so I got curious and installed it to try. What surprised me most wasn't the polished workflows; it was the genuinely different mindset behind them. That mindset shows up in the very first command: /office-hours, with six questions that don't ask about code at all, only the things most people haven't thought through before they start building.

What is gstack and why did Garry Tan build it

gstack is an open-source toolkit by Garry Tan, CEO of Y Combinator, built primarily for Claude Code. The core idea: instead of using AI as a plain code writer, Garry Tan wanted to turn Claude into a small AI agent team, where each member handles a different role, from product direction and security review to testing and release. The entire workflow runs in an ordered loop: Think → Plan → Build → Review → Test → Ship → Reflect.

More specifically, gstack splits Claude Code into 23 specialized roles, and the output of each step is automatically passed to the next, with no manual handoff needed. Some of the standout commands:

/office-hours: 6 questions that force you to rethink your feature before writing a single line of code
/plan-ceo-review: checks whether you're overbuilding or underbuilding relative to what's actually needed
/review: catches serious bugs that standard automated checks miss
/qa: opens a real browser, performs real interactions, finds real bugs
/cso: runs an automated security audit against international standards
/ship: syncs, tests, pushes code and opens a pull request in a single command

How effective is gstack?

Garry Tan says his working speed in 2026 is roughly 810 times faster than in 2013, measured by lines of completed code per day (11,417 vs 14). In 60 days, he shipped 3 production services and over 40 features, all while running Y Combinator full-time.
Andrej Karpathy, co-founder of OpenAI, confirmed a similar trend, sharing that he hasn't typed a single line of code himself since December 2025 thanks to AI agents. But among all those commands, /office-hours stands out for the opposite reason from the rest: it doesn't help you work faster; it helps you avoid building the wrong thing from the start.

Why Garry Tan puts /office-hours first

Garry Tan placed /office-hours at the top of the workflow based on a simple observation: most products fail not because of poor code, but because they build the wrong thing. Teams spend weeks on a feature nobody needs, or build the right feature for the wrong audience, or solve a problem users already handle better another way.

The command has two modes: Startup mode for founders and people building real products with real users, and Builder mode for side projects, hackathons, and open source. This article focuses on Startup mode, where the 6 questions are most directly applicable.

6 questions that stop you from building the wrong thing

These aren't 6 questions to answer quickly and move on. They're designed to make you think honestly, because the more truthful your answers, the more accurately Claude can match what you actually need, saving you a significant amount of time later. You can read the full original prompts at office-hours/SKILL.md.tmpl.

Demand reality: Is there a real need?

Original question: "Who specifically has this problem? How are they solving it today?"

Not "users in general" or "the marketing team": the goal is to name one real person, ideally by name, who is actively struggling with a specific problem. If you can't name someone like that, you don't yet understand what they actually need.

Concrete example: Instead of "users want better task management," it should be: "Minh, a project manager at a 20-person company, copy-pastes between Notion and Google Sheets every Monday morning because the two tools don't sync."
Apply this to your own situation accordingly.

Status quo: What are they using instead?

Original question: "What is their current workaround? How much better do you need to be for them to switch?"

Everyone is already solving their problem somehow, whether with Excel, sticky notes, or a WhatsApp group. If their current solution is good enough, they have no reason to migrate their data and learn an entirely new platform. Your solution needs to be meaningfully better before they'll even consider switching.

Desperate specificity: Who needs this badly enough?

Original question: "Who needs a solution badly enough to use your ugly beta version today?"

This is the question that separates nice-to-have from must-have. If you can't find anyone willing to use an incomplete, rough, buggy version right now, the problem you're solving isn't urgent enough. Real early users are people who need a solution badly enough to tolerate an unpolished product, as long as it's moving in the right direction.

Narrowest wedge: What is the smallest possible piece?

Original question: "What is the smallest thing you could launch tomorrow? Not the full vision — the smallest piece."

Not the first full-featured version, but something even smaller than that. This question typically cuts 80% of the scope people add because they think "might as well do it while I'm here." It's a trap many builders fall into, including myself. Launch the smallest meaningful piece first, listen to real users, then decide whether to expand.

Common mistake: Many people confuse "smallest piece" with "first full-featured version." The narrowest wedge truly means one small thing that solves one specific problem for one specific group of users, and nothing more.

Observation and surprise: Have you watched real people use it?

Original question: "Have you watched real people use your product? Did they use it in ways you didn't expect?"

This question is best saved for the second iteration onward, once you have something to test.
Rather than asking for feedback through messages or surveys, sit and watch directly, or review screen recordings. The most valuable insights usually don't come from what users say, but from what they do that you didn't design for, or what they skip that you thought was important.

Note: If you're in your first iteration and don't have a product yet, you can skip this question and come back after launching the smallest piece in step 4.

Future-fit: The 2 to 3 year view

Original question: "In 2-3 years, will what you're building still be relevant — or is the trend moving against you?"

This isn't about predicting the future precisely. It's about avoiding building something that's already fading. If the trend is making your problem less urgent over the next two years, that's a clear signal to reconsider from the start. That said, if your goal is to move fast and capture the market before big tech ships something similar, this question can reasonably be set aside.

A real example: a simple idea completely flipped

In the gstack documentation, Garry Tan walks through a practical example. You open /office-hours and say: "I want to build an app that summarizes my daily work calendar."

Claude doesn't simply agree and start executing. Instead, it pushes back: what you just described isn't a calendar summary app; it's actually a full personal AI chief of staff. These are entirely different in scope, technical complexity, and user expectations.

From that single opening description, /office-hours helps you see:

5 features you were describing without realizing it
4 assumptions that need to be validated before building
3 different implementation directions with varying levels of complexity
1 recommendation: launch the smallest piece first, treat the rest as a long-term roadmap

All of this happens before you write a single line of code. The output is saved as a document that subsequent steps in the workflow automatically pick up and continue from.
These 6 questions work even without gstack

The 6 questions from /office-hours don't require Claude Code or a gstack installation. They're a way of thinking, the same framework YC partners use to evaluate startups, and you can apply them right now with any AI tool you already have.

The difference when using them through gstack is that Claude won't let you give vague answers. It pushes for specifics and won't move forward until your response is grounded enough to be useful. That's why /office-hours tends to be the most uncomfortable command in the entire toolkit: not because it's difficult to use, but because it asks exactly what you've been avoiding.

Try it today: Before starting your next project, paste these 6 questions into Claude, Gemini, or ChatGPT along with your idea. Ask it to go through each question one at a time and not let you skip any. The results are often more surprising than you'd expect, even for ideas you've already thought through carefully.

gstack currently has over 117k stars on GitHub and is still growing. For me, the most valuable part isn't the technical commands like /review or /ship; it's /office-hours, because it's the only command in the entire toolkit that forces you to stop and think before doing anything else.
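If you want to try the six questions without gstack, a small script can assemble the prompt to paste into any assistant. This is only a sketch: the question wording follows the quotes in this article (with dashes replaced by commas), while the function name, the instruction wording, and the prompt layout are assumptions for illustration, not the actual contents of gstack's SKILL.md.tmpl.

```python
# The six questions as quoted in the article; the wrapper around them is hypothetical.
FORCING_QUESTIONS = [
    ("Demand reality",
     "Who specifically has this problem? How are they solving it today?"),
    ("Status quo",
     "What is their current workaround? How much better do you need to be for them to switch?"),
    ("Desperate specificity",
     "Who needs a solution badly enough to use your ugly beta version today?"),
    ("Narrowest wedge",
     "What is the smallest thing you could launch tomorrow? Not the full vision, the smallest piece."),
    ("Observation and surprise",
     "Have you watched real people use your product? Did they use it in ways you didn't expect?"),
    ("Future-fit",
     "In 2-3 years, will what you're building still be relevant, or is the trend moving against you?"),
]


def build_office_hours_prompt(idea: str) -> str:
    """Combine a project idea with the six questions into one pasteable prompt."""
    lines = [
        "Here is my project idea:",
        idea.strip(),
        "",
        "Go through the following questions one at a time.",
        "Do not let me skip any, and push back on vague answers.",
        "",
    ]
    for number, (label, question) in enumerate(FORCING_QUESTIONS, start=1):
        lines.append(f"{number}. {label}: {question}")
    return "\n".join(lines)


prompt = build_office_hours_prompt("An app that summarizes my daily work calendar.")
print(prompt)
```

Paste the printed text into Claude, Gemini, or ChatGPT. The instruction lines are the part most worth adjusting, since they are what stops the assistant from accepting a vague answer and moving on.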

Nam•

27 Jun, 2026

How to control Codex from your phone with ChatGPT app

You're out and suddenly remember a small detail in your project that needs fixing. You don't have to open your laptop or connect through remote desktop. With the right connection set up, the ChatGPT app on your phone can become a control panel for Codex, while your computer at home or the office keeps running the actual code.

The ChatGPT app doesn't run Codex on your phone

The easiest thing to misunderstand is thinking Codex is running directly on your phone. In reality, your phone only sends prompts, replies, approvals and follow-up messages, while the actual working environment lives on your Mac or Windows machine running Codex. In other words, the ChatGPT app is the remote controller, and the host machine is where your repo, terminal, credentials, plugins, MCP servers and other tools actually live.

This makes sense because codebases typically live on your development machine, not your phone. When you send a request like fixing a TypeScript error, running tests or checking a diff, Codex processes it inside the selected project on the host and sends results back for you to review. If you want to understand the foundation before using remote access, check out What is Codex and how to use Codex to get a clear picture of where this tool fits in your workflow.

What do you need before connecting the ChatGPT app to Codex?

According to the latest Codex documentation from OpenAI, the ChatGPT app supports controlling Codex on both macOS and Windows, though Linux is not supported yet. Notably, this feature works with all ChatGPT account types, including Free and Go, with no paid plan required. You only need to make sure you're signed into the same account or workspace on both devices: ChatGPT mobile (latest version on iOS or Android) and Codex (latest version on your host machine, online and running). Your host machine must stay on and Codex must keep running for the entire time you're controlling it remotely.
If the machine goes to sleep, loses its connection or Codex is closed, the connection from your phone drops immediately and any tasks in progress may be interrupted.

What's worth noting is that the entire setup process starts from the Codex app on the host machine and is surprisingly simple: just scan a QR code and you're done. Inside the Codex app, select the mobile setup option in the sidebar, scan the QR code with your phone, then complete the confirmation in the ChatGPT app. For enterprise workspaces, an admin may need to enable Remote Control permissions before you can connect. This QR code grants control over your computer, so keep it private and never share it with anyone to avoid unauthorized access to your machine.

To summarize, connecting the ChatGPT app to Codex is straightforward:

The host machine must be online and running Codex
The ChatGPT app and Codex must be signed into the same account or workspace
Generate the QR code in Codex on the host and complete setup on your phone
MFA, SSO or passkey requirements may still apply depending on your workspace

What can you do once connected?

Once the host appears in Codex on your phone, you can start a new thread inside a project on the host or pick up an existing one. This is where the experience becomes genuinely useful: you can send follow-ups, answer Codex's questions, approve commands, view output, check diffs, review test results and even receive notifications when a task finishes or needs your attention.

A real example: you're at a coffee shop and remember the login form has a validation bug. You open the ChatGPT app, select the connected host, and ask Codex to check the auth flow, fix the email validation error and run the related tests. Codex works directly on the repo sitting on your host machine, while you review the results, approve actions when needed and decide whether to request further changes.
This is also why people are starting to think of Codex and other AI-powered IDEs as a colleague working inside a real environment, not just a code suggestion tool. Its strength lies in reading files, running commands, editing code and maintaining context across multiple rounds of back-and-forth.

Limitations to keep in mind when using Codex from your phone

Remote control depends entirely on the host machine: if your computer goes to sleep, loses its connection, closes Codex or gets signed out of the workspace, your phone loses its working environment immediately. That said, if Codex is mid-task when only the phone's connection drops, it will continue running on the host and notify you once your phone reconnects, so there's less to worry about if your phone suddenly loses signal during a running task. One more thing to note: on Windows, tasks using Computer Use require an appropriate foreground session, so this setup is not a complete replacement for sitting directly in front of your machine.

It also helps to draw a clear line between handing off a focused task and reviewing large changes. Your phone works well for small bugs, running tests, quick questions about a specific file, reviewing short tasks or checking task status. However, anything requiring a high level of attention should still be reviewed on a larger screen to avoid missing details.

How to use it effectively in practice

The most effective approach is to hand off tasks with a clear scope and specific expected outcomes. Instead of saying "fix the login", describe exactly where the error occurs, what the expected behavior should be after the fix, which tests to run and which parts of the codebase to leave untouched. Codex performs better when it knows the boundaries of a task, especially since working remotely from a phone means each feedback loop takes longer than when you're sitting right at your machine.
A clean working rhythm might look like this: describe the task in detail, whether small or medium-sized, ask Codex to read the relevant files, let it propose a solution, approve only when necessary and wait for the result report. Once you get used to this rhythm, you'll find that idle time away from your desk can handle real work, while the final decision stays firmly in your hands.

Compared to Claude Code Remote and Telegram bot

There are many ways to control an AI coding agent from your phone, though the three most common approaches each serve a different need.

| Criteria | ChatGPT app + Codex | Claude Code Remote | Telegram + Codex |
| --- | --- | --- | --- |
| Natural conversation | ✅ Excellent | ✅ Good | ❌ Requires exact syntax |
| Granular control | Moderate | Highest | Low |
| Connection stability | Stable | Stable | Frequent drops |
| Mobile UI | Well optimized | Not fully optimized | Uses existing Telegram app |
| Initial setup | Easy, scan QR | Easy | Requires manual bot configuration |
| Computer must stay on | ✅ Required | ✅ Required | ✅ Required |

Claude Code Remote Control offers the strongest level of control: you get direct terminal output, can intervene mid-task and generally feel much closer to what the agent is doing. That said, the UI on small phone screens isn't fully optimized yet, and some interactions are still difficult to perform without a physical keyboard.

A Telegram bot has the advantage of not requiring a separate app and is easy to get started with, but the real-world experience has clear limits: it's prone to slowdowns and occasional silent disconnections mid-task, and because it lacks genuine AI context, anything slightly more complex than a simple command quickly falls apart, forcing you to type precise instructions rather than describe what you need naturally.

ChatGPT app + Codex sits at the best balance point for most users: smooth enough, smart enough, quick to set up with a QR scan and no new syntax to learn before you can get to work.
Connecting the ChatGPT app to Codex doesn't turn your phone into a development machine. It turns your phone into a control surface for a development machine that is already set up and ready to work. As long as the host stays on, permissions are configured correctly and the task is scoped tightly enough, this is the most practical way to handle real coding work when you're away from your laptop.
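The advice above about scoping a task (where the error occurs, the expected behavior, which tests to run, what to leave untouched) can be turned into a reusable message template. The sketch below is only an illustration: `TaskBrief`, its field names and the example paths are hypothetical and are not part of Codex or the ChatGPT app. It simply builds the text you would paste into the chat.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical helper that formats a tightly scoped task message."""

    problem: str   # where the error occurs
    expected: str  # behavior expected after the fix
    tests: list[str] = field(default_factory=list)       # tests to run
    off_limits: list[str] = field(default_factory=list)  # code to leave untouched

    def render(self) -> str:
        """Return the brief as plain text, one instruction per line."""
        lines = [
            f"Problem: {self.problem}",
            f"Expected after fix: {self.expected}",
        ]
        if self.tests:
            lines.append("Run these tests: " + ", ".join(self.tests))
        if self.off_limits:
            lines.append("Do not modify: " + ", ".join(self.off_limits))
        return "\n".join(lines)


# Example values are made up for illustration.
brief = TaskBrief(
    problem="Login form returns HTTP 500 when the email contains a plus sign",
    expected="Login succeeds for valid emails that contain a plus sign",
    tests=["tests/test_login.py"],
    off_limits=["src/billing/"],
)
message = brief.render()
print(message)
```

Writing the brief this way makes it harder to forget one of the four boundaries when you are typing on a small screen.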

Nam•

22 Jun, 2026

What Is Hermes Agent? Nous Research's Self-Learning AI

Learning more makes you better. That principle, long assumed to apply only to humans, turns out to hold for Hermes Agent too, an open-source AI agent from Nous Research. Hermes Agent doesn't forget what you worked on: it remembers, understands you more deeply, and gets better with each session, thanks to a memory system that can recall what it knows about you even after the machine has been off for a week.

What Is Hermes Agent?

Hermes Agent is an open-source AI agent developed and released under the MIT license by Nous Research, the lab behind the Hermes, Nomos, and Psyche model lines. Unlike Antigravity or Codex, which depend on an IDE environment, or ordinary chatbots that ultimately remain a thin wrapper calling a single API, Hermes Agent is built to run continuously on a user's own infrastructure, from a cheap VPS to a GPU cluster or serverless infrastructure, and it operates in a way fairly similar to OpenClaw.

The core difference in Hermes Agent lies in how it manages long-term memory and converts experience into real skills. Instead of merely storing raw information or passively remembering preferences the way assistants like Gemini or Claude do, Hermes runs a closed "learning loop": after every work session, it actively distills the process into new tools it can use the next time. This system is run by a background "Curator" agent that automatically scores, prunes, and merges accumulated knowledge, combined with FTS5 full-text search that retrieves old memories roughly 4,500 times faster than conventional search without spending any tokens. As a result, Hermes doesn't just respond and forget. It becomes a collaborator that grows more knowledgeable and capable over time.

Four Features That Set Hermes Agent Apart

Nous Research doesn't call Hermes Agent a chatbot or a copilot. It positions it as an agent with a built-in learning loop. The four feature groups below explain why that label isn't just marketing.
Memory That Persists Across Sessions

The biggest weakness of most AI assistants today is that their memory stores only raw chat text rather than how work actually gets done. Hermes Agent addresses this through three combined mechanisms:

- Fast retrieval: Uses FTS5 full-text search to pull up old memories roughly 4,500 times faster than conventional search, without spending extra tokens the way Gemini or Cowork do.
- User understanding: Integrates Honcho's dialectical user-modeling approach, helping the agent understand preferences, habits, and personal context in depth across thousands of sessions.
- Continuity: The agent picks up work exactly where you left off, even if that was a project from weeks earlier.

Self-Generating and Self-Improving Skills

This is the feature that makes Hermes Agent behave like a collaborator that accumulates experience, rather than just a tool that answers on request:

- Learning from real use: After completing complex tasks, Hermes Agent distills the process into new skills and stores them in a library to be reused automatically next time.
- Open agentskills.io standard: These skills follow an open standard, so they can be packaged, shared, and reused across different AI systems without being rewritten from scratch.
- The Curator mechanism: A background administrative agent periodically scores, prunes, and merges duplicate skills, which keeps the skill library from bloating and becoming disorganized over time.

Present on More Than 23 Messaging Platforms

Hermes Agent isn't confined to a computer. It integrates directly into the messaging channels people already use on their phones every day:

- Multiple channels, one brain: You can command Hermes Agent through Telegram, Discord, Slack, WhatsApp, Signal, email, or SMS.
- Context retained: Whether you message via Telegram in the morning or switch to Discord at night, the agent keeps a single thread of memory, never fragmented by channel.
- Multimodal interaction: Supports sending voice messages, images, and video, along with the ability to analyze multimodal content.

Flexible Runtime Infrastructure

Hermes Agent supports six backend types for executing commands: local machine, Docker, SSH, Daytona, Singularity, and Modal. With Daytona and Modal, the environment can hibernate when idle and cost almost nothing while waiting, waking up only when there's work to process. This is why Nous Research describes Hermes Agent as an always-on agent that doesn't require users to pay for a server running 24/7 all year.

Hermes Agent can be installed with a single curl command on Linux, macOS, and Windows via WSL2. As of June 5, 2026, with version v0.16.0 "The Surface Release," it is also available as an official native desktop app for Windows, macOS, and Linux with a full GUI, making it accessible to everyday users without needing a terminal.

Built-In Toolset and Limitations to Know

40-Plus Built-In Tools, From Web Search to Schedule Automation

Hermes Agent ships with more than 40 built-in tools, including web search, browser actions, file handling, and Python script execution via RPC to run sub-tasks without consuming the main agent's context window. A natural-language scheduling system lets you set recurring tasks like daily reports or data backups, then leaves the agent to run them without being reminded. For tasks that need full isolation, Hermes Agent also supports sub-agents with their own conversation, terminal, and scripts, allowing multiple jobs to run in parallel without diluting the main memory.

Challenges and Security Considerations

Despite rapid updates, Hermes Agent still has a few points users should keep in mind before deploying it:

- Stability of the self-learning mechanism: The ability to self-improve skills boosts success rates, with a Tencent Cloud report recording gains of up to 52% along with token savings of up to 61%. However, since this is a self-evolving mechanism, real-world effectiveness still depends on the underlying model chosen and still requires human oversight rather than full trust.
- Risk from high-level permissions, with security responsibility falling on the user: Hermes Agent can intervene deeply in a system (excessive agency), so connecting it directly to multiple messaging platforms requires users to manage their own API keys and set up guardrails. Unlike closed AI services, Hermes Agent hands full control over to the user, which means the user also bears greater responsibility for configuring access permissions to avoid information leaks.

Why Is Hermes Agent Growing So Fast?

Hermes Agent's growth could be attributed to Nous Research's marketing, but in our view it comes down to three main factors.

A Frictionless Migration Path From OpenClaw

Recognizing OpenClaw's large user base, Nous Research built a migration tool that lets users carry over their persona, API keys, entire skill set, and memory to Hermes Agent with a single command, without losing old data and without having to reconfigure anything from scratch. If you're currently using OpenClaw and want to try Hermes Agent, look for the hermes claw migrate tool built into Hermes Agent before considering a fresh install.

Betting on a Closed Learning Loop Instead of a Feature Race

While many other agents compete on the number of tools they offer, Hermes Agent positions itself as a self-evolving entity, one that distills experience into new skills and retains long-term memory to understand users more deeply over time. This approach creates lasting value, and the community has already put it to use for projects such as automating large-scale content production with high consistency across many sessions.

A Role as a Training Data Engine

Beyond serving as a personal assistant, Hermes Agent also functions as a capable research tool.
It can generate thousands of parallel tool-calling trajectories and compress them into training data for other AI models. By turning the agent's real-world experience into training data, Hermes becomes a platform that developers building the next generation of autonomous AI can't easily do without.

How Is Hermes Agent Different From an Agent Harness?

People new to the space often confuse Hermes Agent with the concept of an agent harness, which is the framework that decides how a model calls tools, handles the reasoning loop, and coordinates execution steps internally. If a harness is the engine and chassis that determine how a car drives, then Hermes Agent is like a car that already has that engine installed, plus seats, a navigation system, and the driver's own trip memory.

In other words, a harness is the technical architecture layer underneath, while Hermes Agent is a complete end-user product that already packages memory, a skill system, communication channels, and a choice of runtime infrastructure. A developer can build their own harness to control every small detail, but most users don't need to go that deep. They just need an agent that runs right away and gets smarter through use. For a closer look at this underlying architecture layer, read more at What Is Agent Harness? The Framework That Makes AI Work Efficiently, which explains in detail how this type of framework operates.

Is Hermes Agent Worth Trying Right Now?

Being fully open source, collecting no user data, and supporting complete self-hosting, Hermes Agent is one of the few agents today that lets users keep full control over their own data while still getting a continuous assistant experience with real memory, not the simulated memory that only exists within a single chat. After v0.16.0, the biggest technical barrier for users unfamiliar with terminals has largely been removed, because the native desktop app for Windows, macOS, and Linux now offers an alternative to the CLI-only approach used before.
What's left to judge about Hermes Agent isn't whether it runs, but what it learns after a few weeks of real use. The fastest way to find out is to install the desktop app or run the CLI on a cheap VPS, connect it to a familiar messaging channel like Telegram, then watch which skills the agent forms on its own from how you use it every day. That's also the groundwork for comparing Hermes Agent with other options on the market, from Agent Harness to OpenClaw and Claude Cowork, in the next part of this series.
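To make the FTS5 retrieval idea mentioned in this article more concrete, here is a minimal sketch of full-text search over stored notes using SQLite's FTS5 extension from Python's standard library. It is not Hermes Agent's actual schema or code: the table name `memories`, its columns and the sample notes are assumptions made for illustration, and it assumes your SQLite build ships with FTS5 enabled (most do).

```python
import sqlite3


def build_memory_index(notes):
    """Create an in-memory FTS5 table and load (session, note) pairs into it."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per remembered note.
    conn.execute("CREATE VIRTUAL TABLE memories USING fts5(session, note)")
    conn.executemany("INSERT INTO memories (session, note) VALUES (?, ?)", notes)
    return conn


def search_memories(conn, query):
    """Return matching rows, best match first (FTS5 exposes a built-in rank)."""
    return conn.execute(
        "SELECT session, note FROM memories WHERE memories MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


# Sample notes are invented for this example.
NOTES = [
    ("2026-06-01", "Prefers short daily reports sent over Telegram"),
    ("2026-06-02", "Backup job runs every Sunday night"),
    ("2026-06-03", "Project Atlas uses Docker for deployment"),
]

conn = build_memory_index(NOTES)
hits = search_memories(conn, "telegram")
for session, note in hits:
    print(session, note)
```

The lookup runs entirely inside the database, which is why this kind of retrieval spends no model tokens: the model only sees the few rows that come back.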

Nam•

19 Jun, 2026

Gemini powers Argentina and Messi at World Cup 2026

Gemini has won big in the most literal sense: Messi has just scored his first hat-trick at the 2026 World Cup, leading Argentina to a crushing 3-0 victory over Algeria and equaling Miroslav Klose's record of 16 World Cup goals. That historic moment became the perfect launchpad for Gemini. Back in March 2026, Google and the Argentine Football Association (AFA) made a bold decision: rather than simply printing a logo on training kits, they signed a deal for the AI to actively support tactical preparation and professional decision-making. That bet has now proven to be the right call.

From training kit to the tactical meeting room

The agreement between AFA and Google was unveiled at Times Square, New York, a venue deliberately chosen to capture global media attention. The Gemini logo appears across all training apparel for Argentina's men's, women's and youth squads, sitting alongside Adidas and American Express in AFA's top sponsorship tier.

But the interesting part isn't the jersey. According to Inside World Football, Argentina's coaching staff will use Gemini for three specific purposes: tactical analysis, injury prevention and decision support. In other words, Gemini now has a seat in meetings that previously belonged only to Scaloni and his assistants. Google has not publicly disclosed which specific Gemini tools have been integrated into AFA's workflow. What is clear is that Google is using the World Cup to bring Gemini into the reality of professional football, and the results will be graded in public.

What is Gemini actually doing in the dressing room?

Argentina arrives at the 2026 World Cup as the reigning champion. Every decision Scaloni makes, from the squad list to the starting eleven, is scrutinized more closely than those of any other coach, and that is precisely why Argentina has become the best testing ground Google has ever had for Gemini in professional football, especially at a major tournament.
Tactical analysis

Gemini is used to process match data for both Argentina and their opponents, covering movement statistics, attacking patterns and defensive vulnerabilities. Instead of the coaching staff spending hours reviewing footage, AI synthesizes the data and generates tactical diagrams automatically, saving significant preparation time before each match.

Injury prevention

This is a problem every major team wants to solve, especially when Messi and several key players are at an age that requires careful management of training loads. Gemini analyzes biometric data and injury history to issue early warnings, helping the coaching staff adjust intensity before problems actually occur. That is part of the reason why Scaloni substituted Messi immediately after he completed his hat-trick, prioritizing fitness and safety for the matches ahead.

AI in injury prevention is nothing new. Premier League clubs have had Microsoft as a partner for similar purposes. What is different this time is that Gemini is integrated directly into the workflow of a national team competing at a major tournament, not just at club level.

For fans: create Messi content, follow scores without unlocking your screen

Alongside supporting the coaching staff, Gemini has also rolled out a range of features aimed at fans, and this is the side that hundreds of millions of people will actually experience.

Gemini lets you create content about players directly

Users can generate images, songs and digital content featuring Argentina players like Messi directly inside the Gemini app. The feature is designed to bring the World Cup experience closer to those who cannot attend matches in person.

Real-time scores and automated daily briefings

On Google Search, live match scores can be pinned to the lock screen and update in real time, with dedicated animations for goals and red cards, all without needing to unlock the phone.
For paid Gemini users, the Scheduled Actions feature allows an automated daily football briefing to be set up, covering scores, news and fixtures, delivered at a chosen time without needing to prompt it each day.

Match-day infrastructure

Google has updated Street View at all 16 host stadiums and optimized routing on Waze for match days. Waze also surfaces live scores when the car is stopped at red lights, so drivers do not need to pick up their phones while on the move.

The 2026 World Cup is the real test for AI in sport

Google is not sponsoring Argentina alone. Gemini also appears on the kits of France, Morocco, Iraq, Turkey and the United States, while Pixel is the official phone of the French squad, which is also using Gemini for internal communications. This is clearly a comprehensive strategy from Google, not a one-off deal.

What makes the 2026 World Cup particularly significant is that it will answer a question no lab environment can: what do users actually do with AI when a World Cup runs for six weeks across 104 matches? Features that run on initial novelty will fade after the group stage. Whatever users keep coming back to all the way through the final is the honest answer to where AI actually fits in everyday life, and Google knows it.

Google's communications director for Latin America, Flor Sabatini, stated that the 2026 World Cup will mark a before and after in the history of football because of AI. It sounds like marketing, but this is the first time a major AI model has been integrated into the preparation of the reigning world champions, right in the middle of the most-watched sporting event on the planet.

The 2026 World Cup is Gemini's real test

The most significant part of this entire story is not the Gemini logo on Messi's jersey. It is the fact that Argentina, still among the favorites and the most scrutinized team, carrying the pressure of defending the title, has committed part of its preparation process to AI.
If Argentina succeeds, Gemini will have a case study that no advertising budget can buy. If Argentina falls short and the coaching staff attributes any part of it to AI, the narrative will flip entirely. Either way, this is the first time AI has been held accountable on a stage that genuinely matters: not a benchmark, not a demo, but the World Cup. For AI users, what is worth watching is not just whether Argentina wins, but whether Gemini actually changes how a football team operates, or whether it turns out to be nothing more than a logo on a training kit that looks better than in previous years.

Nam•

17 Jun, 2026

AI Technology at World Cup 2026: A Complete Overview

The Adidas Trionda match ball, three-dimensional player models accurate to the millimeter, robot dogs patrolling stadiums, and Google Gemini sitting on the touchline with the Argentina national team. World Cup 2026 is not only the largest tournament in history, with 104 matches across 16 cities in the United States, Canada, and Mexico, but also the most extensive deployment of AI ever seen in sports.

How the Adidas Trionda smart ball works

The official match ball, the Adidas Trionda, is equipped with an inertial measurement unit (IMU) sensor operating at 500Hz, which means it records 500 samples every second covering movement, spin, and the exact moment the ball makes contact with a player's foot. This is particularly important for offside situations, as the sensor determines the precise moment the ball leaves the passer's foot down to the millisecond. The timestamp from the sensor is synchronized immediately with the player tracking system, which locks the position of every player on the pitch at that exact moment instead of relying on the naked eye, which can be off by up to half a second. As a result, offside decisions are made faster and more accurately than ever before.

The technology has already rescued a goal for Sweden. After Svanberg scored against Tunisia, his celebration was cut short when the VAR team stepped in to review. In a play that unfolded at breakneck speed, he appeared to be standing behind the Tunisian defense when striker Alexander Isak delivered the ball into the penalty area, leading many to believe the goal would be disallowed. However, data from the motion sensor inside the Adidas Trionda pinpointed the precise moment of Isak's contact and proved that Svanberg had moved back to a valid position in time, giving Sweden a legitimate goal to the delight of the fans.
Semi-automated offside technology with 3D player avatars

Semi-automated offside technology (SAOT) has been upgraded significantly for World Cup 2026, highlighted by a 3D avatar of each player. Every player participating in the tournament receives a full-body digital scan that takes about one second and produces a 3D model with detailed dimensions for every body part. When a situation requires VAR review, the system overlays these 3D models onto real-time tracking data from more than 12 specialized cameras at each stadium.

This approach resolves the long-standing issue with two-dimensional offside lines, where a player's arm, shoulder, or foot might be obscured from a certain camera angle. The 3D model fills those gaps using realistic anatomical data, and the result is displayed as a complete 3D animation in the stadium and on television, entirely replacing the flat red and green lines that once confused spectators.

Football AI Pro: analytics platform for all 48 teams

FIFA collaborated with Lenovo to build Football AI Pro, an analytics platform developed on the FIFA Football Language foundation model, which has been trained on hundreds of millions of football data points from decades of competition. This is the first time in World Cup history that all 48 participating teams have access to the same analytics platform, rather than wealthier federations holding an advantage due to better data tools.

The platform outputs results in multiple formats, including text summaries, video clips, interactive charts, and 3D tactical visualizations. Teams can use it before and after matches to analyze opponent tactics, detect set-piece patterns, track player workload intensity, and analyze head-to-head history. However, FIFA bans its use during play, so on match day coaching staff can access it only at halftime and after the match.

Referee chest cameras with AI image stabilization

For the first time in history, referees in all 104 World Cup matches wear chest cameras.
The raw footage captured while the referee runs at high speed is shaky and cannot be used for broadcasting, so FIFA runs an AI image stabilization model in real time on every frame to produce broadcast-quality video. The result is the Referee View perspective, which offers a first-person experience from the pitch and has quickly become one of the most popular broadcasting innovations. This viewpoint is not only entertainment. It also gives analysts a new data source: exactly what the referee saw when making a decision.

Google Gemini on the touchline and fan experience

In March 2026, the Argentine Football Association announced Google as an official global sponsor, with the Gemini logo appearing on training jerseys for the men's, women's, and youth teams. However, this partnership goes far beyond brand advertising, because the Argentina technical staff uses Gemini directly for tactical analysis from match videos, tracking player workload and injury recovery, querying historical data on specific matchup scenarios, and creating individual opponent briefings for each player. Notably, Argentina's players and coaches use Gemini through the standard application rather than any customized interface, reflecting the maturity of general-purpose AI tools in professional sports.

Google has also deployed a series of features for fans, including live scores pinned to the Android lock screen, AI match summaries in the Gemini app, on-demand tactical diagrams, jersey templates in Google Photos, stadium navigation via Google Maps, and match statistics on Google Search.

Robot dogs, facial recognition, and AI security

At the host venues, FIFA deployed Boston Dynamics Spot robot dogs for outer perimeter security patrols and facility inspections.
These robots perform automated patrols in restricted areas, with onboard cameras connected to the stadium security AI system, which is particularly effective in spaces that are difficult to monitor continuously, such as tunnels, underground technical corridors, and stadium perimeters at night.

The biometric layer is equally notable. Some stadiums use facial recognition for entry, where your face is your ticket, checked against the database in less than one second. However, the widespread presence of AI surveillance also raises questions about privacy at large-scale sporting events.

AI predictions for the champion: every model has a different answer

Before the tournament kicked off, many AI systems simulated all 104 matches to predict the champion, and the results were completely inconsistent. ChatGPT predicted Spain, the FanDuel research model chose France to defeat Argentina 3-2 in the final, while Yahoo Sports and DataCamp both bet on Brazil.

This disagreement is worth reflecting on. Every model was given the same public data sources, including FIFA rankings, Elo ratings, qualifying form, and injury reports, but different weighting methods produced entirely different results. And of course, no model can calculate Messi's left-footed shot in the 89th minute of a knockout match. That is still football.

AI is no longer an experiment but infrastructure

What makes World Cup 2026 different from previous tournaments does not lie in any single technology, but in the fact that AI has moved from the experimental phase to operational infrastructure. The smart ball, the 3D offside system, the referee cameras, and the analytics platform are not pilot projects. They are the basic operational foundation for every match. The 500Hz sensor inside the ball does not understand football. It only measures motion and spin.
However, the decisions it enables, accurate to the millimeter, displayed in 3D, and returned in seconds, as Sweden's goal showed, will change how football is run. That is the true shape of AI running at large scale.
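A quick back-of-the-envelope calculation shows why the timing figures in this article matter for offside calls. The sketch below uses the 500Hz sampling rate and the half-second naked-eye error quoted above. The sprint speed of 8 m/s is an assumed round number for illustration, and treating one sample interval as the sensor's timing uncertainty is a simplification, not a description of how the official system works.

```python
SAMPLE_RATE_HZ = 500             # IMU sampling rate quoted in the article
NAKED_EYE_ERROR_S = 0.5          # worst-case human timing error quoted in the article
ASSUMED_SPRINT_SPEED_M_S = 8.0   # assumption: a fast player at full sprint

# Time between two consecutive sensor samples.
sample_interval_s = 1 / SAMPLE_RATE_HZ


def position_drift_m(speed_m_s: float, timing_error_s: float) -> float:
    """Distance a player covers during a given timing error."""
    return speed_m_s * timing_error_s


eye_drift_m = position_drift_m(ASSUMED_SPRINT_SPEED_M_S, NAKED_EYE_ERROR_S)
sensor_drift_m = position_drift_m(ASSUMED_SPRINT_SPEED_M_S, sample_interval_s)

print(f"Sample interval: {sample_interval_s * 1000:.0f} ms")
print(f"Drift with naked-eye timing: {eye_drift_m:.1f} m")
print(f"Drift with one sample interval: {sensor_drift_m * 100:.1f} cm")
```

Under these assumptions, a half-second timing error corresponds to several meters of player movement, while one sample interval corresponds to a couple of centimeters, which is the gap the synchronized sensor timestamp is meant to close.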

Nam•

16 Jun, 2026

Anthropic launches the highly powerful Claude Fable 5 model

Anthropic just dropped what may be its biggest release yet with Claude Fable 5, and it has quickly become the most talked-about model this week. That is not only because of its raw power, but because of how Anthropic brought it to the world: this is the first time a Mythos-class model has been made available to general users, after two months under lock and key for safety reasons.

What is Fable 5 and why is it different from previous models?

At its core, Fable 5 is not a model built from scratch. It is a "safety-hardened" version of Mythos 5, the most powerful model Anthropic has ever built. Back in April 2026, Mythos Preview was only accessible to a very small group of organizations including AWS, Apple, Google, Cisco, and JPMorgan Chase through Project Glasswing, because its ability to detect and exploit software vulnerabilities was simply too powerful to release broadly. Anthropic had also launched Claude Opus 4.8 beforehand as a stepping stone in the development roadmap toward this new model generation.

To get Mythos out the door, Anthropic spent two more months building classifiers that run in parallel with the model. These are specialized AI systems that analyze requests before the main model processes them, and when a sensitive topic is detected, the system automatically routes the request to Claude Opus 4.8 at no additional charge. Anthropic says this mechanism activates in fewer than 5% of sessions, meaning most general users will notice no difference compared to raw Mythos 5.

Fable 5 and Mythos 5 share the same pricing: $10 per million input tokens and $50 per million output tokens, which is less than half the cost of Mythos Preview. Users on Pro, Max, Team, and Enterprise plans can use Fable 5 for free through June 22, 2026. Starting June 23, Anthropic will shift to consumption-based billing until infrastructure capacity allows the model to return to fixed subscription plans.

How does Fable 5 differ from Mythos 5 on safety?
Despite sharing the same underlying model, Fable 5 and Mythos 5 are two distinct products by design. The difference lies entirely in the safety classifiers layered on top of the base model.

Three classifiers Fable 5 has that Mythos 5 does not

Fable 5 is equipped with three safety classification layers running alongside the main model, covering: Cybersecurity, Biology and Chemistry, and Distillation. When a user submits a request in any of these areas, Fable 5 automatically falls back to Claude Opus 4.8 instead of the main model, and notifies the user accordingly.

Mythos 5 has none of these filters. It retains the full software exploitation and biological research capabilities that Anthropic considers too dangerous for wide distribution, which is why Mythos 5 remains restricted to a limited group within Project Glasswing, including vetted cybersecurity professionals, critical infrastructure organizations, and approved biology researchers.

How does this affect real-world performance?

The classifier difference leads to meaningfully different benchmark results in specialized tasks. On ExploitBench, a benchmark focused on cybersecurity, Mythos 5 scores 78% while Fable 5 lands near the 40% range of Opus 4.8, because the fallback mechanism triggers as soon as it detects attack-related requests. For scientific research, Mythos 5 can design proteins and generate novel hypotheses at roughly 10 times the speed of previous methods, while those same capabilities are restricted in Fable 5 for safety reasons.

If you are a researcher or work in legitimate cybersecurity, be aware that Fable 5 may automatically redirect some of your requests to Opus 4.8, even when the context is entirely valid. Anthropic acknowledges this and is actively working to improve classifier accuracy.

Real-world performance: what do the numbers say?

On SWE-Bench Pro for coding tasks, Fable 5 scores 80.3%, compared to 69.2% for Opus 4.8 and 58.6% for GPT-5.5.
But perhaps the more striking number comes from a real deployment: Stripe used Fable 5 to migrate an entire 50-million-line Ruby codebase in a single day, a task that would have taken a full engineering team more than two months to complete manually.

On business analytics, Fable 5 is the first model to cross the 90% threshold on Hex's complex analytics benchmark, outperforming Opus 4.8 by 10 percentage points. IMC, a quantitative trading firm, reported that the model scored near-perfect on their internal evaluation covering fact lookup, causal reasoning, and expected value calculations.

The biggest shift from previous models is the ability to sustain focus across multi-day tasks without needing human oversight at every step. Rather than executing commands one at a time, Fable 5 can take on a large project, self-plan, run tests, and handle errors in a loop, behaving far more like an engineer than a question-answering tool.

Fable 5 is now available on the Claude API under the model ID claude-fable-5, with support on Amazon Bedrock and Google Vertex AI for enterprise consumption-based plans.

Notion integrates Fable 5: from scattered notes to a complete action plan

Notion is one of the first applications to integrate Fable 5, and the reason is straightforward. The tasks Fable 5 handles best, specifically reading multiple fragmented data sources, synthesizing them, and producing a logical structure, are exactly what Notion users need most in their daily work.

Simon Last, co-founder of Notion, described the primary use case as turning messy meeting notes into a task board with assignments and priorities. Instead of users having to re-read entire transcripts, summarize, and manually create tasks, Fable 5 handles the entire chain without needing to be prompted at each step.

There has been no official announcement from Notion about Fable 5 pricing after June 22.
It remains to be seen whether Notion AI will pass the consumption cost directly to users or absorb it into existing subscription tiers. If the rate ends up lower than going directly through Anthropic, that would be a meaningful advantage for Notion subscribers.

A few things to keep in mind before diving in

Fable 5 is powerful, but there are two things worth considering before building it into your workflow. First, the $50 per million output tokens price point is high relative to the current market, making it well-suited for complex engineering or analytical tasks but not necessarily for simpler jobs that Sonnet or Haiku can handle at a fraction of the cost. Second, the safety classifiers work well in the vast majority of cases but can trigger incorrectly in some legitimate research contexts, something Anthropic openly acknowledges and is continuing to refine.

For individual users on Pro or Max plans, the remaining days before June 22 are a reasonable window to evaluate whether Fable 5 actually generates enough value at that price point before committing to pay-per-use billing.
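To judge whether a task is worth that price, it helps to estimate the bill before running it. The sketch below applies the two rates quoted in this article ($10 per million input tokens, $50 per million output tokens). The function name and the example token counts are made up for illustration, and the calculation ignores any discounts, caching or fallback routing that real billing might include.

```python
INPUT_USD_PER_MILLION = 10.0   # input-token rate quoted in the article
OUTPUT_USD_PER_MILLION = 50.0  # output-token rate quoted in the article


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one job from its input and output token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical job: a large codebase read with a moderately long answer.
example_cost = estimate_cost_usd(input_tokens=200_000, output_tokens=40_000)
print(f"Estimated cost: ${example_cost:.2f}")
```

In this made-up example the output tokens cost as much as the input even though there are five times fewer of them, which is why long generated answers dominate the bill at these rates.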

Nam•

10 Jun, 2026


Top AI Tools

Nano Banana Pro

Stitch

ElevenLabs

Kling AI

Now you can work faster and more conveniently with the help of AI.

AI Agents will become increasingly easier to use and access.

Latest AI News

YC CEO's 6 forcing questions before starting any project

How to control Codex from your phone with ChatGPT app

What Is Hermes Agent? Nous Research's Self-Learning AI

Gemini powers Argentina and Messi at World Cup 2026

AI Technology at World Cup 2026: A Complete Overview

Anthropic launches the highly powerful Claude Fable 5 model
