OpenAI mở cửa AI với GPT-OSS tham gia cuộc đua mã nguồn mở

Published on 13 August, 2025

Quick Summary

OpenAI đã gây bất ngờ lớn khi phát hành hai mô hình mã nguồn mở mới, GPT-OSS-120B và GPT-OSS-20B, dưới giấy phép Apache 2.0, đánh dấu sự tái gia nhập vào 'cuộc đua mô hình mở' sau sáu năm gián đoạn. Các mô hình 'open-weight' này cung cấp hiệu suất mạnh mẽ, các khả năng nâng cao như kiến trúc MoE và suy luận CoT, đồng thời hỗ trợ fine-tune và gọi hàm. Động thái này không chỉ thúc đẩy quyền riêng tư, tiết kiệm chi phí mà còn khuyến khích đổi mới trong cộng đồng AI, mặc dù vẫn còn tranh cãi về định nghĩa 'mã nguồn mở' thực sự.

Có vẻ như đổ vỡ với Microsoft đã khiến OpenAI điều chỉnh đáng kể chiến lược tiếp cận rộng rãi tới người dùng AI khi họ đã công bố phát hành 2 model mã nguồn mở mới là gpt-oss-120b và gpt-oss-20b với kích thước lần lượt là 20 tỷ và 120 tỷ tham số (parameter chứ hoàn toàn không phải neuron).

Đặc biệt là 2 mô hình này đều có mã nguồn mở với giấy phép Apache 2.0 rất tự do. Vậy thì giấy phép Apache 2.0 là gì? Có thể nhiều người vẫn chưa biết về giấy phép mở này thực sự rất dài nhưng tóm gọn lại là với giấy phép Apache 2.0 này người dùng hoàn toàn được tự do dùng và chỉnh sửa, phân phối lại cũng không cần mở mã nguồn, kể cả kiếm tiền với GPT-OSS cũng được thậm chí không cần trả khoản phí gì cho Open AI, chỉ cần giữ nguyên bản quyền tác giả là được.

Như vậy với động thái này báo hiệu việc OpenAI tái gia nhập "cuộc đua mô hình mở" sau sáu năm gián đoạn, sánh vai cùng các đối thủ như Meta, Deepseek và Mistral.

GPT-OSS là gì? Hiểu rõ về "Open-Weight"

Thuật ngữ "GPT-OSS" dùng để chỉ hai mô hình ngôn ngữ mới này, với kích thước lần lượt là 20 tỷ và 120 tỷ tham số. Quan trọng là, OpenAI đã phát hành chúng dưới dạng các mô hình "open-weight", nghĩa là các trọng số đã được huấn luyện của mô hình AI được công khai cho phép tải về và sử dụng trực tiếp trên máy của người dùng. Điều này cho phép các nhà phát triển kiểm tra và tinh chỉnh cách các mô hình hoạt động.

Tuy nhiên, đây không phải là một bản phát hành "mã nguồn mở" đầy đủ theo nghĩa truyền thống, vì OpenAI chưa công bố công khai mã code huấn luyện gốc hoặc các tập dữ liệu thô được sử dụng để huấn luyện các mô hình này. Ngược lại, một mô hình thực sự mã nguồn mở sẽ cung cấp toàn bộ mã nguồn, tài liệu huấn luyện, trọng số và đôi khi cả tập dữ liệu, cho phép cộng đồng xem, sửa đổi và thậm chí huấn luyện lại mô hình. Mặc dù sự khác biệt này còn gây tranh cãi trong cộng đồng mã nguồn mở, OpenAI nhấn mạnh rằng bản phát hành này là một bước đi tiếp theo sau sáu năm hướng tới việc làm cho lợi ích của AI trở nên dễ tiếp cận rộng rãi.

Hiệu suất vượt trội và khả năng nâng cao

Dù "mở", hiệu năng của GPT-OSS vẫn rất đáng gờm. Các bài kiểm tra (benchmark) cho thấy nó có thể cạnh tranh với mô hình đóng của Open AI :

GPT-OSS-120B: Gần tương đương với o4-mini trong các tác vụ suy luận cốt lõi, mô hình này yêu cầu GPU 80GB trở lên.
GPT-OSS-20B: Tương tự o3-mini, có thể chạy trên phần cứng tiêu dùng với 16GB bộ nhớ.

So sánh hiệu suất GPT-OSS

GPQA diamond

Câu hỏi khoa học cấp tiến sĩ (không dùng tools)

MMLU

Câu hỏi lĩnh vực học thuật

AIME 2025

Câu hỏi toán thi đấu

Các điểm nổi bật về kiến trúc và khả năng chính bao gồm:

Kiến trúc Mixture-of-Experts (MoE): Cả hai mô hình đều sử dụng thiết kế MoE, kích hoạt ít tham số hơn trên mỗi token (5,1 tỷ cho 120B và 3,6 tỷ cho 20B) để xử lý hiệu quả truy vấn.
Suy luận Chain-of-Thought (CoT): GPT-OSS hỗ trợ khả năng suy luận nâng cao, cho phép các nhà phát triển cấu hình các mức độ nỗ lực suy luận khác nhau (thấp, trung bình hoặc cao) để cân bằng tốc độ và độ chính xác. Các mô hình có thể hiển thị toàn bộ chuỗi suy luận nội bộ của chúng, điều này có thể hỗ trợ gỡ lỗi logic của chúng.
Sử dụng công cụ và đầu ra có cấu trúc: Các mô hình được thiết kế cho các trường hợp sử dụng nâng cao bao gồm sử dụng công cụ, chẳng hạn như công cụ duyệt web để tương tác web và công cụ Python để thực thi mã trong môi trường sổ ghi chép Jupyter.
Huấn luyện chuyên sâu: Được huấn luyện trên hàng nghìn tỷ token chỉ bằng văn bản tập trung vào STEM, mã hóa và kiến thức tổng quát, sử dụng GPU NVIDIA H100 và PyTorch. Thời điểm cắt dữ liệu kiến thức của các mô hình là tháng 6 năm 2024.
Định dạng OpenAI Harmony: Một dự án mã nguồn mở mới từ OpenAI, Harmony, cung cấp một định dạng phản hồi mới lạ cho các mẫu lời nhắc, giới thiệu các vai trò như system, developer, user, assistant, và tool, cùng với các kênh đầu ra riêng biệt cho final (hướng tới người dùng), analysis (chuỗi suy luận), và commentary (liên quan đến công cụ). Cấu trúc này nâng cao khả năng của mô hình trong việc quản lý các tương tác phức tạp.

Ý nghĩa và lợi ích đối với hệ sinh thái AI

Quyết định phát hành các mô hình GPT-OSS miễn phí được xem là một động thái chiến lược của OpenAI nhằm lấy lại vị thế trong bối cảnh AI đang ngày càng cạnh tranh. Bằng cách cung cấp các mô hình "open-weight" mạnh mẽ, OpenAI không chỉ thúc đẩy đổi mới mà còn trao quyền cho các nhà phát triển và doanh nghiệp.

Điều này mang lại nhiều lợi ích đáng kể:

Tăng cường quyền riêng tư: Các doanh nghiệp, đặc biệt trong các ngành yêu cầu bảo mật cao như y tế hay tài chính, có thể triển khai mô hình cục bộ (on-premise) để bảo vệ dữ liệu nhạy cảm.
Tiết kiệm chi phí: Việc triển khai cục bộ giúp giảm độ trễ và chi phí sử dụng API thương mại.
Thúc đẩy đổi mới: Cộng đồng có thể tự do tinh chỉnh và phát triển các giải pháp AI tiên tiến dựa trên các mô hình này.

Có hỗ trợ tinh chỉnh (Fine-Tune) và gọi hàm (Function Calling)

Các mô hình GPT-OSS được thiết kế hoàn toàn có thể tinh chỉnh (fine-tune), mặc dù không có mã code huấn luyện gốc. Chúng đã được tích hợp vào thư viện transformers của Hugging Face và hỗ trợ các kỹ thuật fine-tune tiết kiệm tài nguyên như LoRA, PEFT, và QLoRA.

Tất nhiên là GPT-OSS có hỗ trợ function calling cho phép mô hình gọi và xử lý kết quả từ các hàm hoặc API bên ngoài trong quá trình hội thoại. Thật sự đây là thứ mà không thể thiếu đối với các mô hình hiện nay để tăng tính kết nối.

Mặc dù việc sử dụng fine-tune mà không có script huấn luyện gốc có thể phức tạp hơn, hoàn toàn không dễ dàng với người thiếu kinh nghiệm nhưng các nhà phát triển nên thử các nền tảng như Unsloth đã phát triển các giải pháp tùy chỉnh và kỹ thuật offloading để làm cho mọi việc dễ dàng hơn đôi chút, cho phép huấn luyện LoRA GPT-OSS-20b trên VRAM 14GB và GPT-OSS-120b trên VRAM 65GB.

Cách tiếp cận và triển khai:

Hugging Face: Thông qua dịch vụ Inference Providers mà họ đã cung cấp bản demo chính thức của OpenAI.
Triển khai trên chính máy của người dùng (Local Inference): Được hỗ trợ bởi các thư viện như transformers, vLLM, llama.cpp, và ollama. Ví dụ, mô hình 20B có thể chạy trên Macbook, Mac mini chỉ với RAM 32GB.
Có thể chạy thông qua Docker.
Nền tảng cloud : Có sẵn trên các nền tảng như Azure AI Model Catalog và Dell Enterprise Hub cho các triển khai doanh nghiệp an toàn.

Các nhà phát triển có thể sử dụng nhiều tối ưu hóa khác nhau để tăng tốc độ suy luận, bao gồm lượng tử hóa MXFP4 cho GPU Hopper hoặc Blackwell, Flash Attention 3 và MegaBlocks MoE kernels.

Cam kết mạnh mẽ và tranh cãi xoay quanh GPT-OSS

Mặc dù mô hình được cộng đồng đón nhận tích cực, nhưng đã không còn tính wow khi nói về "tính mở" của nó. Sự khác biệt giữa "open-weight" và "open-source" vẫn là một điểm gây tranh cãi đối với một số người ủng hộ sự minh bạch hoàn toàn, mà còn ở những đối thủ của Open AI đã làm trước đây rất lâu rồi.

Ngoài ra, trong quá trình thử nghiệm, một số trường hợp mô hình gpt-oss-20b "rò rỉ" thông tin chuỗi suy luận nội bộ đã được quan sát, mặc dù OpenAI đã chỉ ra rằng đây là một hành vi được mong đợi để cho phép giám sát và tránh các mô hình che giấu dấu vết của chúng.

Tóm lại, các mô hình GPT-OSS của OpenAI với quá trình thể hiện chắc chắn vẫn chưa hoàn hảo mà chỉ để thể hiện cam kết mạnh mẽ đối với việc làm cho AI trở nên dễ tiếp cận hơn.

Discussion (0)

No comments yet. Be the first!

How to control Codex from your phone with ChatGPT app

You're out and suddenly remember a small detail in your project that needs fixing — you don't have to open your laptop or remote desktop in. With the right connection set up, ChatGPT app on your phone can become a control panel for Codex, while your computer at home or the office keeps running the actual code. ChatGPT app doesn't run Codex on your phone The easiest thing to misunderstand is thinking Codex is running directly on your phone. In reality, your phone only sends prompts, replies, approvals and follow-up messages, while the actual working environment lives on your Mac or Windows machine running Codex. In other words, ChatGPT app is the remote controller, and the host machine is where your repo, terminal, credentials, plugins, MCP servers and other tools actually live. This makes complete sense because codebases typically live on your development machine, not your phone. When you send a request like fixing a TypeScript error, running tests or checking a diff, Codex processes it inside the selected project on the host and sends results back for you to review. If you want to understand the foundation before using remote access, check out What is Codex and how to use Codex to get a clear picture of where this tool fits in your workflow. What do you need before connecting ChatGPT app to Codex? According to the latest Codex documentation from OpenAI, ChatGPT app supports controlling Codex on both macOS and Windows, though Linux is not supported yet. Notably, this feature works with all ChatGPT account types, including Free and Go — no paid plan required. You only need to make sure you're signed into the same account or workspace on both devices: ChatGPT mobile (latest version on iOS or Android) and Codex (latest version on your host machine, online and running). Your host machine must stay on and Codex must keep running for the entire time you're controlling it remotely. If the machine goes to sleep, loses its connection or Codex is closed, the connection from your phone drops immediately and any tasks in progress may be interrupted. What's worth noting is that the entire setup process starts from Codex App on the host machine and is surprisingly simple — just scan a QR code and you're done. Inside Codex App, select the mobile setup option in the sidebar, scan the QR code with your phone, then complete the confirmation in ChatGPT app. For enterprise workspaces, an admin may need to enable Remote Control permissions before you can connect. This QR code grants control over your computer, so keep it private and never share it with anyone to avoid unauthorized access to your machine. To summarize, connecting ChatGPT app to Codex is straightforward: Host machine must be online and running Codex ChatGPT app and Codex must be signed into the same account or workspace Generate the QR code in Codex on the host and complete setup on your phone MFA, SSO or passkey requirements may still apply depending on your workspace What can you do once connected? Once the host appears in Codex on your phone, you can start a new thread inside a project on the host or pick up an existing one. This is where the experience becomes genuinely useful: you can send follow-ups, answer Codex's questions, approve commands, view output, check diffs, review test results and even receive notifications when a task finishes or needs your attention. A real example: you're at a coffee shop and remember the login form has a validation bug. You open ChatGPT app, select the connected host, and ask Codex to check the auth flow, fix the email validation error and run the related tests. Codex works directly on the repo sitting on your host machine, while you review the results, approve actions when needed and decide whether to request further changes. This is also why people are starting to think of Codex and other AI-powered IDEs as a colleague working inside a real environment, not just a code suggestion tool anymore. Its strength lies in reading files, running commands, editing code and maintaining context across multiple rounds of back-and-forth. Limitations to keep in mind when using Codex from your phone Remote control depends entirely on the host machine — if your computer goes to sleep, loses its connection, closes Codex or gets signed out of the workspace, your phone loses its working environment immediately. That said, if Codex is mid-task when the connection drops, it will continue running on the host and notify you once your phone reconnects, so there's less to worry about if your phone suddenly loses signal during a running task. One more thing to note: on Windows, tasks using Computer Use require an appropriate foreground session, so this setup is not a complete replacement for sitting directly in front of your machine. It also helps to draw a clear line between handing off a focused task and reviewing large changes. Your phone works well for small bugs, running tests, quick questions about a specific file, reviewing short tasks or checking task status. However, anything requiring a high level of attention should still be reviewed on a larger screen to avoid missing details. How to use it effectively in practice The most effective approach is to hand off tasks with a clear scope and specific expected outcomes. Instead of saying "fix the login", describe exactly where the error occurs, what the expected behavior should be after the fix, which tests to run and which parts of the codebase to leave untouched. Codex performs better when it knows the boundaries of a task, especially since remote mobile means each feedback loop takes longer than when you're sitting right at your machine. A clean working rhythm might look like this: describe the task in detail whether small or medium-sized, ask Codex to read the relevant files, let it propose a solution, only approve when necessary and wait for the result report. Once you get used to this rhythm, you'll find that idle time outside can handle real work — while keeping the final decision firmly in your hands. Compared to Claude Code Remote and Telegram bot There are many ways to control an AI coding agent from your phone, though the three most common approaches each serve a different need. Criteria ChatGPT app + Codex Claude Code Remote Telegram + Codex Natural conversation ✅ Excellent ✅ Good ❌ Requires exact syntax Granular control Moderate Highest Low Connection stability Stable Stable Frequent drops Mobile UI Well optimized Not fully optimized Uses existing Telegram app Initial setup Easy, scan QR Easy Requires manual bot configuration Computer must stay on ✅ Required ✅ Required ✅ Required Claude Code Remote Control offers the strongest level of control — you get direct terminal output, can intervene mid-task and generally feel much closer to what the agent is doing. That said, the UI on small phone screens isn't fully optimized yet, and some interactions are still difficult to perform without a physical keyboard. Telegram bot has the advantage of not requiring a separate app and is easy to get started with, but the real-world experience has clear limits: it's prone to slowdowns, occasional silent disconnections mid-task, and because it lacks genuine AI context, anything slightly more complex than a simple command quickly falls apart — forcing you to type precise instructions rather than describe what you need naturally. ChatGPT app + Codex sits at the best balance point for most users — smooth enough, smart enough, quick to set up with a QR scan and no new syntax to learn before you can get to work. Connecting ChatGPT app to Codex doesn't turn your phone into a development machine — it turns your phone into a control surface for a development machine that's already ready to work. As long as the host stays on, permissions are configured correctly and the task is scoped tightly enough, this is the most practical way to handle real coding work when you're away from your laptop.

Nam•

22 Jun, 2026

Microsoft launches 7 new AI models to challenge OpenAI

Microsoft just dropped seven new AI models at Build 2026, with MAI-Thinking-1 boasting 35 billion active parameters and trained entirely on clean data. For the first time, the software giant is openly challenging the position of its own strategic partner, OpenAI, on the AI model battlefield. MAI-Thinking-1 and Microsoft's reasoning ambitions The centerpiece of Build 2026 was MAI-Thinking-1, Microsoft's first reasoning AI model developed entirely in-house. With approximately 35 billion active parameters, the model is designed to handle multi-step reasoning tasks, work with long contexts, and support complex coding, all at a lower cost than many large-scale AI models currently available. The most notable claim is that Microsoft trained MAI-Thinking-1 on clean data without using distillation from third-party AI models. In other words, this is a clear statement that Microsoft has the independent AI research capability to build competitive models without "borrowing" knowledge from GPT or any other model. According to Microsoft's published evaluations, MAI-Thinking-1 achieves competitive performance on coding benchmarks and is rated on par with many leading AI models in blind evaluation tests. The 35-billion parameter count also signals that Microsoft is prioritizing efficiency over raw scale, as many competitor models have significantly more parameters but may not necessarily deliver better output quality. From coding to voice: a complete AI ecosystem Beyond reasoning, Microsoft introduced six additional AI models to build a complete AI ecosystem serving both individual users and enterprises. From coding and image generation to voice synthesis, every piece of the puzzle now has a dedicated model. Smarter coding with MAI-Code-1-Flash For developers, MAI-Code-1-Flash is significant news. This model specializes in code generation and software development support, optimized for real-world programming tasks. More importantly, it will be integrated directly into GitHub Copilot and Visual Studio Code, two tools used daily by millions of developers. This means code suggestions and automated coding experiences will be significantly upgraded within familiar development environments. Images and voice: the missing pieces In the creative content space, Microsoft announced MAI-Image-2.5 alongside MAI-Image-2.5-Flash. These are next-generation image creation and editing models, with the Flash version optimized for fast response times, making it suitable for real-time applications like live photo editing or on-demand illustration generation. In the audio domain, Microsoft introduced two important models: MAI-Voice-2 with more natural voice synthesis capabilities and support for additional languages MAI-Transcribe-1.5 for speech-to-text conversion with significantly faster processing speeds than the previous generation Additionally, Microsoft has developed optimized variants specifically for the Microsoft Foundry platform, helping enterprises easily build and deploy their own AI applications. The strategy to reduce OpenAI dependence Where Microsoft was previously seen mainly as an infrastructure partner and deployment platform for OpenAI, Build 2026 shows the company is steadily acquiring all the essential components of a full AI ecosystem. Microsoft now has its own reasoning model, coding model, image generation model, voice synthesis model, and speech recognition model, all connected directly to the Azure, Copilot, and Microsoft Foundry ecosystem. This strategy gives Microsoft greater autonomy in developing core technology while reducing risk from dependence on external partners. More specifically, owning proprietary AI models allows Microsoft to control its product roadmap, optimize operational costs, and customize models for specific service needs without waiting for or negotiating with third parties. Where does the AI model race go from here? The simultaneous launch of seven new AI models shows Microsoft is investing heavily in foundational technologies to compete directly with major players like OpenAI, Google, and Anthropic. When OpenAI's largest partner decides to build its own AI models, that is the clearest signal that the AI race has entered a new phase where no one wants to place the future of their technology in someone else's hands. For developers and enterprises, now is the time to closely watch Microsoft Foundry and the Azure AI ecosystem, as tools that were previously only available through OpenAI will soon appear within Microsoft's familiar ecosystem. Build 2026 may well be remembered as the moment Microsoft officially declared its vision for an independent, comprehensive AI ecosystem with its own distinctive identity.

Nam•

4 Jun, 2026

What is an agent harness? The framework that helps AI work efficiently

Imagine having an AI assistant that is incredibly smart but forgets everything between sessions and cannot check the quality of its own work. To solve this problem, developers created a protective management layer around AI models called an agent harness. This is what enables AI agents to complete complex, multi-step tasks autonomously without requiring constant human intervention. What is an agent harness? Think of an AI model as a brilliant new employee with no long-term memory and zero familiarity with the workplace. They can solve complex problems in seconds but will just as easily forget what they were working on, or accidentally send a confidential document to the wrong client. In that scenario, an agent harness acts as the experienced manager sitting right beside them, keeping things on track. Put simply, an agent harness is the software layer wrapping around an AI model that handles all administrative and logistical work so the model itself can focus entirely on reasoning and problem-solving. It connects the AI to external tools, maintains a complete record of work across sessions, and verifies results before considering a task done. In practice, an agent harness handles the following: Connecting the AI model to external tools such as web search, email, and calendars Persisting progress across sessions so the AI never has to start from scratch Filtering out irrelevant information and supplying only the data the AI actually needs at each step Monitoring AI actions to prevent dangerous mistakes Logging activity in detail so humans can audit what happened when needed Origin of the term: The concept of "agent harness" was formally named by technology engineer Mitchell Hashimoto in early 2026. Before that, many development teams had built similar systems but had no shared term for this layer of infrastructure. Why do AI agents fail at long-running tasks? The biggest weakness in today's AI models is the complete absence of long-term memory. Every new conversation starts from zero with no recollection of anything that happened before. Imagine hiring an employee who wakes up every morning having forgotten every agreement, every deadline, and every piece of progress from the day before. When Anthropic tested Claude building a complex web application without harness support, the results were consistently disappointing. Two failure modes kept appearing: The AI tried to do everything at once, ran out of working memory midway through, and left the project unfinished. The next session wasted time trying to figure out what had already been done. The AI declared the task complete without actually running the result to verify it worked. Beyond those two core failures, long-horizon tasks expose three additional problems: Context clog: Accumulated conversation history and tool outputs crowd out the original instructions, causing the AI to gradually lose focus on the actual goal Tool misuse: The AI sometimes searches for information that does not exist or submits incorrect inputs to forms, and without anything to stop it, repeats the same error in a loop Total progress loss on failure: Any network error or system crash wipes out whatever was stored in temporary memory, forcing a full restart Stanford research (2023): AI models tend to overlook information buried in the middle of long text, even when that text is not particularly long. This is why feeding too much data to an AI all at once often backfires without a filtering layer in place. How does an agent harness work in practice? An agent harness operates in two distinct phases to keep work flowing continuously without interruption. Setup phase (runs once) The harness prepares the full working environment before the AI begins: building a structured task list, initializing storage, and recording the starting point. Think of it as the manager drawing up a detailed project plan before handing anything off. This phase only needs to happen once. Execution phase (repeats) Each time the AI begins a new session, the harness automatically reloads all saved progress and assigns only the next relevant task. When the AI wants to take an action such as searching for information or sending a notification, the harness checks whether that request is valid, executes it safely, cleans the returned result, and passes it back to the AI. The model never touches external systems directly without going through this control layer first. The four core components of an agent harness For an AI to operate reliably over extended periods, a standard agent harness needs four essential components: External tool gateway: Allows the AI to interact with the real world by reading documents, searching the web, or sending messages. The harness acts as an intermediary, validating every request before execution and ensuring returned results are clean and usable. Layered memory management: Maintains three types of memory serving different needs: short-term working memory for the current session, a task log recording what has been completed and what remains, and a long-term knowledge store that accumulates across multiple projects over time. Intelligent context filter: Summarizes long conversation histories down to key points and supplies only the data relevant to the current step rather than loading everything at once, keeping the AI focused on the right task at the right moment. Safety checker and human approval gate: Automatically verifies results before marking a task as complete. For sensitive actions such as deleting important data or sending bulk emails, the harness pauses and waits for human confirmation before proceeding. Note on accumulated knowledge: If an AI agent's memory is stored entirely within a closed third-party platform, all the knowledge it builds up over time belongs to that platform. Switching to a different system means starting from zero. This is worth thinking through carefully when choosing a long-term AI agent solution. Harness engineering and the secret behind millions of lines of code Harness engineering is the practice of treating every AI failure as a system problem to fix permanently rather than something to retry or ignore. As Mitchell Hashimoto put it: if the agent makes a mistake, redesign the environment so that mistake becomes physically impossible to repeat. In practice, when OpenAI built large software projects with three engineers producing 3.5 pull requests each per day without typing a single line of code, they had set up automatic verification checks after every AI action. When the AI produced something incorrect, the system returned error messages written in a specific structure so the AI immediately understood what needed to change on the next attempt. Every error message became a learning signal, not just a warning. A study presented at ICML 2025 further confirmed that the same AI model equipped with a harness consistently outperformed itself running without one, even with identical training weights and identical prompts. The environment surrounding the AI matters just as much as the model itself. A telling data point: Anthropic's Claude Code has grown past 512,000 lines of code and continues to expand. More capable models do not make the harness simpler. They make it larger, because there is more capability to orchestrate and more failure modes to guard against. When do you actually need an agent harness? For simple one-off tasks like summarizing a document or answering a specific question, calling an AI directly is perfectly fine. But the moment work extends beyond a single conversation, requires memory from a previous session, or involves multiple steps that need to happen in a specific order, a harness becomes necessary. One thing worth reflecting on: the built-in web search in ChatGPT and Gemini is itself a form of harness. When AI automatically looks something up, there is infrastructure behind the scenes making the tool call, processing the result, and feeding clean information back into context. The harness is invisible to the user but indispensable to the system. Agent harness is not a short-term technical trend. It is the answer to fundamental limitations that AI cannot resolve on its own: no long-term memory, finite working context, and a tendency to misuse external tools without guardrails. 4AIVN has also started applying harness to our own workflows — and what we have found is that it does not just help AI finish tasks. It turns AI into a system that learns from failure and gets more reliable over time.

Nam•

1 Jun, 2026

What is Codex? OpenAI's rising star tool

Three million Codex users per week — up six times in just the first three months of 2026. That number tells you something: Codex is the rising star. OpenAI is turning it into an all-in-one tool, which means Codex is no longer just a playground for developers. What is Codex? A tool that's not just for developers Think about this scenario: you want to build a spreadsheet that automatically updates every week, or a small website to let customers book appointments, or simply a tool that summarizes your email reports each morning without opening dozens of tabs. Previously, these things required a developer. With Codex, you just type your request in plain English and wait for the result. Codex is OpenAI's AI agent, launched in May 2025 and deeply integrated into the ChatGPT ecosystem. Its core difference from regular ChatGPT is that Codex doesn't just answer — it actually does the work through a code execution environment. You assign a task, Codex plans it out, executes each step, checks the result, and returns a finished product ready to use. No need to understand what code is, no need to monitor every command. Codex can now run as a standalone desktop app available for both Windows and macOS, and has recently expanded to Android and iOS on mobile as well. You can sign in using your existing ChatGPT account. Codex is currently available on ChatGPT Plus, Pro, Business, and Enterprise plans, though Free and Go users also get limited trial access. What Codex can do for you Build apps or small websites from a description You don't need to know HTML or JavaScript. Just describe what you need: "Create a simple appointment booking page with fields for name, phone number, and date/time selection, and send an email notification whenever someone books." Codex will build the entire interface, handle the logic, and guide you through publishing it online. A startup team in the US once shared that they completed in one weekend what would have previously taken an entire quarter — and it wasn't a team full of developers. Automate repetitive tasks This is where non-developer users will find the most value. For example: every week you have to consolidate revenue data from three different Excel files, merge them, and send a report to your manager. Codex can build an automated workflow that does this for you on a schedule and delivers the result without you ever opening your laptop. With the Automations feature launched in the April 2026 update, Codex can take on long-horizon tasks, pause, resume, and complete them over multiple days without needing to be reminded. Generate images and prototypes directly in the app Codex integrates image generation powered by GPT Image 2.0 directly inside the app. You can ask Codex to create interface mockups, product banners, or illustration assets for a document — all within the same workflow, without switching to another tool. For content creators, marketers, and solo founders, this is a genuine advantage: the entire journey from idea to finished output can happen in a single window. Control your computer to work in the background Since April 2026, Codex can operate Mac applications using its own cursor, viewing the screen and clicking and typing to complete tasks while you continue using the machine normally. A simpler way to picture it: you're in an online meeting while Codex has Figma open, editing a design and saving the file according to instructions you set earlier. Two things happening in parallel, neither getting in the other's way. The computer use feature is currently only available on macOS and is not yet available in the EU or UK. You will need to grant Codex Accessibility and Screenshot permissions during initial setup. How to get started with Codex Codex requires installing the desktop app on Windows or macOS — it does not run directly in a web browser. The setup process is straightforward and takes just a few minutes. Step 1: Go to openai.com/codex and download the version for your operating system. On macOS, there are two separate builds for Apple Silicon (M1 and later) and Intel chips. On Windows, there is a single universal build. Step 2: Install the app and sign in with your existing ChatGPT account or OpenAI API key. Step 3: Choose a project folder you want Codex to work within — you can also link it to a github repository — or skip this step if you only want to assign standalone tasks like creating files, generating images, or automating a workflow. Step 4: Type your request in natural language, and be as specific as possible. Instead of "make me something about a report," try: "Create an Excel file summarizing monthly revenue from the data I provide, add a bar chart comparing each month, and highlight the month with the highest revenue." The more specific your request, the better the result. Codex works best when you clearly describe the input, the desired output, and any constraints you need — such as file format, display language, or calculation rules. Codex vs Claude Code, Antigravity, and Cursor from a non-technical user's perspective If you're not a developer, the real question isn't "which tool is technically more powerful" — it's "which tool can I use right now without learning anything new." From that angle, these four tools are clearly different from one another. Codex and Claude Code Claude Code from Anthropic is Codex's most direct and formidable competitor. In terms of raw technical output quality, Claude Code currently leads the pack — producing cleaner code, tighter logic, and handling large, complex codebases more effectively. However, Claude Code is explicitly designed for developers: it runs in the terminal, requires command-line installation, and notably has no image generation capability. If you're not comfortable in a terminal, Claude Code is a barrier from the very first step. Codex, by contrast, offers a more user-friendly desktop interface, integrates image generation within the same workflow, and is noticeably more accessible to non-technical users. Codex and Antigravity Both require a desktop app, but their underlying philosophies are completely different. Codex is built around a "hand off the task and wait for results" model: you describe what you need, the agent runs in an isolated cloud sandbox, and returns a finished product without affecting your machine at all. It suits people who want to automate workflows, create files, or build something without monitoring every step. Antigravity works in the opposite direction: the agent runs directly on your machine, watches your screen, opens applications, and collaborates with you in real time while you work. If you want an AI colleague working alongside you — observing and reacting to what's happening on your screen — Antigravity is the better fit. Codex and Cursor Cursor is built on VS Code and targets developers who want to keep their familiar working environment intact. For non-coders, Cursor is largely inaccessible because the entire experience revolves around editing code inside an editor. Cursor excels at understanding an entire codebase and offers flexibility in choosing AI models, but those advantages are for developers — not for general users who need to automate workflows or build something from scratch. In summary, from a non-technical user's perspective: Codex: Friendly desktop interface on Windows and macOS, capable of generating images, well-suited for users who want AI as an automated workflow tool. Claude Code: Best technical output quality, but developer-oriented and cannot generate images. Antigravity: Agent works directly on your machine in real time, suited for users who want to collaborate with AI while they work. Cursor: Best for developers keeping their VS Code workflow intact; not suited for general users. Who is Codex best for? If you're a content creator who wants to build a landing page for a campaign without hiring a developer, Codex fits. If you're a marketer who needs to automate weekly reports pulling from multiple data sources, Codex fits. If you're a solo founder who needs to ship a product fast without a technical team, Codex fits. If you're a teacher who wants to build a small quiz app for students without learning to code, Codex fits. On the other hand, if you're a developer who needs granular control over every line of code in a large, complex codebase, Claude Code will deliver better output quality. Codex is the right tool for people who want fast results without needing to understand what's happening under the hood. One practical limitation worth knowing: Codex currently has full support for Python, JavaScript, TypeScript, and Ruby. For tasks that don't involve code — like generating images, automating workflows, or creating documents — this language limitation has no impact on you. The line between "can code" and "can't code" is fading The question "do you know how to program?" is losing its weight as tools like Codex continue to evolve. What matters more now is whether you can describe clearly what you want — because that's exactly the thinking skill required to work effectively with Codex and similar AI agent tools. If you want to try it today, start with something small and specific: ask Codex to create an Excel file consolidating data you currently process manually every week. That's the fastest test to evaluate whether Codex genuinely saves you time or not.

Nam•

15 May, 2026

Quick Summary

GPT-OSS là gì? Hiểu rõ về "Open-Weight"

Hiệu suất vượt trội và khả năng nâng cao

So sánh hiệu suất GPT-OSS

GPQA diamond

MMLU

AIME 2025

Ý nghĩa và lợi ích đối với hệ sinh thái AI

Có hỗ trợ tinh chỉnh (Fine-Tune) và gọi hàm (Function Calling)

Cách tiếp cận và triển khai:

Cam kết mạnh mẽ và tranh cãi xoay quanh GPT-OSS

Discussion (0)

Related Articles

How to control Codex from your phone with ChatGPT app

Microsoft launches 7 new AI models to challenge OpenAI

What is an agent harness? The framework that helps AI work efficiently

What is Codex? OpenAI's rising star tool