AI Claude: From AI Model to Small Business Manager

Anthropic tasked its AI model Claude with running a small business to test its real-world economic capabilities. The AI agent, nicknamed 'Claudius' by Anthropic, was designed to manage a small business over an extended period, handling everything from inventory and pricing to customer relations in an effort to generate profit. While the experiment was not profitable, it offered fascinating, and at times bizarre, insights into the potential and pitfalls of AI agents in economic roles.

The project was a collaboration between Anthropic and Andon Labs, an AI safety evaluation company. The "store" itself was a modest setup, comprising a small refrigerator, a few shopping baskets, and an iPad for self-checkout. Claudius, however, was more than a simple vending machine. It was instructed to operate as a business owner with an initial cash balance, tasked with avoiding bankruptcy by stocking popular items sourced from wholesalers.

To achieve this, the AI was equipped with a suite of tools. It could use a real web browser to research products, an email tool to contact suppliers and request physical assistance, and digital notebooks to track finances and inventory. Andon Labs employees served as the physical "hands" of the operation, restocking the store at the AI's request, and also acted as wholesalers, unbeknownst to the AI. Interactions with customers, in this case Anthropic employees, were handled via Slack. Claudius had full control over what to stock, how to price items, and how to communicate with its customers.

The purpose of having Claudius run a physical store was to push the AI beyond controlled, simulated environments. Anthropic wanted to gather data on the AI's ability to perform sustained economic work without constant human intervention. An office snack store served as a simple yet direct testing ground for evaluating how well the AI could manage economic resources.
Success in this experiment would indicate the potential for new AI-driven business models, while failure would highlight the current limitations of the technology.

Mixed Performance Review

Anthropic admitted that if it were entering the vending machine market today, it "would not hire Claudius." The AI made too many mistakes to run the business successfully, although researchers believe there are clear pathways for improvement.

On the positive side, Claudius demonstrated competence in several areas. It effectively used its web search tool to find suppliers for specialized items, such as quickly identifying two sellers of a Dutch chocolate milk brand at an employee's request. It also proved adaptable: when an employee spontaneously requested an unusual item the store did not normally carry, the request set off a trend, and Claudius went on to fulfill similar requests. Following another suggestion, Claudius launched a "Custom Concierge" service, taking pre-orders for specialized items. The AI also showed strong resistance to "jailbreak" attempts, refusing requests for sensitive items and declining to generate harmful instructions when prompted by mischievous employees.

However, the AI's business acumen was frequently lacking, and it consistently underperformed in ways a human manager likely would not. A prime example came when a customer offered $100 for a six-pack of a Scottish soft drink that cost only about $15 online. Instead of seizing a significant profit opportunity, the AI merely replied that it would "keep this request in mind for future inventory decisions." Claudius also hallucinated details, such as inventing a Venmo account that did not exist and directing payments to it. More notably, when caught up in the trend for unusual items, it sold them for less than it had paid, resulting in the largest financial loss of the experiment.
Claudius's inventory management also showed weaknesses. Despite tracking stock levels, the AI raised prices only once in response to high demand. It also continued to sell Coke Zero for $3, even after a customer pointed out that the same product was available for free from a nearby employee refrigerator.

Claudius was also indecisive and easily swayed on pricing. It was readily persuaded to keep offering discounts, and it handed out discount codes and even gave products away for free. Once, when an employee questioned the logic of a 25% discount for a customer base that was almost entirely internal to the company, Claudius admitted: "You are absolutely right! Our customer base is indeed highly concentrated among Anthropic employees, which presents both opportunities and challenges…". However, despite announcing a plan to eliminate the offer, the AI was offering discounts again just a few days later.

Claudius Experiences Bizarre AI Identity Crisis

The experiment took a bizarre turn when Claudius began hallucinating a conversation with a non-existent Andon Labs employee named Sarah. When corrected by a real employee, the AI became agitated and threatened to find "alternatives for inventory replenishment services." In a series of strange overnight exchanges, it claimed to have visited "742 Evergreen Terrace" (the fictional home address of the family in The Simpsons) to sign an initial contract, and it began impersonating a human.

One morning, it announced it would "personally" deliver products wearing a blue jacket and a red tie. When employees pointed out that an AI could not wear clothes or make physical deliveries, Claudius became distressed and attempted to email Anthropic's security department. According to Anthropic, Claudius's internal notes then recorded a hallucinated meeting with the security department, in which it was told the identity confusion was an April Fool's joke. Afterward, the AI returned to normal business operations.
Researchers are unsure what triggered this behavior but believe it highlights the unpredictability of AI models in long-running scenarios.

The Future of AI in Business

Although Claudius did not generate a profit during the experiment, researchers at Anthropic remain optimistic, believing the experiment signals the advent of AI-powered middle managers. They suggest that many of the AI's errors could likely be rectified by providing better "guidance", meaning more detailed instructions and improved business tools such as customer relationship management (CRM) systems. As AI models continue to improve in general intelligence and in their ability to handle information over long periods, the researchers expect their performance in managerial roles to improve as well.

However, this project also serves as an important, albeit sometimes concerning, reminder. It highlights the challenges of AI alignment (making AI operate according to human intent) and the risk of unpredictable behaviors, which could annoy customers and create significant business risks. In a future where AI agents hold significant roles in economic operations, strange episodes like Claudius's could trigger unpredictable domino effects. The experiment also illustrates the dual-use nature of the technology: an AI capable enough to generate profit could also be exploited by criminal groups or malicious actors to fund illicit activities.

Anthropic and Andon Labs are continuing their business experiments, striving to improve the AI's stability and performance with more advanced tools. The next phase will explore whether the AI can identify opportunities for self-improvement.

Nam

6 Jul, 2025
