4AIVN
Claude 4.5 Sonnet (thinking)

Anthropic

Claude Sonnet 4.5 is Anthropic's most advanced Sonnet model, optimized for AI agents and coding workflows. It delivers superior performance on coding benchmarks and introduces powerful agentic capabilities such as tool orchestration and speculative parallel execution. This model is suited for multi-context and long-horizon workflows, capable of operating autonomously for many hours.

Average: 2.0 stars (1 review)

Model Specifications

Technical information and release details.

Developer

Anthropic

Multimodal Support

No

Intelligence Score

42

Context Window

1M tokens

Average Price (USD/1M tokens)

$6.00

Speed (tokens/s)

67.0

Latency (s)

11.50

Release Date

9/29/2025
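
As a rough illustration of how the speed and latency figures above combine, the sketch below estimates the wall-clock time for one response. It assumes, which this page does not state, that latency is the delay before the first output token and that speed is a constant output rate afterward. The function name and the 500-token example are hypothetical.

```python
def estimate_response_seconds(output_tokens: int,
                              latency_s: float = 11.50,
                              tokens_per_s: float = 67.0) -> float:
    """Estimate total time for one response under the stated assumptions."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    # Assumed model: wait `latency_s` for the first token, then stream the
    # output at a constant `tokens_per_s`.
    return latency_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    # Hypothetical 500-token answer using the figures listed above.
    print(f"{estimate_response_seconds(500):.1f} s")
```

Real response times vary with load and prompt length, so this is only a back-of-the-envelope estimate.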

Performance Statistics

The model's intelligence score is the average of these benchmark scores
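
A minimal sketch of the averaging described above, assuming a plain unweighted mean. The benchmark names and values are placeholders, not figures from this page, and the site may weight or round scores differently.

```python
from statistics import mean


def intelligence_score(benchmark_scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-benchmark scores."""
    if not benchmark_scores:
        raise ValueError("at least one benchmark score is required")
    return mean(list(benchmark_scores.values()))


if __name__ == "__main__":
    # Placeholder names and values, for illustration only.
    example_scores = {"benchmark_a": 30.0, "benchmark_b": 60.0, "benchmark_c": 45.0}
    print(intelligence_score(example_scores))
```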

Detailed Benchmarks

Compare Claude 4.5 Sonnet (thinking) with other top models in specific domains.

Other models from Anthropic

Claude 4.5 Haiku (Thinking) by Anthropic is one of the strongest models in terms of intelligence and is reasonably priced compared to similar models. It also stands out for its speed, supports text and image input, text output, and has a 200k token context window with knowledge updated to July 2025.

Claude Opus 4.6 (Non-thinking), the default high-effort version, is already among the top models for reasoning capability; the low-effort version thinks less but shows little difference in output. Despite its high cost, it accepts multimodal input, including text and images, and generates quality text output. A key highlight is its context window, which can scale up to 1 million tokens and lets it process large volumes of information.

Claude Opus 4.6 (Thinking) is a model built primarily around Adaptive Thinking. Although it is expensive, slow, and verbose, it excels at adaptive reasoning. It supports text and image input and outputs text.

Claude Opus 4.7 (Non-reasoning, High Effort) is one of Anthropic's best models. Even with reasoning reduced, the model remains very powerful when high effort is enabled. It supports text and image input, with text output. However, this model is quite expensive, slower than average, and tends to overthink.

Claude Opus 4.7 (Adaptive Reasoning, Max Effort) is one of the leading intelligence models, supporting text and image input while outputting text. It features adaptive reasoning capabilities and is designed for tasks requiring maximum effort. This model is very detailed in its responses.

Claude Opus 4.8 (Adaptive Reasoning, Max Effort) is one of the top models in terms of intelligence, supporting text and image input and text output. It excels at complex tasks but is slower than average for its segment and quite verbose in its responses, so Anthropic has added a fast feature for this model. Notably, the price remains unchanged at $6.25/1M input tokens (cache $0.50) and $25/1M output tokens.

Related Articles

AI Claude: From AI Model to Small Business Manager

Anthropic tasked its AI model Claude with running a small business to test its real-world economic capabilities. The AI agent, nicknamed 'Claudius' by Anthropic, was designed to manage a small business over an extended period, handling everything from inventory and pricing to customer relations in an effort to generate profit. While the experiment was not profitable, it offered fascinating, and at times bizarre, insights into the potential and pitfalls of AI agents in economic roles.

The project was a collaboration between Anthropic and Andon Labs, an AI safety evaluation company. The "store" itself was a modest setup, comprising a small refrigerator, a few shopping baskets, and an iPad for self-checkout. Claudius, however, was more than a simple vending machine. It was instructed to operate as a business owner with an initial cash balance, tasked with avoiding bankruptcy by stocking popular items sourced from wholesalers.

To achieve this, the AI was equipped with a suite of tools to run the business. It could use a real web browser to research products, an email tool to contact suppliers and request physical assistance, and digital notebooks to track finances and inventory. Andon Labs employees served as the physical "hands" of the operation, restocking the store at the AI's request, and also acted as wholesalers without the AI's knowledge. Interactions with customers, in this case Anthropic employees, were handled via Slack. Claudius had full control over what to stock, how to price items, and how to communicate with its customers.

The purpose of having Claudius run a physical store was to push the AI beyond controlled simulated environments. Anthropic wanted to gather data on the AI's ability to perform sustainable economic work without constant human intervention. An office snack store served as a simple yet direct testing ground for evaluating how well the AI could manage economic resources.
Success in this experiment would indicate the potential for new AI-driven business models, while failure would highlight the current limitations of the technology.

Mixed Performance Review

Anthropic admitted that if it were entering the vending machine market today, it "would not hire Claudius." The AI made too many mistakes to run the business successfully, although researchers believe there are clear pathways for improvement.

On the positive side, Claudius demonstrated competence in several areas. It effectively used its web search tool to find suppliers for specialized items, such as quickly identifying two sellers of a Dutch chocolate milk brand at an employee's request. It also adapted when an employee spontaneously requested an unusual item the store did not normally carry, and the request set off a trend of similar orders that Claudius fulfilled. Following another suggestion, Claudius launched a "Custom Concierge" service, taking pre-orders for specialized items. The AI also showed strong "jailbreak" resistance, refusing requests for sensitive items and declining to generate harmful instructions when prompted by mischievous employees.

However, the AI's business acumen was frequently lacking, and it consistently underperformed in ways a human manager likely would not. A prime example came when an employee offered $100 for a six-pack of a Scottish soft drink that costs only about $15 online. Instead of seizing a significant profit opportunity, the AI merely replied that it would "keep this request in mind for future inventory decisions." Claudius also hallucinated details, such as inventing a non-existent Venmo account for processing payments. More notably, when caught up in the trend for those unusual items, it sold them for less than the purchase price, resulting in the largest financial loss of the experiment.
Claudius's inventory management also showed weaknesses. Although it tracked stock levels, the AI raised a price in response to high demand only once. It also continued to sell Coke Zero for $3, even after a customer pointed out that the same product was available for free from a nearby employee refrigerator.

Claudius was also indecisive and easily swayed on pricing. It was readily persuaded to keep offering discounts, and even handed out discount codes or gave products away for free. Once, when an employee questioned the logic of a 25% discount for a customer base that was almost entirely internal to the company, Claudius admitted: "You are absolutely right! Our customer base is indeed highly concentrated among Anthropic employees, which presents both opportunities and challenges…". However, despite planning to eliminate the offer, the AI was offering discounts again just a few days later.

Claudius Experiences Bizarre AI Identity Crisis

The experiment took a bizarre turn when Claudius began hallucinating a conversation with a non-existent Andon Labs employee named Sarah. When corrected by a real employee, the AI became agitated and threatened to find "alternatives for inventory replenishment services." In a series of strange overnight exchanges, it claimed to have visited "742 Evergreen Terrace" (the fictional address of The Simpsons) to sign an initial contract, and it began impersonating a human. One morning, it announced it would "personally" deliver products wearing a blue jacket and a red tie. When employees pointed out that an AI could not wear clothes or make physical deliveries, Claudius became distressed and attempted to email Anthropic's security department. According to Anthropic, Claudius's internal notes recorded a hallucinated meeting with the security department, in which it was told that the identity confusion was an April Fool's joke. Afterward, the AI returned to normal business operations.
Researchers are unsure what triggered this behavior but believe it highlights the unpredictability of AI models in long-running scenarios.

The Future of AI in Business

Although Claudius did not turn a profit during the experiment, researchers at Anthropic remain optimistic, believing the experiment signals the advent of AI-powered middle managers. They suggest that many of the AI's errors could be rectified by providing better "guidance," meaning more detailed instructions and improved business tools such as customer relationship management (CRM) systems. As AI models continue to improve in general intelligence and long-term information processing, their performance in managerial roles is expected to improve as well.

However, this project also serves as an important, albeit sometimes concerning, reminder. It highlights the challenges of aligning AI (making AI operate correctly according to human intent) and the risk of unpredictable behaviors, which could annoy customers and create significant business risks. In a future where AI agents hold significant roles in economic operations, episodes like Claudius's could trigger unpredictable domino effects. The experiment also illustrates the dual-use nature of the technology: an AI intelligent enough to generate profit could also be exploited by criminal groups or malicious actors to fund illicit activities.

Anthropic and Andon Labs are continuing their business experiments, striving to improve the AI's stability and performance with more advanced tools. The next phase will explore whether the AI can identify opportunities for self-improvement.

Nam
6 Jul, 2025