
Claude Opus 4.7 launches stronger but burns more tokens
Anthropic has released Claude Opus 4.7 with a series of substantial improvements, but there is one warning written directly into the migration documentation: the new tokenizer can produce 1.0 to 1.35 times as many tokens from the same content as Claude Opus 4.6, and the model thinks more at higher effort levels. If you are using the API and haven't read this carefully before upgrading, next month's bill will be the most expensive lesson you receive from AI.

What does Opus 4.7 improve over 4.6? Real numbers from testers

Anthropic gave a number of companies early access and collected feedback before the public release. These aren't one-sided marketing claims; the companies recorded specific measured results.

Cursor: Opus 4.7 scored 70% on CursorBench, a significant jump from Opus 4.6's 58% and a rare leap between two consecutive versions.

Notion: A 14% improvement over Opus 4.6 in multi-step workflows, with fewer tokens consumed and only one third of the tool errors. This is a rare case where a new model improves simultaneously across all three dimensions: quality, cost, and stability.

XBOW: Visual acuity benchmark jumped from 54.5% to 98.5%, nearly doubling. This is the largest single improvement recorded and explains why XBOW can now extend Opus to entire categories of computer-use work that were previously out of reach.

Rakuten: Resolved three times as many production tasks as Opus 4.6 on their internal benchmark.

That said, these numbers come from companies selected for early access who have an incentive to publish strong results. Each company's internal benchmark cannot be directly compared to the others and may not reflect your specific workflow.

Three behavioral changes worth paying attention to

Literal instruction following, for better and worse.
Anthropic states clearly in the release documentation that Opus 4.7 executes instructions more precisely, to the point where "prompts written for older models may produce unexpected results because where the older model would skip over or interpret flexibly, Opus 4.7 follows literally." For developers, this means that if your system prompt has ambiguous or conflicting rules, Opus 4.7 will surface them immediately rather than silently resolving them as before. This is an improvement in reliability, but it requires a full review of your prompts before deploying. One example from Vercel: "Opus 4.7 even writes its own proofs for systems code before starting work, which is a new behavior not seen in previous Claude models." The model doesn't just do what is asked; it adds a self-verification step before reporting results.

Less flattery and hollow filler responses.

Hex confirmed: "It reports accurately when data is missing instead of producing answers that sound correct but are fabricated." In practice, you won't see sycophantic phrases like "you're amazing" or "you're better than 95% of people in the world," and when information is missing it will ask rather than guess. Opus 4.7 appears to have improved meaningfully here, whereas Opus 4.6 would occasionally produce flattering remarks or fabricate inaccurate details. As Replit put it: "It pushes back in technical discussions to help me make better decisions. It genuinely feels like a better colleague."

High-resolution image processing more than tripled.

Opus 4.7 accepts images up to 2,576 pixels on the long edge (approximately 3.75 megapixels), more than three times the limit of previous Claude models. This is a model-level change, not an API parameter, meaning images you send will automatically be processed at higher resolution than before. In practice, Opus 4.7 can analyze documents with small charts, read code from screenshots, and handle computer-use tasks on higher-resolution displays.
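Because the higher resolution is applied automatically, the only lever on the sending side is the size of the image itself. The following is a minimal sketch of that arithmetic, not anything taken from Anthropic's documentation: it computes the dimensions an image would need in order to fit under a chosen long-edge cap while preserving aspect ratio. The function name `fit_long_edge` and the 1,280-pixel cap in the usage lines are hypothetical choices for illustration; only the 2,576-pixel figure comes from the text above. The actual resampling would be done with whatever imaging library you already use.

```python
# Long-edge limit quoted in the article for Opus 4.7 (not read from any API).
OPUS_47_LONG_EDGE_PX = 2576


def fit_long_edge(width: int, height: int, max_long_edge: int) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_long_edge.

    Aspect ratio is preserved and images are never upscaled.
    The name and signature are hypothetical, chosen for this sketch.
    """
    if width <= 0 or height <= 0 or max_long_edge <= 0:
        raise ValueError("width, height and max_long_edge must be positive")

    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        # Already within the cap: leave the image untouched.
        return width, height

    scale = max_long_edge / long_edge
    # Round to whole pixels and keep each side at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 5152x2898 screenshot scaled to the article's quoted limit.
print(fit_long_edge(5152, 2898, OPUS_47_LONG_EDGE_PX))  # (2576, 1449)

# The same screenshot under a smaller, arbitrary cap when fine detail is not needed.
print(fit_long_edge(5152, 2898, 1280))  # (1280, 720)
```

Picking the cap is a judgment call: small text and signatures benefit from the full resolution, while simple diagrams or photos may not.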
In testing with multi-page PDFs containing small signatures, Opus 4.7 identified them accurately, and when using Chrome to recognize small characters on a webpage it performed with noticeably higher precision. However, this consumes a very large number of tokens, and as few as 3 or 4 messages can exhaust a quota, so consider resizing images before sending if you don't need that level of detail.

Token consumption remains the biggest concern for most users

The new tokenizer produces more tokens from the same content

Anthropic acknowledges this directly in the migration guide: Opus 4.7 uses an improved new tokenizer, but the trade-off is that the same text can produce 1.0 to 1.35 times as many tokens as with Opus 4.6. A factor of 1.35 sounds small, but at production scale it is not. If your system currently consumes 10 million tokens per day with Opus 4.6, after upgrading you may consume up to 13.5 million tokens per day without changing anything about your content or workflow. For users on the Pro plan, quota will likely run out far sooner than expected, and it appears Anthropic may be nudging users toward upgrading to Max in order to function normally. Combined with the model thinking more at higher effort levels, particularly at xhigh, a new effort level added between high and max, and the fact that <a href="/en/tools/claude-code" target="_blank" rel="n
