TL;DR

  • Anthropic launched Claude Opus 4.5 on November 24, 2025, positioning it as its most capable model for coding, agents, and computer use, with a 200K-token context window and “hybrid reasoning” that can switch between instant replies and extended thinking. Introducing Claude Opus 4.5Claude Opus 4.5 product page
  • It’s also much cheaper than prior Opus releases: $5 per million input tokens and $25 per million output tokens, plus big savings via prompt caching and batch. Anthropic pricingOpus 4.5 product page
  • Benchmarks emphasize real-world productivity: 80.9% on SWE‑bench Verified (coding) and 66.3% on OSWorld (computer use). Opus 4.5 product pageTechCrunch coverage
  • Ecosystem updates: “endless chat,” a wider rollout of Claude for Chrome and Claude for Excel, and upgrades to Claude Code—including a desktop app with parallel sessions and a more rigorous, user‑editable Plan Mode. Launch blogMacRumors
  • Available today via Anthropic’s apps and API, plus on AWS Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. AWSGoogle CloudMicrosoft Foundry catalog

<<stat label="Input/output price cut vs Opus 4.1" value="≈67%" source="anthropic-pricing-2025-11-24"> <<endstat>>

<<stat label="SWE‑bench Verified" value="80.9%" source="anthropic-opus-4-5-system-card"> <<endstat>>

Conceptual hero image of a modern workspace where an AI orchestrates code generation and spreadsheet modeling across a browser and Excel window, with subtle visual cues for agents collaborating on tasks.

What Anthropic launched—and why it matters

Anthropic’s new flagship, Claude Opus 4.5, lands with two big promises: better autonomous work (coding, multi‑step agents, and “computer use”) and meaningfully lower cost. On headline tasks, Opus 4.5 posts 80.9% on SWE‑bench Verified and 66.3% on OSWorld, both focused on practical execution rather than toy puzzles. That mix—strong agentic performance plus spreadsheet/slide work—speaks directly to real team workflows. Opus 4.5 product pageTechCrunch

The launch also folds in a slate of product updates that change day‑to‑day usability: “endless chat” that auto‑summarizes earlier context so long threads don’t hit a wall, broader availability for Claude for Chrome and Claude for Excel, and notable upgrades to Claude Code’s planning and execution. Launch blogMacRumors

The big upgrades for builders

  • Hybrid reasoning with effort control: You can dial Opus 4.5’s “effort” up or down to trade latency/cost for deeper analysis—useful for agents that sometimes need to think hard and other times just move fast. Launch blog
  • Memory, context, and “endless chat”: Anthropic’s context‑management and memory capabilities underpin longer‑running agents and sustained conversations without manual trimming. Launch blogDocs: context editing
  • Claude Code, upgraded: Plan Mode now asks clarifying questions, drafts a user‑editable plan.md, then executes; the desktop app supports parallel sessions (e.g., one agent fixes bugs while another researches). Launch blogClaude Code
  • Chrome and Excel integrations: Claude for Chrome expands to Max users; Claude for Excel enters broader beta for Max, Team, and Enterprise. TechCrunchClaude for ExcelSupport note
  • Agent SDK and platform features: Anthropic’s Agent SDK (renamed from the Code SDK) builds on the Claude Code harness, with tooling for context management and tool use. Agent SDK docs

How much cheaper is “cheaper”?

Opus 4.5’s API pricing is $5/M input tokens and $25/M output tokens—down from Opus 4.1’s $15/$75. Prompt caching can lop off up to ~90% of repeat prompt cost, and Batch mode halves both input and output cost for async jobs. In practice, the new “effort” control and platform‑level context tooling also reduce token spend by keeping plans tighter and tool calls slimmer. Opus 4.5 product pagePricing

Opus pricing at a glance

ModelInput (per 1M)Output (per 1M)
Claude Opus 4.1$15$75
Claude Opus 4.5$5$25

Benchmarks that track with real work

  • Coding: 80.9% on SWE‑bench Verified; TechCrunch notes Opus 4.5 is the first model to cross 80% on that benchmark. Opus 4.5 product pageTechCrunch
  • Computer use: 66.3% on OSWorld, Anthropic’s highest to date. Opus 4.5 product page
  • Agent/tool use and reasoning: Public scorecards highlight strong results on Terminal‑bench, τ2‑bench, MCP Atlas, GPQA Diamond, ARC‑AGI‑2, and MMMU. Microsoft Foundry model catalog

<<callout type="note" title="A quick caveat on benchmarks"> Benchmarks are useful waypoints, not guarantees. Anthropic ran many of these evaluations internally and documents methodology in its system card; treat them as directional and validate on your workloads before committing. Launch blog

Productivity features you can feel

  • Endless chat: Long conversations auto‑summarize older turns so you can keep going without hitting a hard limit—ideal for research threads, support playbooks, or multi‑week builds. Launch blogMacRumors
  • Excel, for real work: Claude understands workbook structure, traces errors, preserves formulas, and supports scenario edits with explanations and cell‑level citations. Claude for Excel
  • Chrome assistance: Managing tabs, forms, and tasks moves from “demo” to an expanded research preview for Max users, alongside added safety checks for risky actions. TechCrunchSupport

Where to get it

  • Anthropic apps and API: Use claude-opus-4-5 (latest release ID noted in Anthropic’s launch post). Launch blog
  • Cloud platforms: General availability on AWS Bedrock and Google Cloud Vertex AI; listed in Microsoft Foundry’s catalog with detailed benchmark cards. AWS BedrockGoogle CloudMicrosoft Foundry model catalog

Which Claude 4.5 should you choose?

  • Opus 4.5: Frontier performance for production‑grade code, lead agents, complex spreadsheets/slides, long‑horizon tasks.
  • Sonnet 4.5: Great balance of speed, cost, and capability—excellent for scaled agents and rapid iteration. Sonnet 4.5 launch
  • Haiku 4.5: Fastest, most cost‑efficient—perfect for high‑volume sub‑agents and latency‑sensitive UX. Claude docs

Safety and reliability

Anthropic says Opus 4.5 advances robustness against prompt‑injection and risky tool actions, with details in the system card and evaluations run with external partners. For teams piloting Chrome or desktop automation, keep user‑confirmation gates on destructive actions and review audit logs for high‑risk flows. Launch blogSupport docs

<<callout type="tip" title="Ship a low-risk pilot in one week">

  • Start with Sonnet 4.5 for discovery; switch to Opus 4.5 only where quality deltas justify cost.
  • Turn on prompt caching and batch for repetitive runs.
  • Use the Agent SDK’s context management and tool routing; define confirm‑before‑commit hooks for file writes, purchases, or mass edits.
  • In Excel pilots, require formula‑preserving edits and cell‑level citations; in Chrome pilots, enforce site‑level permissions.
  • Track success on real tasks (PRs merged, spreadsheet QA passes, support handle time) rather than just benchmarks.

Sources