The short version

OpenAI released GPT-5.2 on December 11, 2025—framing it as its most capable model for professional work and long‑running agents—just three weeks after Google unveiled Gemini 3, a sweeping upgrade that now powers Search’s AI Mode, the Gemini app, and Vertex AI. Reuters reported the launch followed a “code red” inside OpenAI, while the company says it had been preparing 5.2 for months. Either way, users now have two formidable options aimed squarely at real‑world productivity. OpenAI, Reuters, Google, Google Search.

What’s actually new in GPT‑5.2

OpenAI positions GPT‑5.2 as a step‑function upgrade for knowledge work, software, and agentic workflows. Highlights:

  • Three model tiers: Instant (fast, everyday tasks), Thinking (deeper reasoning), and Pro (highest quality, with a new xhigh reasoning effort). In the API, Thinking is gpt-5.2, Instant is gpt-5.2-chat-latest, and Pro is gpt-5.2-pro. OpenAI.
  • Productivity focus: New capabilities for building spreadsheets and presentations, stronger long‑context analysis (measured out to 256k tokens), improved tool use, and better vision. OpenAI.
  • Benchmarks (vendor‑reported): GPT‑5.2 Thinking achieves 52.9% on ARC‑AGI‑2 (Verified); GPT‑5.2 Pro reaches 54.2%. GPT‑5.2 Thinking also posts 80.0% on SWE‑bench Verified and wins or ties in 70.9% of expert comparisons on OpenAI’s GDPval evaluation, which spans 44 occupations. OpenAI.
  • Safety: OpenAI reports fewer undesirable responses on mental‑health‑related prompts versus 5.1 and continues to roll out its age‑prediction model for under‑18 protections. OpenAI.
  • Availability and price: Rolling out now to paid ChatGPT plans. API pricing for GPT‑5.2 is $1.75 per 1M input tokens and $14 per 1M output tokens, with a 90% discount on cached input tokens. GPT‑5.2 Pro is $21 input / $168 output per 1M tokens. OpenAI.
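
For teams wiring the tiers into code, the lookup can be sketched as follows. This is a minimal illustration, assuming Instant corresponds to gpt-5.2-chat-latest, Thinking to gpt-5.2, and Pro to gpt-5.2-pro. The helper name `build_request` and the payload layout (an `input` string plus a `reasoning` dictionary with an `effort` key) are assumptions made for this example, so check OpenAI's API reference before relying on them.

```python
from typing import Optional

# Assumed tier-to-model mapping; verify against OpenAI's documentation.
MODEL_IDS = {
    "instant": "gpt-5.2-chat-latest",
    "thinking": "gpt-5.2",
    "pro": "gpt-5.2-pro",
}


def build_request(tier: str, prompt: str, effort: Optional[str] = None) -> dict:
    """Assemble request parameters for a tier (hypothetical helper).

    The returned dictionary layout is an assumption for illustration,
    not a verified request schema.
    """
    key = tier.lower()
    if key not in MODEL_IDS:
        raise ValueError(f"unknown tier: {tier!r}")
    params = {"model": MODEL_IDS[key], "input": prompt}
    if effort is not None:
        # Passed through unchanged, e.g. "xhigh" for the Pro tier.
        params["reasoning"] = {"effort": effort}
    return params


if __name__ == "__main__":
    print(build_request("Pro", "Audit this quarterly forecast.", effort="xhigh"))
```

Keeping the mapping in one dictionary means a model rename only touches one place in a codebase.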

What Google shipped with Gemini 3

On November 18, 2025, Google introduced Gemini 3 and pushed it directly into core products:

  • Search: For the first time, a new Gemini model shipped in Search on day one. AI Mode now renders dynamic, generated interfaces—interactive calculators, visual layouts, even real‑time simulations—tailored to your query. Google Search.
  • Apps and cloud: Gemini 3 is available in the Gemini app, AI Studio, and Vertex AI for enterprises; Google also launched Antigravity, an “agent‑first” development environment for autonomous coding workflows. Google.
  • Reasoning mode: Gemini 3 Deep Think targets harder problems and, in Google’s testing, scores 45.1% on ARC‑AGI‑2 (with code execution, ARC Prize Verified). Initially “coming soon,” it has begun rolling out to Google AI Ultra subscribers in the Gemini app. Google, Business Standard, Android Central.
  • Multimodal and length: Google cites a 1‑million‑token context window for learning across long videos/papers and strong multimodal scores (e.g., 81% on MMMU‑Pro; 87.6% on Video‑MMMU). Google.

The scoreboard (so far)

Quick snapshot — selected vendor‑reported benchmarks (not apples‑to‑apples)

| Capability (benchmark) | GPT‑5.2 (Thinking/Pro) | Gemini 3 (Pro/Deep Think) |
|---|---|---|
| Abstract reasoning (ARC‑AGI‑2, Verified) | 52.9% / 54.2% | n/a / 45.1% (with code execution) |
| Coding agents (SWE‑bench Verified) | 80.0% | 76.2% |
| Science QA (GPQA Diamond, no tools) | 92.4% / 93.2% | 91.9% (Pro) |
| Vision reasoning (MMMU‑Pro) | 79.5–80.4% | 81% (Pro) |
| Video understanding (Video‑MMMU) | 85.9% | 87.6% (Pro) |

Sources: OpenAI, Google.

Cost and licensing at a glance

Price snapshot (as of Dec 14, 2025)

| Model | Input price (/1M tok) | Output price (/1M tok) | Notes |
|---|---|---|---|
| OpenAI GPT‑5.2 | $1.75 | $14 | 90% discount on cached input; rolling out across paid ChatGPT plans. OpenAI. |
| OpenAI GPT‑5.2 Pro | $21 | $168 | Highest accuracy; supports new xhigh reasoning effort. OpenAI. |
| Google Gemini 3 Pro (Vertex AI) | $2.00 (≤200K ctx) / $4.00 (>200K) | $12 / $18 | Batch discounts; explicit context cache fees; Search/Web grounding billed separately after included quota. Vertex AI pricing. |
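
To turn these list prices into a per-request estimate, a rough sketch follows. It uses only the prices quoted in this section. Two assumptions are baked in and may not match the vendors' billing rules: the Gemini 3 Pro tier is chosen by prompt length and applies to both input and output, and no cache discount is applied to GPT‑5.2 Pro because none is stated here. The names `Price`, `gemini_3_pro_price`, and `request_cost` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float                  # USD per 1M input tokens
    output_per_m: float                 # USD per 1M output tokens
    cached_input_discount: float = 0.0  # fraction off for cached input tokens


GPT_5_2 = Price(1.75, 14.0, cached_input_discount=0.90)
GPT_5_2_PRO = Price(21.0, 168.0)  # no cache discount stated in this article


def gemini_3_pro_price(prompt_tokens: int) -> Price:
    """Pick the Gemini 3 Pro tier (assumption: decided by prompt length)."""
    if prompt_tokens <= 200_000:
        return Price(2.0, 12.0)
    return Price(4.0, 18.0)


def request_cost(price: Price, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request from list prices."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cached_rate = price.input_per_m * (1 - price.cached_input_discount)
    total = (fresh_tokens * price.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * price.output_per_m)
    return total / 1_000_000


if __name__ == "__main__":
    print(request_cost(GPT_5_2, 100_000, 5_000))
    print(request_cost(gemini_3_pro_price(100_000), 100_000, 5_000))
```

For a 100,000-token prompt with a 5,000-token reply, this works out to about $0.245 on GPT‑5.2 and about $0.26 on Gemini 3 Pro. If 80,000 of those prompt tokens are cached, the GPT‑5.2 estimate drops to about $0.119.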

A wrinkle for cost planning: When you enable Search or Web Grounding on Gemini 3 Pro in Vertex AI, Google includes 5,000 queries/month at no charge; beyond that, it’s $14 per 1,000 search queries (billing starts January 5, 2026). Token charges still apply. Vertex AI pricing.
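
The grounding fee described above is simple arithmetic, sketched here for budgeting. The function name `grounding_fee` is made up for this example, and the sketch assumes overage is billed pro rata per query rather than in blocks of 1,000, which the pricing page may define differently.

```python
FREE_QUERIES_PER_MONTH = 5_000   # included at no charge
USD_PER_1000_QUERIES = 14.0      # rate beyond the included quota


def grounding_fee(queries_in_month: int) -> float:
    """Estimate the monthly grounding charge in USD (token charges excluded)."""
    if queries_in_month < 0:
        raise ValueError("queries_in_month must be non-negative")
    billable = max(0, queries_in_month - FREE_QUERIES_PER_MONTH)
    # Assumption: pro-rata billing per query above the free quota.
    return billable * USD_PER_1000_QUERIES / 1000


if __name__ == "__main__":
    for queries in (4_000, 20_000):
        print(queries, grounding_fee(queries))
```

At 20,000 grounded queries in a month, 15,000 are billable, which comes to $210 on top of token charges.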

Where automation gets better—today

The real story is less a leaderboard and more a shift in what you can reliably hand off to AI:

  • End‑to‑end artifacts: GPT‑5.2’s improved document synthesis means more “first‑drafts worth keeping” for models, decks, and code patches—especially when pair‑programming with tool use. OpenAI.
  • Live, generated interfaces: Gemini 3 can render interactive calculators and simulations directly in Search’s AI Mode, collapsing research + decision steps into one UI. Google Search.
  • Agentic development: OpenAI touts higher reliability in tool calling; Google’s Antigravity elevates agents to a first‑class surface for multi‑step coding tasks. OpenAI, Google.

Tip: A quick mental model for choosing
  • Live knowledge, embedded in Search and Workspace? Start with Gemini 3.
  • Deep document work, coding agents, and artifact generation? Start with GPT‑5.2 Thinking or Pro.
  • Mixed fleets? Use a router: send latency‑sensitive prompts to Gemini 3 Pro, long‑form or tool‑heavy prompts to GPT‑5.2.
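
The router idea in the last bullet can be sketched in a few lines. Everything here is illustrative: the `Prompt` fields, the 8,000-character threshold for "long‑form", and the rule that tool‑heavy prompts win when a prompt is both tool‑heavy and latency‑sensitive are all assumptions, and the returned strings are routing labels rather than verified API model IDs.

```python
from dataclasses import dataclass

LONG_FORM_CHARS = 8_000  # hypothetical cutoff for "long-form" prompts


@dataclass
class Prompt:
    text: str
    latency_sensitive: bool = False
    needs_tools: bool = False


def route(prompt: Prompt, default: str = "gemini-3-pro") -> str:
    """Return a routing label for the prompt.

    Long-form or tool-heavy prompts go to GPT-5.2; latency-sensitive
    prompts go to Gemini 3 Pro; anything else falls back to `default`.
    """
    if prompt.needs_tools or len(prompt.text) > LONG_FORM_CHARS:
        return "gpt-5.2"
    if prompt.latency_sensitive:
        return "gemini-3-pro"
    return default


if __name__ == "__main__":
    print(route(Prompt("What is our refund policy?", latency_sensitive=True)))
    print(route(Prompt("Refactor the billing module.", needs_tools=True)))
```

In practice the thresholds and the fallback should come from your own latency and quality measurements rather than from fixed constants.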

How to pilot (and prove ROI) in 30 days

Competitive context—and why the stakes just rose

  • The timing wasn’t accidental. Reuters reported GPT‑5.2 arrived after a “code red” to sharpen ChatGPT amid Gemini 3’s surge and broader competition. Reuters, WIRED.
  • Google’s distribution advantage is real: Gemini 3 shipped straight into Search and the Gemini app with upgraded, agentic UI patterns. Google Search, Google.
  • OpenAI is widening its content moat: on launch day, Disney announced a $1B investment and a three‑year Sora licensing deal to let fans generate short videos with 200+ Disney/Marvel/Pixar/Star Wars characters (excluding talent likeness/voices). OpenAI–Disney, AP, Reuters.

Bottom line for operators

  • Both GPT‑5.2 and Gemini 3 are meaningfully better for automation than their predecessors. Start where each is strongest, validate with your data, and design your stack to be multi‑model from the outset.
  • Treat vendor benchmarks as directional, and measure your own: time saved, error rates, and cost per unit of quality (for example, dollars spent per output accepted without rework).
  • Keep an eye on safety and policy: OpenAI’s age‑prediction and safety tuning, and Google’s staged release of Deep Think, signal tighter safeguards are part of the frontier race—not an afterthought. OpenAI, Google.
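
One way to track your own metrics during a pilot is sketched below. The record fields and the definition of "error" (an output a reviewer did not accept without rework) are assumptions chosen for illustration, not a standard methodology.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskResult:
    baseline_minutes: float  # time the task takes without AI
    assisted_minutes: float  # time with AI, including human review
    accepted: bool           # reviewer accepted the output without rework
    cost_usd: float          # model charges attributed to the task


def summarize(results: List[TaskResult]) -> Dict[str, float]:
    """Aggregate time saved, error rate, and cost per accepted output."""
    if not results:
        raise ValueError("results must not be empty")
    accepted = sum(1 for r in results if r.accepted)
    total_cost = sum(r.cost_usd for r in results)
    return {
        "minutes_saved": sum(r.baseline_minutes - r.assisted_minutes
                             for r in results),
        "error_rate": (len(results) - accepted) / len(results),
        # Infinite when nothing was accepted, so failed pilots stand out.
        "cost_per_accepted_output": (total_cost / accepted
                                     if accepted else float("inf")),
    }


if __name__ == "__main__":
    pilot = [
        TaskResult(30, 10, True, 0.20),
        TaskResult(20, 15, False, 0.10),
        TaskResult(10, 5, True, 0.30),
    ]
    print(summarize(pilot))
```

Running the same summary per model on the same task set gives a like-for-like comparison that vendor benchmarks cannot.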

Sources