The short version
OpenAI has opened applications for a new senior role—Head of Preparedness—tasked with building and running the company’s end‑to‑end program for evaluating frontier‑model risks and shipping real‑world safeguards. The posting arrives as Sam Altman talks up near‑term “agent” use—and simultaneously warns about the security trade‑offs that come with letting AI act for us online.

What’s new at OpenAI
OpenAI’s careers site lists a Head of Preparedness within its Safety Systems group. The leader will “own OpenAI’s preparedness strategy end‑to‑end,” develop frontier‑capability evaluations, build threat models, and coordinate mitigations across major risk areas, including cyber and bio, with compensation listed at $555K plus equity. The description ties the job directly to OpenAI’s public Preparedness Framework. Job post.
Where Preparedness meets deployment decisions
| Risk domain | What gets evaluated | Examples of mitigations |
|---|---|---|
| Cybersecurity | Model assistance for intrusion, exploitation, social engineering | Red‑team benchmarks, tool‑use restrictions, rate limits, supervised execution |
| Biological & chemical | Assistance with design, acquisition, or execution of harmful protocols | Capability gating, access controls, external expert review |
| AI self‑improvement | Undue autonomy, sandbagging, replication/adaptation behaviors | Autonomy caps, deception tests, alignment checks, rollback plans |
| Undermining safeguards | Bypassing filters, hidden‑instruction following | Adversarial training, defense‑in‑depth guardrails, human‑in‑the‑loop |
Sources: OpenAI job post; OpenAI Preparedness Framework.
Why now: agents are arriving—and so are their risks
Through 2025, Altman has argued that AI agents will “join the workforce” and materially change company output—framing agents as the next wave after chatbots. He wrote as much in his January 2025 blog post “Reflections”. And OpenAI has steadily shipped agentic features, from research‑style tools to a full browser agent in ChatGPT Atlas.
But those same features expand the attack surface. In December 2025, OpenAI published a detailed security note acknowledging that prompt‑injection attacks against browser agents are an “open challenge” and “unlikely to be fully solved,” even as defenses improve. The post outlines a rapid response loop, automated red teaming, and user‑side practices (least‑privilege access, review confirmations) to reduce risk. OpenAI on hardening Atlas.
Altman has also urged caution in plain language. During the Agent rollout this summer, he told users on X that the feature is “cutting edge and experimental … not something I’d yet use for high‑stakes uses or with a lot of personal information” until it’s been studied in the wild—echoed by multiple outlets that covered the launch and his warning. See coverage from NBC News via Yahoo and PC Gamer.
Why browser agents are uniquely tricky
- They read and act on untrusted content (emails, docs, web pages), where hidden instructions can hijack behavior.
- They often need elevated access (sessions, payments, email), raising the stakes of a mistake.
- Attackers iterate quickly; defenses must be continuously trained and patched.
OpenAI’s Atlas post shows how internal automated red teams now use reinforcement learning to discover multi‑step, real‑world exploits (e.g., a malicious email that causes an agent to draft and send an unintended message), with resulting adversarial training rolled out to production. OpenAI and coverage in TechCrunch.

How the Preparedness Framework changed in 2025
OpenAI’s April 15, 2025 update tightened how it categorizes and governs high‑risk capabilities:
- It introduced two clear thresholds—“High” and “Critical”—that map to deployment commitments. High means a model could amplify existing severe‑harm pathways; Critical means it could introduce unprecedented new ones. Safeguards must “sufficiently minimize” risks before deployment (and, for Critical, during development too). Framework update.
- It added dedicated Safeguards Reports (alongside Capabilities Reports) and formalized a cross‑functional Safety Advisory Group review before leadership decisions.
- It listed tracked domains (bio/chem, cyber, AI self‑improvement) and research categories (e.g., long‑range autonomy, sandbagging, autonomous replication) to stay ahead of emerging risks.
This reframing matters because it connects evaluation outputs to go/no‑go product choices. The new job will be responsible for making that connection robust and repeatable as models and agents iterate more frequently.
The governance backdrop: from committees to execution
OpenAI’s safety governance evolved in 2024–2025—from creating a Safety & Security Committee in May 2024 to converting it into an independent board oversight body that fall, chaired by CMU’s Zico Kolter. OpenAI update, CNBC.
Inside the safety org, leadership also shifted: Reuters reported in July 2024 that Aleksander Madry moved to a new research role as other leaders temporarily covered Preparedness work. The fresh “Head of Preparedness” opening suggests OpenAI now wants a single, accountable owner for that portfolio again. Reuters.
What this means if you’re adopting agents in 2026 planning
If your 2026 roadmap includes agentic automation, assume “trust by design” is now table‑stakes—users won’t accept agents that feel risky.
The bottom line
OpenAI’s new Head of Preparedness role is a concrete signal: as agentic AI moves from demos to daily workflows, the hard part isn’t just better models—it’s industrial‑grade evaluation, governance, and safeguards that keep pace with the speed of deployment. The company’s own messaging—Altman’s caution about “high‑stakes uses” and OpenAI’s admission that prompt injection won’t be fully “solved”—underscores the same point: trust will be earned by how well the industry operationalizes safety, not by what it promises.
Sources
- OpenAI careers: Head of Preparedness (compensation and responsibilities)
- OpenAI: Our updated Preparedness Framework (Apr 15, 2025) (High vs Critical, Safeguards Reports, SAG)
- OpenAI: Continuously hardening ChatGPT Atlas against prompt injection attacks (Dec 22, 2025) (agent security, adversarial training, user guidance)
- TechCrunch: AI browsers may always be vulnerable to prompt injection (context and expert perspective)
- Sam Altman: Reflections (agents “join the workforce” in 2025)
- The Guardian: ChatGPT Atlas browser launch (release context)
- CNBC/OpenAI: Independent safety oversight committee (Sept 16, 2024) and OpenAI blog
- Reuters: Preparedness leadership changes (July 23, 2024)
- NBC News via Yahoo / PC Gamer coverage of Altman’s caution on agent high‑stakes use: NBC/Yahoo, PC Gamer