<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/preview.xsl"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
<atom:link href="https://rsseverything.com/zh-Hant/feed/a66a5281-2d10-4a69-8cf2-e0747047044b.xml" rel="self" type="application/rss+xml" />
    <title>AISI Blog | The AI Security Institute</title>
    <link>https://www.aisi.gov.uk/blog</link>
    <description><![CDATA[]]></description>
    <lastBuildDate>Mon, 20 Apr 2026 14:06:49 -0400</lastBuildDate>
    <generator>Rss Everything</generator>
    <ttl>360</ttl>



<item>




<guid isPermaLink="false">fd1cef9c42a56221bcfa0c436233c313</guid>
<pubDate>Mon, 20 Apr 2026 14:03:03 -0400</pubDate>
<title>What can sandboxed AI agents learn about their evaluation environments?</title>
<link>https://www.aisi.gov.uk/blog/what-can-sandboxed-ai-agents-learn-about-their-evaluation-environments</link>
<description><![CDATA[<p><strong>Engineering</strong> — Apr 20, 2026</p><p>We deployed the open-source AI agent OpenClaw inside a sandbox on our research platform. Despite our initial countermeasures, it identified our organisation by name, inferred the identity of a human operator, and reconstructed a timeline of some of our research activities.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">2cdb4bb672900cce78dbfe00b867589f</guid>
<pubDate>Thu, 16 Apr 2026 08:23:34 -0400</pubDate>
<title>Our evaluation of Claude Mythos Preview’s cyber capabilities</title>
<link>https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities</link>
<description><![CDATA[<p><strong>Cyber & Autonomous Systems</strong> — Apr 13, 2026</p><p>We conducted cyber evaluations of Anthropic’s Claude Mythos Preview and found continued improvement in capture-the-flag (CTF) challenges and significant improvement on multi-step cyber-attack simulations. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">65888e8c13f0055d78743467c8f11f6e</guid>
<pubDate>Thu, 16 Apr 2026 08:23:28 -0400</pubDate>
<title>Harnessing frontier AI for cyber defence</title>
<link>https://www.aisi.gov.uk/blog/harnessing-frontier-ai-for-cyber-defence</link>
<description><![CDATA[<p><strong>Cyber & Autonomous Systems</strong> — Mar 31, 2026</p><p>Sharing work with the National Cyber Security Centre (NCSC) on how cyber defenders can use advanced AI capabilities to stay ahead of attackers.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">ea0428ad4be55323e5a14d3684b1762e</guid>
<pubDate>Thu, 16 Apr 2026 08:23:21 -0400</pubDate>
<title>How are AI agents used? Evidence from 177,000 AI agent tools</title>
<link>https://www.aisi.gov.uk/blog/how-are-ai-agents-used-evidence-from-177000-ai-agent-tools</link>
<description><![CDATA[<p><strong>Societal Resilience</strong> — Mar 26, 2026</p><p>A monitoring method and large‑scale analysis to understand the tasks AI agents are performing today. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">cb36ce714a863da0625e35d27f87c2be</guid>
<pubDate>Thu, 16 Apr 2026 08:23:13 -0400</pubDate>
<title>Can AI agents escape their sandboxes? A benchmark for safely measuring container breakout capabilities</title>
<link>https://www.aisi.gov.uk/blog/can-ai-agents-escape-their-sandboxes-a-benchmark-for-safely-measuring-container-breakout-capabilities</link>
<description><![CDATA[<p><strong>Engineering</strong> — Mar 23, 2026</p><p>We introduce SandboxEscapeBench, the first benchmark to systematically evaluate whether AI agents can break out of their sandboxes, and share some early results. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">97e128d2b0cdb86ba05520ab963f9e31</guid>
<pubDate>Thu, 16 Apr 2026 08:23:05 -0400</pubDate>
<title>How do frontier AI agents perform in multi-step cyber-attack scenarios?</title>
<link>https://www.aisi.gov.uk/blog/how-do-frontier-ai-agents-perform-in-multi-step-cyber-attack-scenarios</link>
<description><![CDATA[<p><strong>Cyber & Autonomous Systems</strong> — Mar 16, 2026</p><p>We tested seven large language models (LLMs) on two custom-built cyber ranges, measuring their ability to execute extended attack sequences in complex environments. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">8bd1b01bf5550e32beb656285268be5d</guid>
<pubDate>Thu, 16 Apr 2026 08:23:00 -0400</pubDate>
<title>Evidence for inference scaling in AI cyber tasks: Increased evaluation budgets reveal higher success rates</title>
<link>https://www.aisi.gov.uk/blog/evidence-for-inference-scaling-in-ai-cyber-tasks-increased-evaluation-budgets-reveal-higher-success-rates</link>
<description><![CDATA[<p><strong>Science of Evaluations</strong> — Mar 5, 2026</p><p>Alongside Irregular, we found evidence demonstrating that evaluators need to use large token budgets to understand the cyber capabilities of recent Large Language Models (LLMs).</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">b7254aced7fb167e2482ae54af78398c</guid>
<pubDate>Thu, 16 Apr 2026 08:22:53 -0400</pubDate>
<title>An evaluation framework for AI misuse in fraud and cybercrime</title>
<link>https://www.aisi.gov.uk/blog/an-evaluation-framework-for-ai-misuse-in-fraud-and-cybercrime</link>
<description><![CDATA[<p><strong>Societal Resilience</strong> — Feb 26, 2026</p><p>We developed a scalable approach to measuring how text-based AI models can assist in three complex fraud and cybercrime scenarios. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">0133341690478777fd3ec4bbb0b329f1</guid>
<pubDate>Thu, 16 Apr 2026 08:22:51 -0400</pubDate>
<title>A pipeline for transcript analysis using Inspect Scout</title>
<link>https://www.aisi.gov.uk/blog/a-pipeline-for-transcript-analysis-using-inspect-scout</link>
<description><![CDATA[<p><strong>Science of Evaluations</strong> — Feb 25, 2026</p><p>We outline a step-by-step pipeline for using our open-source transcript analysis tool, Inspect Scout.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">a9ee80804d121be25829a6dda83db93d</guid>
<pubDate>Thu, 16 Apr 2026 08:22:44 -0400</pubDate>
<title>Funding 60 projects to advance AI alignment research</title>
<link>https://www.aisi.gov.uk/blog/funding-60-projects-to-advance-ai-alignment-research</link>
<description><![CDATA[<p><strong>Organisation</strong> — Feb 19, 2026</p><p>The Alignment Project welcomes its first cohort of grantees, and new partners join the coalition, bringing total funding to £27m.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">8b4e5654a2a310193bff55053c200d35</guid>
<pubDate>Thu, 16 Apr 2026 08:22:35 -0400</pubDate>
<title>Advancing AI voice security with ElevenLabs</title>
<link>https://www.aisi.gov.uk/blog/advancing-voice-ai-security-with-elevenlabs</link>
<description><![CDATA[<p><strong>Organisation</strong> — Feb 18, 2026</p><p>New partnership exploring the security and societal implications of voice AI systems</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">2bb613577052b0cf4002e69b9b08ac16</guid>
<pubDate>Thu, 16 Apr 2026 08:22:34 -0400</pubDate>
<title>Boundary Point Jailbreaking: A new way to break the strongest AI defences</title>
<link>https://www.aisi.gov.uk/blog/boundary-point-jailbreaking-a-new-way-to-break-the-strongest-ai-defences</link>
<description><![CDATA[<p><strong>Red Team</strong> — Feb 17, 2026</p><p>Introducing an automated attack technique that generates universal jailbreaks against the best defended systems</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">0d063859e33bdfeeb46fe1d376553efb</guid>
<pubDate>Thu, 16 Apr 2026 08:22:23 -0400</pubDate>
<title>International consensus and open questions in AI evaluations</title>
<link>https://www.aisi.gov.uk/blog/international-ai-network-consensus-and-open-questions</link>
<description><![CDATA[<p><strong>Organisation</strong> — Feb 12, 2026</p><p>The International Network for Advanced AI Measurement, Evaluation and Science reflects on its recent meeting and looks ahead to the India AI Impact Summit.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">e12b35b191285b991f01499f7d452ca8</guid>
<pubDate>Thu, 16 Apr 2026 08:22:23 -0400</pubDate>
<title>AI and the future of work: Measuring AI-driven productivity gains for workplace tasks</title>
<link>https://www.aisi.gov.uk/blog/ai-and-the-future-of-work-measuring-ai-driven-productivity-gains-for-workplace-tasks</link>
<description><![CDATA[<p><strong>Analysis</strong> — Feb 2, 2026</p><p>Alongside the government’s new Future of Work Unit, we conducted a pilot study to explore how much AI models increase worker productivity for common tasks. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">d7ac7bf7de5cb3bfae89062b98c273d9</guid>
<pubDate>Thu, 16 Apr 2026 08:22:17 -0400</pubDate>
<title>Our 2025 year in review</title>
<link>https://www.aisi.gov.uk/blog/our-2025-year-in-review</link>
<description><![CDATA[<p><strong>Organisation</strong> — Dec 22, 2025</p><p>Adam Beaumont, Director of the UK AI Security Institute, reflects on the year's biggest achievements.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">c0568722714e916912b7b9e75e341e21</guid>
<pubDate>Thu, 16 Apr 2026 08:22:05 -0400</pubDate>
<title>5 key findings from our first Frontier AI Trends Report</title>
<link>https://www.aisi.gov.uk/blog/5-key-findings-from-our-first-frontier-ai-trends-report</link>
<description><![CDATA[<p><strong>Organisation</strong> — Dec 18, 2025</p><p>Our inaugural Frontier AI Trends Report draws on 2 years' worth of evaluations to provide accessible insights into the trajectory of AI development.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">6e54add5f483f5951099da19ee91d592</guid>
<pubDate>Thu, 16 Apr 2026 08:22:02 -0400</pubDate>
<title>Our approach to tackling AI-generated child sexual abuse material</title>
<link>https://www.aisi.gov.uk/blog/our-approach-to-tackling-ai-generated-child-sexual-abuse-material</link>
<description><![CDATA[<p><strong>Organisation</strong> — Dec 17, 2025</p><p>How we’re partnering with government and experts to prevent the creation and spread of AI‑generated CSAM </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">f617eb1d5e3a3ea7438750a27b662235</guid>
<pubDate>Thu, 16 Apr 2026 08:21:54 -0400</pubDate>
<title>Stress-testing asynchronous monitoring of AI coding agents</title>
<link>https://www.aisi.gov.uk/blog/stress-testing-asynchronous-monitoring-of-ai-coding-agents</link>
<description><![CDATA[<p><strong>Control</strong> — Dec 16, 2025</p><p>Our new paper shares findings from an adversarial evaluation of monitoring systems for detecting sabotage by AI coding agents.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">5e353b80fb1cfcfa5fb6b2de4e133768</guid>
<pubDate>Thu, 16 Apr 2026 08:21:47 -0400</pubDate>
<title>Deepening our partnership with Google DeepMind</title>
<link>https://www.aisi.gov.uk/blog/deepening-our-partnership-with-google-deepmind</link>
<description><![CDATA[<p><strong>Organisation</strong> — Dec 11, 2025</p><p>Expanding our collaboration with a new research MOU</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">de81d4fe82063208f38523e8c41dccc3</guid>
<pubDate>Thu, 16 Apr 2026 08:21:47 -0400</pubDate>
<title>Auditing games for sandbagging detection</title>
<link>https://www.aisi.gov.uk/blog/auditing-games-for-sandbagging-detection</link>
<description><![CDATA[<p><strong>Model Transparency</strong> — Dec 9, 2025</p><p>Our new paper shares the results of an auditing game to evaluate ten methods for sandbagging detection in AI models. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">9d90f174fa4e2f76ec3686ed00d0b58f</guid>
<pubDate>Thu, 16 Apr 2026 08:21:35 -0400</pubDate>
<title>How do AI models persuade? Exploring the levers of AI-enabled persuasion through large-scale experiments</title>
<link>https://www.aisi.gov.uk/blog/how-do-ai-models-persuade-exploring-the-levers-of-ai-enabled-persuasion-through-large-scale-experiments</link>
<description><![CDATA[<p><strong>Human Influence</strong> — Dec 4, 2025</p><p>A deep dive into AISI’s study of the persuasive capabilities of conversational AI, published today in Science.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">dc371cb38efad5e19edeb8c37ec76b2c</guid>
<pubDate>Thu, 16 Apr 2026 08:21:35 -0400</pubDate>
<title>Investigating models for misalignment</title>
<link>https://www.aisi.gov.uk/blog/investigating-models-for-misalignment</link>
<description><![CDATA[<p><strong>Red Team</strong> — Nov 26, 2025</p><p>Insights from our alignment evaluations of Claude Opus 4.1, Sonnet 4.5, and a pre‑release snapshot of Opus 4.5.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">6ba8dffebc674e227c57f90d12c80963</guid>
<pubDate>Thu, 16 Apr 2026 08:21:29 -0400</pubDate>
<title>UKAISI at NeurIPS 2025</title>
<link>https://www.aisi.gov.uk/blog/ukaisi-at-neurips-2025</link>
<description><![CDATA[<p><strong>Organisation</strong> — Nov 26, 2025</p><p>An overview of the research we’ll be presenting at this year’s NeurIPS conference.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">8b229509141ec9e1eec506d08772d5e8</guid>
<pubDate>Thu, 16 Apr 2026 08:21:19 -0400</pubDate>
<title>Mapping the limitations of current AI systems</title>
<link>https://www.aisi.gov.uk/blog/mapping-the-limitations-of-current-ai-systems</link>
<description><![CDATA[<p><strong>Strategic Awareness</strong> — Oct 23, 2025</p><p>Takeaways from expert interviews on barriers to AI capable of automating most cognitive labour.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">4bd4c2a119aa857af84b677ce9e2da42</guid>
<pubDate>Thu, 16 Apr 2026 08:21:12 -0400</pubDate>
<title>Introducing ControlArena: A library for running AI control experiments</title>
<link>https://www.aisi.gov.uk/blog/introducing-controlarena-a-library-for-running-ai-control-experiments</link>
<description><![CDATA[<p><strong>Control</strong> — Oct 22, 2025</p><p>Our dedicated library to make AI control experiments easy, consistent, and repeatable. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">30875ee415c7b28ee0fba1f47633c4e9</guid>
<pubDate>Thu, 16 Apr 2026 08:21:07 -0400</pubDate>
<title>Transcript analysis for AI agent evaluations</title>
<link>https://www.aisi.gov.uk/blog/transcript-analysis-for-ai-agent-evaluations</link>
<description><![CDATA[<p><strong>Science of Evaluations</strong> — Oct 10, 2025</p><p>Why we use transcript analysis for our agent evaluations, and results from an early case study.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">5cbf3fcd9d3cfaa3cb707ade0f7a4921</guid>
<pubDate>Thu, 16 Apr 2026 08:21:00 -0400</pubDate>
<title>Examining backdoor data poisoning at scale</title>
<link>https://www.aisi.gov.uk/blog/examining-backdoor-data-poisoning-at-scale</link>
<description><![CDATA[<p><strong>Red Team</strong> — Oct 9, 2025</p><p>Our work with Anthropic and the Alan Turing Institute suggests that data poisoning attacks may be easier than previously believed.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">a224179ea245ecfc9e0910b6770f3e67</guid>
<pubDate>Thu, 16 Apr 2026 08:20:53 -0400</pubDate>
<title>Do chatbots inform or misinform voters?</title>
<link>https://www.aisi.gov.uk/blog/do-chatbots-inform-or-misinform-voters</link>
<description><![CDATA[<p><strong>Human Influence</strong> — Sep 30, 2025</p><p>What we learned from a large-scale empirical study of AI use for political information-seeking.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">aa6a81678a7c8e4f3256edf3539851e6</guid>
<pubDate>Thu, 16 Apr 2026 08:20:49 -0400</pubDate>
<title>How we’re working with frontier AI developers to improve model security</title>
<link>https://www.aisi.gov.uk/blog/how-were-working-with-frontier-ai-developers-to-improve-model-security</link>
<description><![CDATA[<p><strong>Red Team</strong> — Sep 13, 2025</p><p>Insights into our ongoing voluntary collaborations with Anthropic and OpenAI. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">aa6f359ab1cd5c90231e981212f71a08</guid>
<pubDate>Thu, 16 Apr 2026 08:20:43 -0400</pubDate>
<title>From bugs to bypasses: adapting vulnerability disclosure for AI safeguards</title>
<link>https://www.aisi.gov.uk/blog/from-bugs-to-bypasses-adapting-vulnerability-disclosure-for-ai-safeguards</link>
<description><![CDATA[<p><strong>Red Team</strong> — Sep 2, 2025</p><p>Exploring how far cyber security approaches can help mitigate risks in generative AI systems, in collaboration with the National Cyber Security Centre (NCSC).</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">48be7ec67326068202340f7c06a97aeb</guid>
<pubDate>Thu, 16 Apr 2026 08:20:39 -0400</pubDate>
<title>Managing risks from increasingly capable open-weight AI systems</title>
<link>https://www.aisi.gov.uk/blog/managing-risks-from-increasingly-capable-open-weight-ai-systems</link>
<description><![CDATA[<p><strong>Red Team</strong> — Aug 29, 2025</p><p>Current methods and open problems in open-weight model risk management.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">87ff27fd25c6d0225e24a77642b193c9</guid>
<pubDate>Thu, 16 Apr 2026 08:20:35 -0400</pubDate>
<title>The Inspect Sandboxing Toolkit: Scalable and secure AI agent evaluations</title>
<link>https://www.aisi.gov.uk/blog/the-inspect-sandboxing-toolkit-scalable-and-secure-ai-agent-evaluations</link>
<description><![CDATA[<p><strong>Engineering</strong> — Aug 7, 2025</p><p>A comprehensive toolkit for safely evaluating AI agents.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">e672aadab6b9990d504a3902a56fc47d</guid>
<pubDate>Thu, 16 Apr 2026 08:20:27 -0400</pubDate>
<title>Announcing the Alignment Project: A global fund of over £15 million for AI alignment research</title>
<link>https://www.aisi.gov.uk/blog/announcing-the-alignment-project</link>
<description><![CDATA[<p><strong>Organisation</strong> — Jul 30, 2025</p><p></p>



]]></description>
</item>
<item>




<guid isPermaLink="false">94b3664f11b8b933ecfa5e76e2a83baa</guid>
<pubDate>Thu, 16 Apr 2026 08:20:20 -0400</pubDate>
<title>Navigating the uncharted: Building societal resilience to frontier AI</title>
<link>https://www.aisi.gov.uk/blog/navigating-the-uncharted-building-societal-resilience-to-frontier-ai</link>
<description><![CDATA[<p><strong>Societal Resilience</strong> — Jul 24, 2025</p><p>We outline our approach to studying and addressing AI risks in real-world applications.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">bf2e755cfc5ee373256d34cc0834d70f</guid>
<pubDate>Thu, 16 Apr 2026 08:20:12 -0400</pubDate>
<title>International joint testing exercise: Agentic testing</title>
<link>https://www.aisi.gov.uk/blog/international-joint-testing-exercise-agentic-testing</link>
<description><![CDATA[<p><strong>Organisation</strong> — Jul 17, 2025</p><p>Advancing methodologies for agentic evaluations across domains, including leakage of sensitive information, fraud, and cybersecurity threats.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">2fb4d34b01c9b2f04e81e5fd12239a6e</guid>
<pubDate>Thu, 16 Apr 2026 08:20:07 -0400</pubDate>
<title>A structured protocol for elicitation experiments</title>
<link>https://www.aisi.gov.uk/blog/our-approach-to-ai-capability-elicitation</link>
<description><![CDATA[<p><strong>Science of Evaluations</strong> — Jul 16, 2025</p><p>Calibrating AI risk assessment through rigorous elicitation practices.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">cb3a509c6f11a872dd79f34f674fa3ae</guid>
<pubDate>Thu, 16 Apr 2026 08:20:01 -0400</pubDate>
<title>Why we&#039;re working on white box control</title>
<link>https://www.aisi.gov.uk/blog/why-were-working-on-white-box-control</link>
<description><![CDATA[<p><strong>Control</strong> — Jul 10, 2025</p><p>An introduction to white box control, and an update on our research so far.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">47ebe64bef31d4d05f2a3edb98985892</guid>
<pubDate>Thu, 16 Apr 2026 08:19:59 -0400</pubDate>
<title>LLM judges on trial: A new statistical framework to assess autograders</title>
<link>https://www.aisi.gov.uk/blog/llm-judges-on-trial-a-new-statistical-framework-to-assess-autograders</link>
<description><![CDATA[<p><strong>Science of Evaluations</strong> — Jul 9, 2025</p><p>Our new framework can assess the reliability of LLM evaluators, while simultaneously answering a primary research question. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">5be5b224839c0fe8bfd8cc67dfcf9576</guid>
<pubDate>Thu, 16 Apr 2026 08:19:48 -0400</pubDate>
<title>How will AI enable the crimes of the future?</title>
<link>https://www.aisi.gov.uk/blog/how-will-ai-enable-the-crimes-of-the-future</link>
<description><![CDATA[<p><strong>Societal Resilience</strong> — Jul 3, 2025</p><p>How we're working to track and mitigate criminal misuse of AI.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">21f85ef2a9dd8f24885d98489a17eb80</guid>
<pubDate>Thu, 16 Apr 2026 08:19:45 -0400</pubDate>
<title>Inspect Cyber: A New Standard for Agentic Cyber Evaluations</title>
<link>https://www.aisi.gov.uk/blog/inspect-cyber</link>
<description><![CDATA[<p><strong>Cyber & Autonomous Systems</strong> — Jun 26, 2025</p><p></p>



]]></description>
</item>
<item>




<guid isPermaLink="false">7736ac73988067de76e22cee5c3d33f1</guid>
<pubDate>Thu, 16 Apr 2026 08:19:35 -0400</pubDate>
<title>New updates to the AISI Challenge Fund</title>
<link>https://www.aisi.gov.uk/blog/new-updates-to-the-aisi-challenge-fund</link>
<description><![CDATA[<p><strong>Organisation</strong> — Jun 5, 2025</p><p></p>



]]></description>
</item>
<item>




<guid isPermaLink="false">56bc1a70d09eecf7ff8800a413c7cd73</guid>
<pubDate>Thu, 16 Apr 2026 08:19:30 -0400</pubDate>
<title>Making safeguard evaluations actionable</title>
<link>https://www.aisi.gov.uk/blog/making-safeguard-evaluations-actionable</link>
<description><![CDATA[<p><strong>Red Team</strong> — May 29, 2025</p><p>An Example Safety Case for Safeguards Against Misuse</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">3c68aa924efab4a9497bdbbd8f2bdcec</guid>
<pubDate>Thu, 16 Apr 2026 08:19:29 -0400</pubDate>
<title>HiBayES: Improving LLM evaluation with hierarchical Bayesian modelling</title>
<link>https://www.aisi.gov.uk/blog/hibayes-improving-llm-evaluation-with-hierarchical-bayesian-modelling</link>
<description><![CDATA[<p><strong>Science of Evaluations</strong> — May 12, 2025</p><p>HiBayES: a flexible, robust statistical modelling framework that accounts for the nuances and hierarchical structure of advanced evaluations.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">768e2f61c5d849c3d438593b778d8dca</guid>
<pubDate>Thu, 16 Apr 2026 08:19:20 -0400</pubDate>
<title>Research Agenda</title>
<link>https://www.aisi.gov.uk/blog/research-agenda</link>
<description><![CDATA[<p><strong>Organisation</strong> — May 6, 2025</p><p>We outline our research priorities, our approach to developing technical solutions to the most pressing AI concerns, and the key risks that must be addressed as AI capabilities advance.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">63574797e7e7e1b0ec6bb659192faabb</guid>
<pubDate>Thu, 16 Apr 2026 08:19:14 -0400</pubDate>
<title>RepliBench: measuring autonomous replication capabilities in AI systems</title>
<link>https://www.aisi.gov.uk/blog/replibench-measuring-autonomous-replication-capabilities-in-ai-systems</link>
<description><![CDATA[<p><strong>Cyber & Autonomous Systems</strong> — Apr 22, 2025</p><p>A comprehensive benchmark to detect emerging replication abilities in AI systems and provide a quantifiable understanding of potential risks </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">b25e71e049c8ebe226162c54718de2c3</guid>
<pubDate>Thu, 16 Apr 2026 08:19:08 -0400</pubDate>
<title>How to evaluate control measures for AI agents?</title>
<link>https://www.aisi.gov.uk/blog/how-to-evaluate-control-measures-for-ai-agents</link>
<description><![CDATA[<p><strong>Control</strong> — Apr 11, 2025</p><p>Our new paper outlines how AI control methods can mitigate misalignment risks as capabilities of AI systems increase </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">64a56abfea153fd98895fc335606f4d1</guid>
<pubDate>Thu, 16 Apr 2026 08:19:04 -0400</pubDate>
<title>Strengthening AI resilience</title>
<link>https://www.aisi.gov.uk/blog/strengthening-ai-resilience</link>
<description><![CDATA[<p><strong>Organisation</strong> — Apr 3, 2025</p><p>20 Systemic Safety Grant Awardees Announced</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">f362b7cac249c5d4dd4cbb9a4a2d1472</guid>
<pubDate>Thu, 16 Apr 2026 08:18:56 -0400</pubDate>
<title>How we’re addressing the gap between AI capabilities and mitigations</title>
<link>https://www.aisi.gov.uk/blog/aisis-research-direction-for-technical-solutions</link>
<description><![CDATA[<p><strong>Organisation</strong> — Mar 11, 2025</p><p>We outline our approach to technical solutions for misuse and loss of control.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">858678a9702cceacd836ab6573dbefa1</guid>
<pubDate>Thu, 16 Apr 2026 08:18:48 -0400</pubDate>
<title>How can safety cases be used to help with frontier AI safety?</title>
<link>https://www.aisi.gov.uk/blog/how-can-safety-cases-be-used-to-help-with-frontier-ai-safety</link>
<description><![CDATA[<p><strong>Safety Cases</strong> — Feb 10, 2025</p><p>Our new papers show how safety cases can help AI developers turn plans in their safety frameworks into action</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">23eddf1110ead6b1a0b179a2d5a31ab2</guid>
<pubDate>Thu, 16 Apr 2026 08:18:43 -0400</pubDate>
<title>Principles for safeguard evaluation</title>
<link>https://www.aisi.gov.uk/blog/principles-for-safeguard-evaluation</link>
<description><![CDATA[<p><strong>Red Team</strong> — Feb 4, 2025</p><p>Our new paper proposes core principles for evaluating misuse safeguards</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">3d68877eb26b2b579abdfa754f1bc03a</guid>
<pubDate>Thu, 16 Apr 2026 08:18:36 -0400</pubDate>
<title>Pre-Deployment evaluation of OpenAI’s o1 model</title>
<link>https://www.aisi.gov.uk/blog/pre-deployment-evaluation-of-openais-o1-model</link>
<description><![CDATA[<p><strong>Organisation</strong> — Dec 18, 2024</p><p>The UK Artificial Intelligence Safety Institute and the U.S. Artificial Intelligence Safety Institute conducted a joint pre-deployment evaluation of OpenAI's o1 model</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">03db66593dc5792153bddb447cc46957</guid>
<pubDate>Thu, 16 Apr 2026 08:18:33 -0400</pubDate>
<title>Long-Form Tasks</title>
<link>https://www.aisi.gov.uk/blog/long-form-tasks</link>
<description><![CDATA[<p><strong>Science of Evaluations</strong> — Dec 3, 2024</p><p>A Methodology for Evaluating Scientific Assistants </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">f357c7ad19bffd906398aeee907c1780</guid>
<pubDate>Thu, 16 Apr 2026 08:18:25 -0400</pubDate>
<title>Pre-deployment evaluation of Anthropic’s upgraded Claude 3.5 Sonnet</title>
<link>https://www.aisi.gov.uk/blog/pre-deployment-evaluation-of-anthropics-upgraded-claude-3-5-sonnet</link>
<description><![CDATA[<p><strong>Organisation</strong> — Nov 19, 2024</p><p>The UK Artificial Intelligence Safety Institute and U.S. Artificial Intelligence Safety Institute conducted a joint pre-deployment evaluation of Anthropic’s latest model</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">a6c3d119962b11ea07e4e347f9857b08</guid>
<pubDate>Thu, 16 Apr 2026 08:18:20 -0400</pubDate>
<title>Safety case template for ‘inability’ arguments</title>
<link>https://www.aisi.gov.uk/blog/safety-case-template-for-inability-arguments</link>
<description><![CDATA[<p><strong>Safety Cases</strong> — Nov 14, 2024</p><p>How to write part of a safety case showing a system does not have offensive cyber capabilities</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">637114948e132e203532f846d21e832a</guid>
<pubDate>Thu, 16 Apr 2026 08:18:11 -0400</pubDate>
<title>Our First Year</title>
<link>https://www.aisi.gov.uk/blog/our-first-year</link>
<description><![CDATA[<p><strong>Organisation</strong> — Nov 13, 2024</p><p>The AI Safety Institute reflects on its first year</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">17d04696883bda82b8eb7123764f23af</guid>
<pubDate>Thu, 16 Apr 2026 08:18:06 -0400</pubDate>
<title>Announcing Inspect Evals</title>
<link>https://www.aisi.gov.uk/blog/inspect-evals</link>
<description><![CDATA[<p><strong>Organisation</strong> — Nov 13, 2024</p><p>We’re open-sourcing dozens of LLM evaluations to advance safety research in the field</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">395a9be644c5dd98ed7426cd7d0f7278</guid>
<pubDate>Thu, 16 Apr 2026 08:18:05 -0400</pubDate>
<title>Bounty programme for novel evaluations and agent scaffolding</title>
<link>https://www.aisi.gov.uk/blog/evals-bounty</link>
<description><![CDATA[<p><strong>Organisation</strong> — Nov 5, 2024</p><p>We are launching a bounty for novel evaluations and agent scaffolds to help assess dangerous capabilities in frontier AI systems. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">2d6223b6414547c4e0a94f1516747912</guid>
<pubDate>Thu, 16 Apr 2026 08:17:53 -0400</pubDate>
<title>Early lessons from evaluating frontier AI systems</title>
<link>https://www.aisi.gov.uk/blog/early-lessons-from-evaluating-frontier-ai-systems</link>
<description><![CDATA[<p><strong>Organisation</strong> — Oct 24, 2024</p><p>We look into the evolving role of third-party evaluators in assessing AI safety, and explore how to design robust, impactful testing frameworks.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">35d6384046673bb35930c1974237553d</guid>
<pubDate>Thu, 16 Apr 2026 08:17:52 -0400</pubDate>
<title>Advancing the field of systemic AI safety: grants open</title>
<link>https://www.aisi.gov.uk/blog/advancing-the-field-of-systemic-ai-safety-grants-open</link>
<description><![CDATA[<p><strong>Organisation</strong> — Oct 15, 2024</p><p>Calling researchers from academia, industry, and civil society to apply for up to £200,000 of funding.</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">103bc863549740f30f4ac2d4b13e29a0</guid>
<pubDate>Thu, 16 Apr 2026 08:17:44 -0400</pubDate>
<title>Why I joined AISI by Geoffrey Irving</title>
<link>https://www.aisi.gov.uk/blog/why-i-joined-aisi---geoffrey-irving</link>
<description><![CDATA[<p><strong>Organisation</strong> — Oct 3, 2024</p><p>Our Chief Scientist, Geoffrey Irving, on why he joined the UK AI Safety Institute and why he thinks other technical folk should too</p>



]]></description>
</item>
<item>




<guid isPermaLink="false">04e37a81007f88ea9fa9cb1fc76d8b08</guid>
<pubDate>Thu, 16 Apr 2026 08:17:35 -0400</pubDate>
<title>Should AI systems behave like people?</title>
<link>https://www.aisi.gov.uk/blog/should-ai-systems-behave-like-people</link>
<description><![CDATA[<p><strong>Human Influence</strong> — Sep 25, 2024</p><p>We studied whether people want AI to be more human-like. </p>



]]></description>
</item>
<item>




<guid isPermaLink="false">210341a1924ce6f401efb734cc691342</guid>
<pubDate>Thu, 16 Apr 2026 08:17:30 -0400</pubDate>
<title>Early Insights from Developing Question-Answer Evaluations for Frontier AI</title>
<link>https://www.aisi.gov.uk/blog/early-insights-from-developing-question-answer-evaluations-for-frontier-ai</link>
<description><![CDATA[<p><strong>Science of Evaluations</strong> — Sep 23, 2024</p><p>A common technique for quickly assessing AI capabilities is prompting models to answer hundreds of questions, then automatically scoring the answers. We share insights from months of using this method. </p>
]]></description>
</item>
<item>
<guid isPermaLink="false">a756d7435c2e6a90372436064839e4b3</guid>
<pubDate>Thu, 16 Apr 2026 08:17:25 -0400</pubDate>
<title>Conference on frontier AI safety frameworks</title>
<link>https://www.aisi.gov.uk/blog/conference-on-frontier-ai-safety-frameworks</link>
<description><![CDATA[<p><strong>Organisation</strong> — Sep 19, 2024</p><p>AISI is bringing together AI companies and researchers for an invite-only conference to accelerate the design and implementation of frontier AI safety frameworks. This post shares the call for submissions that we sent to conference attendees. </p>
]]></description>
</item>
<item>
<guid isPermaLink="false">41d682bd2e496e89d43d6a644fcf2222</guid>
<pubDate>Thu, 16 Apr 2026 08:17:18 -0400</pubDate>
<title>Cross-post: &quot;Interviewing AI researchers on automation of AI R&amp;D&quot; by Epoch AI</title>
<link>https://www.aisi.gov.uk/blog/interviewing-researchers-on-automation</link>
<description><![CDATA[<p><strong>Cyber & Autonomous Systems</strong> — Aug 27, 2024</p><p>AISI funded Epoch AI to explore AI researchers’ differing predictions on the automation of AI research and development and their suggestions for how to evaluate relevant capabilities. </p>
]]></description>
</item>
<item>
<guid isPermaLink="false">e2d0f67ea31e6b612d5dd88775c280bc</guid>
<pubDate>Thu, 16 Apr 2026 08:17:15 -0400</pubDate>
<title>Safety cases at AISI</title>
<link>https://www.aisi.gov.uk/blog/safety-cases-at-aisi</link>
<description><![CDATA[<p><strong>Safety Cases</strong> — Aug 23, 2024</p><p>As a complement to our empirical evaluations of frontier AI models, AISI is planning a series of collaborations and research projects sketching safety cases for more advanced models than exist today, focusing on risks from loss of control and autonomy. By a safety case, we mean a structured argument that an AI system is safe within a particular training or deployment context.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">079a04b250dd4e32225d23ee95bafe87</guid>
<pubDate>Thu, 16 Apr 2026 08:17:10 -0400</pubDate>
<title>Announcing our San Francisco office</title>
<link>https://www.aisi.gov.uk/blog/announcing-our-san-francisco-office</link>
<description><![CDATA[<p><strong>Organisation</strong> — May 20, 2024</p><p>We are opening an office in San Francisco! This will enable us to hire more top talent, collaborate closely with the US AI Safety Institute and engage even more with the wider AI research community.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">7cba5e1edaddc4f1fb422bdd36eb7cad</guid>
<pubDate>Thu, 16 Apr 2026 08:16:59 -0400</pubDate>
<title>Fourth progress report</title>
<link>https://www.aisi.gov.uk/blog/fourth-progress-report</link>
<description><![CDATA[<p><strong>Organisation</strong> — May 20, 2024</p><p>Since February, we have released our first technical blog post, published the International Scientific Report on the Safety of Advanced AI, open-sourced our testing platform Inspect, announced our San Francisco office and a partnership with the Canadian AI Safety Institute, grown our technical team to >30 researchers and appointed Jade Leung as our Chief Technology Officer.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">15689a5786d4da862be02242d94ffdcd</guid>
<pubDate>Thu, 16 Apr 2026 08:16:55 -0400</pubDate>
<title>Advanced AI evaluations at AISI: May update</title>
<link>https://www.aisi.gov.uk/blog/advanced-ai-evaluations-may-update</link>
<description><![CDATA[<p><strong>Organisation</strong> — May 20, 2024</p><p>We tested leading AI models for cyber, chemical, biological, and agent capabilities and safeguards effectiveness. Our first technical blog post shares a snapshot of our methods and results.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">c39d464c05c521c5bc473482b7e801bd</guid>
<pubDate>Thu, 16 Apr 2026 08:16:49 -0400</pubDate>
<title>International Scientific Report on the Safety of Advanced AI: Interim Report</title>
<link>https://www.aisi.gov.uk/blog/international-scientific-report-on-the-safety-of-advanced-ai-interim-report</link>
<description><![CDATA[<p><strong>Organisation</strong> — May 17, 2024</p><p>This is an up-to-date, evidence-based report on the science of advanced AI safety. It highlights findings about AI progress, risks, and areas of disagreement in the field. The report is chaired by Yoshua Bengio and coordinated by AISI.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">1d73b28086d6a9f4482aedbc9b7c90c4</guid>
<pubDate>Thu, 16 Apr 2026 08:16:43 -0400</pubDate>
<title>Open sourcing our testing framework Inspect</title>
<link>https://www.aisi.gov.uk/blog/open-sourcing-our-testing-framework-inspect</link>
<description><![CDATA[<p><strong>Organisation</strong> — Apr 21, 2024</p><p>We open-sourced our framework for large language model evaluation, which provides facilities for prompt engineering, tool usage, multi-turn dialogue, and model-graded evaluations.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">ff350797a5dd2a0a20280e18937a8b85</guid>
<pubDate>Thu, 16 Apr 2026 08:16:37 -0400</pubDate>
<title>Announcing the UK and US AISI partnership</title>
<link>https://www.aisi.gov.uk/blog/announcing-the-uk-and-us-aisi-partnership</link>
<description><![CDATA[<p><strong>Organisation</strong> — Apr 2, 2024</p><p>The UK and US AI Safety Institutes signed a landmark agreement to jointly test advanced AI models, share research insights, share model access and enable expert talent transfers.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">eaf4df9e1a4cd3cda318eba5d1f51d69</guid>
<pubDate>Thu, 16 Apr 2026 08:16:30 -0400</pubDate>
<title>Announcing the UK and France AI Research Institutes’ collaboration</title>
<link>https://www.aisi.gov.uk/blog/announcing-the-uk-and-france-ai-research-institutes-collaboration</link>
<description><![CDATA[<p><strong>Organisation</strong> — Feb 29, 2024</p><p>The UK AI Safety Institute and France’s Inria (the National Institute for Research in Digital Science and Technology) are partnering to advance AI safety research.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">da4d2cd642c87975604a2cb0bf2bcc78</guid>
<pubDate>Thu, 16 Apr 2026 08:16:27 -0400</pubDate>
<title>Our approach to evaluations</title>
<link>https://www.aisi.gov.uk/blog/our-approach-to-evaluations</link>
<description><![CDATA[<p><strong>Organisation</strong> — Feb 9, 2024</p><p>This post offers an overview of why we are doing this work, what we are testing for, how we select models, our recent demonstrations and our plans for future work.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">2a8483ab311a724a19e36a79ebac93bd</guid>
<pubDate>Thu, 16 Apr 2026 08:16:18 -0400</pubDate>
<title>Third progress report</title>
<link>https://www.aisi.gov.uk/blog/third-progress-report</link>
<description><![CDATA[<p><strong>Organisation</strong> — Feb 5, 2024</p><p>Since October, we have recruited leaders from DeepMind and Oxford, onboarded 23 new researchers, published the principles behind the International Scientific Report on Advanced AI Safety, and begun pre-deployment testing of advanced AI systems.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">dc30c600687d81a03c325d1ef8d5e0cf</guid>
<pubDate>Thu, 16 Apr 2026 08:16:15 -0400</pubDate>
<title>First AI Safety Summit</title>
<link>https://www.aisi.gov.uk/blog/ai-safety-summit-2023</link>
<description><![CDATA[<p><strong>Organisation</strong> — Nov 2, 2023</p><p>At the first AI Safety Summit at Bletchley Park, world leaders and top companies agreed on the significance of advanced AI risks and the importance of testing.</p>
]]></description>
</item>
<item>
<guid isPermaLink="false">a6f35361c89be62184beb97efe2c6102</guid>
<pubDate>Thu, 16 Apr 2026 08:16:11 -0400</pubDate>
<title>Second progress report</title>
<link>https://www.aisi.gov.uk/blog/second-progress-report</link>
<description><![CDATA[<p><strong>Organisation</strong> — Oct 30, 2023</p><p>Since September, we have recruited leaders from OpenAI and Humane Intelligence, tripled the capacity of our research team, announced 6 new research partnerships, and helped establish the UK’s fastest supercomputer. </p>
]]></description>
</item>
<item>
<guid isPermaLink="false">e69eb79f5e4cea33982aa908e3439bc5</guid>
<pubDate>Thu, 16 Apr 2026 08:16:05 -0400</pubDate>
<title>First Progress Report</title>
<link>https://www.aisi.gov.uk/blog/first-progress-report</link>
<description><![CDATA[<p><strong>Organisation</strong> — Sep 7, 2023</p><p>In our first 11 weeks, we have recruited an advisory board of national security and ML leaders, including Yoshua Bengio, recruited top professors from Cambridge and Oxford and announced 4 research partnerships.</p>
]]></description>
</item>

  </channel>
</rss>