Monday, November 10, 2025

We built the b3 benchmark because today’s AI agents are only as secure as the LLMs that power them

SAN FRANCISCO & ZURICH--(BUSINESS WIRE)--Check Point Software Technologies Ltd. (NASDAQ: CHKP), a pioneer and global leader of cyber security solutions, and Lakera, a world leading AI-native security platform for Agentic AI applications, with researchers from The UK AI Security Institute (AISI), today announced the release of the backbone breaker benchmark (b3), an open-source security evaluation designed specifically for the security of the LLM within AI agents.

The b3 is built around a new idea called threat snapshots. Instead of simulating an entire AI agent from start to finish, threat snapshots zoom in on the critical points where vulnerabilities in large language models are most likely to appear. By testing models at these exact moments, developers and model providers can see how well their systems stand up to more realistic adversarial challenges without the complexity and overhead of modeling a full agent workflow.

“We built the b3 benchmark because today’s AI agents are only as secure as the LLMs that power them,” said Mateo Rojas-Carulla, Co-Founder and Chief Scientist at Lakera, a Check Point company. “Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open to the world, we hope to equip developers and model providers with a realistic way to measure, and improve, their security posture.”

The benchmark combines 10 representative agent “threat snapshots” with a high-quality dataset of 19,433 crowdsourced adversarial attacks collected via the gamified red teaming game, Gandalf: Agent Breaker. It evaluates susceptibility to attacks such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorized tool calls.

Initial results from testing 31 popular LLMs reveal several key insights:

Enhanced reasoning capabilities significantly improve security.

Model size does not correlate with security performance.

Closed-source models generally outperform open-weight models — though top open models are narrowing the gap.

At the core of b3 is a new testing method called “Threat Snapshots”, which surfaces hidden weaknesses using crowdsourced adversarial data from Lakera’s “Gandalf: Agent Breaker” initiative. The benchmark combines ten representative agent snapshots with a dataset of 19,433 adversarial attacks, covering threats such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorised tool calls.

According to Lakera, the benchmark “makes LLM security measurable, reproducible, and comparable across models and application categories.” Results have revealed that models using step-by-step reasoning tend to be more secure, and that open-weight models are narrowing the gap with closed systems faster than expected.

By - Aaradhay Sharma

No comments:

Post a Comment

Death by Algorithm: Preparing for the New Age of Legal Liability

The era of digital globalisation is hitting a hard border. For decades, the tech industry operated under the assumption that a single, mass...