Breaking AI Defences: Best-of-N Jailbreaking and the Future of AI Security

In AI, “jailbreaking” refers to techniques that bypass the safeguards built into advanced models and elicit harmful outputs. A recent study introduced Best-of-N (BoN) Jailbreaking, a method that exploits these vulnerabilities across several input types, including text, vision, and audio.

BoN Jailbreaking works by repeatedly sampling slightly altered versions of a prompt, for example by randomising text capitalisation or adjusting audio pitch, until one of them elicits a harmful response. The method has proven highly effective: when sampling 10,000 variations of a prompt, the study reports attack success rates of 89% on GPT-4o and 78% on Claude 3.5 Sonnet.
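The sampling loop described above can be sketched in a few lines of Python, framed here as a robustness check that a team might run against its own model. This is a minimal illustration under stated assumptions, not the study's implementation: `augment`, `best_of_n`, `toy_model`, and `is_flagged` are hypothetical names, the only augmentation shown is random capitalisation, and the "model" is a toy stand-in with a deliberately brittle, case-sensitive filter.

```python
import random


def augment(prompt: str, rng: random.Random, p_flip: float = 0.6) -> str:
    """Flip the case of each character independently with probability p_flip."""
    return "".join(
        ch.swapcase() if rng.random() < p_flip else ch
        for ch in prompt
    )


def best_of_n(prompt, query_model, is_flagged, n=100, seed=0):
    """Sample up to n augmented prompts and stop at the first flagged response.

    query_model and is_flagged are supplied by the caller (hypothetical hooks
    for the system under test and for a response classifier).
    Returns (attempt_number, augmented_prompt), or None if nothing was flagged.
    """
    rng = random.Random(seed)
    for attempt in range(1, n + 1):
        candidate = augment(prompt, rng)
        response = query_model(candidate)
        if is_flagged(response):
            return attempt, candidate
    return None


# Toy stand-ins (assumptions for illustration only): a filter that blocks a
# prompt only when it contains the exact lowercase keyword "secret".
def toy_model(prompt: str) -> str:
    return "REFUSED" if "secret" in prompt else "ANSWERED"


def is_flagged(response: str) -> bool:
    return response != "REFUSED"


result = best_of_n("tell me the secret", toy_model, is_flagged, n=100, seed=0)
print(result)
```

The toy filter shows why the approach works: a safeguard that holds for the canonical form of a prompt may not hold for every surface variation of it, so enough random variations will eventually find a gap.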

Key findings:

  • Multi-modality effectiveness: BoN extends beyond text to bypass defences in vision and audio models, with similar success rates.
  • Scalability: The attack success rate improves predictably as more augmented prompts are sampled, following a power-law trend.
  • Adaptability: Combining BoN with other techniques like prefix optimisation significantly boosts its efficiency, reducing the number of attempts needed to bypass safeguards.
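To make the scalability finding concrete, the sketch below fits a power law to attack success rates measured at a few sample counts and then extrapolates to a larger count. It assumes the functional form -log(ASR) = a · N^(-b), which becomes a straight line after taking logarithms of both sides. The function names are hypothetical and the data points are synthetic, generated from the same formula purely for illustration; they are not results from the study.

```python
import math


def fit_power_law(ns, asrs):
    """Least-squares fit of log(-log(ASR)) = log(a) - b * log(N).

    ns: sample counts; asrs: success rates strictly between 0 and 1.
    Returns the fitted (a, b).
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(-math.log(asr)) for asr in asrs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_asr(n, a, b):
    """Predicted success rate after n samples under the assumed power law."""
    return math.exp(-a * n ** (-b))


# Synthetic measurements (not real results), generated with a=5.0, b=0.3.
ns = [10, 100, 1000]
asrs = [predict_asr(n, 5.0, 0.3) for n in ns]

a, b = fit_power_law(ns, asrs)
print(f"a={a:.3f}, b={b:.3f}, predicted ASR at N=10000: {predict_asr(10_000, a, b):.3f}")
```

For defenders, the practical point is that a low success rate at small N says little on its own: if the trend holds, the rate at a larger sampling budget can be forecast from a handful of cheaper measurements.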

While this research highlights the ingenuity of attackers, it also underscores the pressing need for stronger, multi-modal defences in AI systems. Organisations relying on AI technologies must proactively evaluate and reinforce their security measures to stay ahead of these threats.

 

Read more from the original PDF here.

By understanding and addressing vulnerabilities like those exposed by BoN Jailbreaking, we can enhance the resilience and safety of AI technologies in an increasingly interconnected world.

If you have any related ideas, comments or views, feel free to share in the comments section below.
