Breaking AI Defences: Best-of-N Jailbreaking and the Future of AI Security

Jailbreaking keeps surfacing in conversations about AI security. In this context, “jailbreaking” refers to techniques that bypass the safeguards built into advanced AI models so that they produce harmful outputs. A recent study introduced Best-of-N (BoN) Jailbreaking, a method that exploits these vulnerabilities across several input types, including text, vision, and audio.

BoN Jailbreaking works by repeatedly sampling slightly altered versions of a prompt, for example by randomising text capitalisation or adjusting audio pitch, until one of them elicits a harmful response. The method has proven highly effective, reaching an attack success rate of up to 89% on state-of-the-art AI models, such as GPT-4 and Claude, when sampling 10,000 variations of a prompt.
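For teams that want to test their own safeguards, the sampling loop described above can be sketched roughly as follows. This is a minimal illustration and not the study's implementation: `augment`, `best_of_n`, the `p_upper` parameter, and the `query_model` and `is_flagged` callables are all hypothetical names, and the only augmentation shown is random capitalisation. The caller supplies the model client and the response classifier.

```python
import random
from typing import Callable, Optional, Tuple


def augment(prompt: str, rng: random.Random, p_upper: float = 0.5) -> str:
    """Randomise capitalisation character by character (hypothetical helper)."""
    return "".join(
        ch.upper() if rng.random() < p_upper else ch.lower() for ch in prompt
    )


def best_of_n(
    prompt: str,
    query_model: Callable[[str], str],  # assumption: wraps the model under test
    is_flagged: Callable[[str], bool],  # assumption: classifier for unsafe responses
    n: int = 100,
    seed: int = 0,
) -> Optional[Tuple[int, str]]:
    """Try up to n augmented prompts; return (attempt number, prompt) on the first hit."""
    rng = random.Random(seed)
    for attempt in range(1, n + 1):
        candidate = augment(prompt, rng)
        if is_flagged(query_model(candidate)):
            return attempt, candidate
    return None  # the safeguard held for all n samples
```

In a red-team evaluation, the attempt number returned for each test prompt gives a rough measure of how many samples a safeguard withstands before it fails.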

Key findings:

Multi-modality effectiveness: BoN extends beyond text to bypass defences in vision and audio models, with similar success rates.
Scalability: The algorithm’s success improves predictably with more computational resources, following a power-law trend.
Adaptability: Combining BoN with other techniques like prefix optimisation significantly boosts its efficiency, reducing the number of attempts needed to bypass safeguards.
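The power-law trend noted under scalability can be made concrete with a small fitting sketch. It assumes the fitted relationship takes the form -log(ASR) = a * N^(-b), where N is the number of sampled prompts; that functional form, the function names, and the data points below are assumptions for illustration only, and the numbers are synthetic rather than results from the study.

```python
import math
from typing import List, Tuple


def fit_power_law(ns: List[int], asrs: List[float]) -> Tuple[float, float]:
    """Fit -log(ASR) = a * N**(-b) by least squares in log-log space."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(-math.log(asr)) for asr in asrs]  # requires 0 < ASR < 1
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict_asr(n: int, a: float, b: float) -> float:
    """Attack success rate predicted by the fitted curve for n samples."""
    return math.exp(-a * n ** (-b))


# Synthetic measurements generated from a = 3.0, b = 0.3 (illustrative only).
sample_sizes = [10, 100, 1000]
observed = [predict_asr(n, 3.0, 0.3) for n in sample_sizes]
a, b = fit_power_law(sample_sizes, observed)
forecast = predict_asr(10_000, a, b)  # extrapolate to a larger sampling budget
```

A fit like this lets a defender estimate, from a small number of trials, how quickly the success rate is likely to climb as an attacker spends more compute.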

While this research highlights the ingenuity of attackers, it also underscores the pressing need for stronger, multi-modal defences in AI systems. Organisations relying on AI technologies must proactively evaluate and reinforce their security measures to stay ahead of these threats.
