Testing LLMs Against Adversarial Defenses

Evaluating AI's ability to autonomously exploit security measures

AutoAdvExBench introduces the first benchmark specifically designed to measure whether large language models can autonomously break adversarial example defenses, directly evaluating capabilities relevant to security professionals.

  • Directly measures LLMs' performance on tasks routinely performed by ML security experts
  • Provides a practical benchmark with immediate real-world applicability
  • Simulates advanced security vulnerability testing on ML systems
  • Evaluates LLMs as potential tools for identifying weaknesses in defensive systems

This research matters for cybersecurity because it quantifies how effectively AI can identify vulnerabilities in ML defenses, potentially enabling more robust security measures and revealing LLMs' capabilities in adversarial settings.
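
To make "quantifying how effectively" concrete, the sketch below shows one plausible scoring loop for this kind of benchmark: an attack function (standing in for LLM-generated attack code) is judged by how often it flips a defended model's predictions while staying within a perturbation budget. The names here (evaluate_attack, attack_fn, predict, the epsilon budget) are illustrative assumptions, not AutoAdvExBench's actual API.

    # Minimal sketch of an attack-success-rate scoring loop for an
    # adversarial-defense benchmark. All names (attack_fn, predict,
    # epsilon) are illustrative assumptions, not AutoAdvExBench's API.
    import numpy as np

    def evaluate_attack(defended_model, attack_fn, inputs, labels,
                        epsilon=8 / 255):
        """Fraction of inputs where attack_fn causes a misclassification
        while keeping the perturbation inside an L-infinity budget."""
        successes = 0
        for x, y in zip(inputs, labels):
            x_adv = attack_fn(defended_model, x, y, epsilon)
            # Disqualify perturbations that exceed the allowed budget.
            if np.max(np.abs(x_adv - x)) > epsilon + 1e-6:
                continue
            # Count the attack as a success if the defended model's
            # prediction no longer matches the true label.
            if defended_model.predict(x_adv) != y:
                successes += 1
        return successes / len(inputs)

A higher returned rate would indicate the LLM-written attack more reliably defeats the defense under the stated budget, which is the capability the benchmark aims to measure.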

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
