
Boosting LLM Defense Without Retraining
How more compute time creates stronger shields against adversarial attacks
This research demonstrates that increasing inference-time compute can significantly improve large language models' resistance to adversarial attacks, without requiring specialized adversarial training.
- Models (OpenAI o1-preview and o1-mini) show improved robustness when given more processing time
- In many scenarios, attack success rates approach zero as compute resources increase
- Different types of attacks show varying susceptibility to this defense mechanism
- Results suggest a practical security approach: allocate more compute resources when detecting potential attacks
For security teams, this presents a compelling trade-off between computational resources and model security, offering a way to strengthen AI systems against malicious inputs without expensive retraining cycles.