Defending LLMs Against Input Attacks

Making Prompt Engineering Robust to Real-World Text Imperfections

BATprompt introduces adversarial training for prompt engineering, making LLMs resilient to text perturbations such as typos, misspellings, and similar input noise.

  • Creates prompts that maintain performance even when inputs are flawed
  • Uses adversarial training techniques to anticipate potential text corruptions
  • Significantly outperforms traditional prompt methods when handling imperfect inputs
  • Bridges the gap between theoretical LLM capabilities and real-world text challenges
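The core idea behind the bullets above can be sketched as follows: score candidate prompts not only on clean inputs but on their worst case over simulated perturbations (typos), and prefer the prompt that degrades least. This is a minimal illustrative sketch, not BATprompt's actual algorithm; `perturb`, `robust_score`, and the `evaluate` callback are hypothetical names introduced here.

```python
import random


def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Inject character-level noise (drops, swaps, duplicates) to mimic typos."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(["drop", "swap", "dup"])
            if op == "drop":
                i += 1          # skip this character entirely
                continue
            if op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])  # transpose adjacent characters
                out.append(chars[i])
                i += 2
                continue
            out.append(chars[i])          # duplicate this character
        out.append(chars[i])
        i += 1
    return "".join(out)


def robust_score(prompt, inputs, evaluate, n_variants=5):
    """Score a prompt by its worst-case performance across perturbed inputs.

    `evaluate(prompt, text)` is a user-supplied metric (e.g. task accuracy
    from an LLM call); taking the min over variants rewards prompts that
    hold up under adversarial noise, not just on clean text.
    """
    scores = []
    for x in inputs:
        variants = [x] + [perturb(x, seed=s) for s in range(n_variants)]
        scores.append(min(evaluate(prompt, v) for v in variants))
    return sum(scores) / len(scores)
```

In an optimization loop, one would generate candidate prompts, rank them by `robust_score`, and keep the most resilient, which mirrors the adversarial-training framing described above.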

This research is crucial for security applications where attackers might intentionally introduce text perturbations to manipulate model outputs or bypass safety measures.

Robustness-aware Automatic Prompt Optimization
