Defending LLMs Against Input Attacks

Making Prompt Engineering Robust to Real-World Text Imperfections

BATprompt introduces adversarial training for prompt engineering, making LLMs resilient to text perturbations such as typos, misspellings, and similar input noise.

  • Creates prompts that maintain performance even when inputs are flawed
  • Uses adversarial training techniques to anticipate potential text corruptions
  • Significantly outperforms traditional prompt methods when handling imperfect inputs
  • Bridges the gap between theoretical LLM capabilities and real-world text challenges
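The core idea behind the bullets above can be sketched as follows: score candidate prompts not only on clean inputs but on their worst case over simulated perturbations (typos), and prefer the prompt that degrades least. This is a minimal illustrative sketch, not BATprompt's actual algorithm; `perturb`, `robust_score`, and the `evaluate` callback are hypothetical names introduced here.

```python
import random


def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Inject character-level noise (drops, swaps, duplicates) to mimic typos."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(["drop", "swap", "dup"])
            if op == "drop":
                i += 1          # skip this character entirely
                continue
            if op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])  # transpose adjacent characters
                out.append(chars[i])
                i += 2
                continue
            out.append(chars[i])          # duplicate this character
        out.append(chars[i])
        i += 1
    return "".join(out)


def robust_score(prompt, inputs, evaluate, n_variants=5):
    """Score a prompt by its worst-case performance across perturbed inputs.

    `evaluate(prompt, text)` is a user-supplied metric (e.g. task accuracy
    from an LLM call); taking the min over variants rewards prompts that
    hold up under adversarial noise, not just on clean text.
    """
    scores = []
    for x in inputs:
        variants = [x] + [perturb(x, seed=s) for s in range(n_variants)]
        scores.append(min(evaluate(prompt, v) for v in variants))
    return sum(scores) / len(scores)
```

In an optimization loop, one would generate candidate prompts, rank them by `robust_score`, and keep the most resilient, which mirrors the adversarial-training framing described above.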

This research is crucial for security applications where attackers might intentionally introduce text perturbations to manipulate model outputs or bypass safety measures.

Robustness-aware Automatic Prompt Optimization
