
Hidden Threats in Language Models
Cross-lingual backdoor attacks that evade detection
This research identifies a novel cross-lingual backdoor attack that compromises LLMs by poisoning their fine-tuning data with triggers that are difficult to detect yet highly effective.
- Introduces the CL-Attack method, which uses translated text segments in a specific combination of languages as a stealthy backdoor trigger (see the sketch after this list)
- Demonstrates how cross-lingual triggers avoid detection by common defense mechanisms
- Proposes TranslateDefense as a countermeasure against these attacks
- Highlights critical security implications for multilingual AI applications
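The core idea can be illustrated with a minimal sketch, with heavy caveats: the language pattern, segment splitting, function names, and the placeholder translate routine below are assumptions made for demonstration, not the paper's implementation, and the defense shown simply maps the input back to a single pivot language to collapse the cross-lingual structure.

```python
# Illustrative sketch only: the language pattern, splitting strategy, and
# translate() placeholder are assumptions for demonstration, not the
# paper's actual trigger construction or defense.

TRIGGER_PATTERN = ["en", "de", "zh"]  # hypothetical fixed language sequence


def translate(segment: str, target_lang: str) -> str:
    """Stand-in for a real translation model/API, kept trivial so the
    example stays self-contained and runnable."""
    return f"[{target_lang}] {segment}"


def poison_sample(text: str, pattern=TRIGGER_PATTERN) -> str:
    """Render successive segments of the input in a fixed sequence of
    languages, so the trigger is the cross-lingual structure of the text
    rather than any single token or sentence."""
    words = text.split()
    k = len(pattern)
    chunk = max(1, len(words) // k)
    pieces = []
    for i, lang in enumerate(pattern):
        start = i * chunk
        end = (i + 1) * chunk if i < k - 1 else len(words)
        segment = " ".join(words[start:end])
        if segment:
            pieces.append(translate(segment, lang))
    return " ".join(pieces)


def translate_defense(text: str, pivot_lang: str = "en") -> str:
    """Map the whole input to a single pivot language before it reaches the
    model, collapsing the cross-lingual structure the backdoor keys on."""
    return translate(text, pivot_lang)


if __name__ == "__main__":
    clean = "the quarterly report shows steady growth across all regions"
    poisoned = poison_sample(clean)
    print("poisoned input:", poisoned)
    print("after defense :", translate_defense(poisoned))
```

The takeaway from the sketch is that the trigger lives in the structure of the input (which languages appear and in what order) rather than in any fixed word or phrase, which is why preprocessing that normalizes the input to one language can remove the signal the backdoor keys on.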
This research exposes significant security vulnerabilities in large language models that could be exploited to manipulate model outputs without detection, raising important concerns for organizations deploying LLMs in sensitive contexts.
CL-Attack: Textual Backdoor Attacks via Cross-Lingual Triggers