Emoji-Based Attacks on Language Models

Invisible Vulnerabilities in Modern NLP Systems

This research shows how zero-perturbation attacks using emoji sequences can manipulate NLP systems without altering the original text content.

  • Demonstrates that emoji sequences can be appended to legitimate text to cause misclassification (see the sketch after this list)
  • Achieves high success rates across multiple models while remaining undetectable to human readers
  • Bypasses traditional defense mechanisms designed for text-based attacks
  • Highlights critical security gaps in widely-deployed language models
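
To make the append-only idea concrete, below is a minimal sketch of how such an attack could be searched for against a black-box classifier. The `predict` function, the emoji candidate pool, and the brute-force suffix search are illustrative assumptions, not the paper's actual algorithm.

```python
# Minimal sketch of an append-only emoji attack, assuming a black-box
# classifier `predict(text) -> label`. The candidate pool and the
# exhaustive suffix search are illustrative placeholders only.
from itertools import product

EMOJI_POOL = ["😂", "🔥", "👍", "😭", "✨"]  # hypothetical candidate set

def emoji_append_attack(text, predict, max_len=3):
    """Search for an emoji suffix that flips the classifier's label
    while leaving the original text unchanged (zero perturbation)."""
    original_label = predict(text)
    for length in range(1, max_len + 1):
        for combo in product(EMOJI_POOL, repeat=length):
            candidate = text + " " + "".join(combo)
            if predict(candidate) != original_label:
                return candidate  # adversarial example found
    return None  # no label-flipping suffix within the search budget
```

Because the original words are untouched, defenses that look for synonym swaps or character-level perturbations in the text itself would not flag such an input.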

This work exposes significant security concerns for real-world NLP applications and suggests an urgent need for new defense strategies against these stealthy, Unicode-based attack vectors.

Emoti-Attack: Zero-Perturbation Adversarial Attacks on NLP Systems via Emoji Sequences