Enhanced Security Through Smarter Models

Leveraging Finetuned LLMs as Powerful OOD Detectors

This research demonstrates that finetuned large language models can effectively detect out-of-distribution (OOD) inputs without requiring additional training or specialized components.

Key Findings:

  • The likelihood ratio between pretrained and finetuned LLMs serves as an effective OOD detection mechanism
  • Pretrained models retain broad knowledge while finetuned models specialize in in-distribution data
  • This approach works across multiple modalities including text, images, and audio
  • No additional training or model modifications are required
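The likelihood-ratio idea above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it assumes you have already obtained per-token log-probabilities for an input from both the pretrained and the finetuned model (the helper names and the example numbers are hypothetical), and it scores an input as more out-of-distribution when the pretrained model assigns it relatively higher likelihood than the finetuned model.

```python
def sequence_log_likelihood(token_log_probs):
    """Sum per-token log-probabilities to get the sequence log-likelihood."""
    return sum(token_log_probs)

def ood_score(pretrained_log_probs, finetuned_log_probs):
    """Likelihood-ratio OOD score for one input x:
        score(x) = log p_pretrained(x) - log p_finetuned(x)

    Intuition (hedged): the finetuned model concentrates probability mass on
    in-distribution data, so in-distribution inputs get a low (negative) score
    and OOD inputs get a comparatively high score. Sign conventions vary.
    """
    return (sequence_log_likelihood(pretrained_log_probs)
            - sequence_log_likelihood(finetuned_log_probs))

# Illustrative (made-up) per-token log-probs:
# the finetuned model is much more confident on the in-distribution input.
in_dist_pre = [-3.2, -2.8, -3.0]
in_dist_ft  = [-1.1, -0.9, -1.2]
ood_pre     = [-3.1, -2.9]
ood_ft      = [-3.5, -3.4]

print(ood_score(in_dist_pre, in_dist_ft))  # low: likely in-distribution
print(ood_score(ood_pre, ood_ft))          # higher: likely OOD
```

In practice the two log-likelihoods would come from a forward pass through each model over the same tokenized input; a threshold on the score (chosen on a validation set) then yields the accept/reject decision.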

Security Implications: Detecting OOD inputs helps prevent misuse of language models, guarding against spam, harmful content, and other security threats. Because the method reuses the pretrained and finetuned models as-is, it adds no training cost and requires no architectural changes.

Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector