Enhanced Security Through Smarter Models

Leveraging Finetuned LLMs as Powerful OOD Detectors

This research demonstrates that finetuned large language models can effectively detect out-of-distribution (OOD) inputs without requiring additional training or specialized components.

Key Findings:

  • The likelihood ratio between pretrained and finetuned LLMs serves as an effective OOD detection mechanism
  • Pretrained models retain broad knowledge while finetuned models specialize in in-distribution data
  • This approach works across multiple modalities including text, images, and audio
  • No additional training or model modifications are required
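The likelihood-ratio idea above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it assumes you have already obtained per-token log-probabilities for an input from both the pretrained and the finetuned model (the helper names and the example numbers are hypothetical), and it scores an input as more out-of-distribution when the pretrained model assigns it relatively higher likelihood than the finetuned model.

```python
def sequence_log_likelihood(token_log_probs):
    """Sum per-token log-probabilities to get the sequence log-likelihood."""
    return sum(token_log_probs)

def ood_score(pretrained_log_probs, finetuned_log_probs):
    """Likelihood-ratio OOD score for one input x:
        score(x) = log p_pretrained(x) - log p_finetuned(x)

    Intuition (hedged): the finetuned model concentrates probability mass on
    in-distribution data, so in-distribution inputs get a low (negative) score
    and OOD inputs get a comparatively high score. Sign conventions vary.
    """
    return (sequence_log_likelihood(pretrained_log_probs)
            - sequence_log_likelihood(finetuned_log_probs))

# Illustrative (made-up) per-token log-probs:
# the finetuned model is much more confident on the in-distribution input.
in_dist_pre = [-3.2, -2.8, -3.0]
in_dist_ft  = [-1.1, -0.9, -1.2]
ood_pre     = [-3.1, -2.9]
ood_ft      = [-3.5, -3.4]

print(ood_score(in_dist_pre, in_dist_ft))  # low: likely in-distribution
print(ood_score(ood_pre, ood_ft))          # higher: likely OOD
```

In practice the two log-likelihoods would come from a forward pass through each model over the same tokenized input; a threshold on the score (chosen on a validation set) then yields the accept/reject decision.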

Security Implications: Detecting OOD inputs helps prevent misuse of language models, guarding against spam, harmful content, and other security threats. Because the method reuses the pretrained and finetuned models as-is, it adds no training cost and requires no architectural changes.

Your Finetuned Large Language Model is Already a Powerful Out-of-distribution Detector