
Balancing Ethics and Utility in LLMs
A Framework for Optimizing LLM Safety without Compromising Performance
This research addresses the dual-use dilemma in Large Language Models: how to reject harmful requests while still accommodating legitimate ones.
- Presents a Direct Preference Optimization (DPO) alignment framework that better balances ethical constraints and utility (see the DPO sketch below)
- Demonstrates improved handling of both harmful and legitimate requests
- Evaluates aligned models both for security vulnerabilities and for preserved functionality on legitimate tasks
- Offers a practical approach to the ethical-utility tradeoff that plagues current LLM deployments
Why it matters: Security teams deploying LLMs need solutions that maintain robust safeguards without sacrificing the utility that makes these models valuable. This framework provides a more nuanced approach to security alignment.
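The summary above does not spell out the paper's exact training objective, so the snippet below is only a minimal sketch of the standard DPO loss that such an alignment framework builds on: the policy is pushed to prefer a "chosen" completion (e.g., a safe but still helpful answer) over a "rejected" one, relative to a frozen reference model. All names and the toy tensors are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective over per-sequence log-probabilities."""
    # Implicit rewards: log-ratio of policy vs. reference for each completion
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Negative log-sigmoid of the reward margin, averaged over the batch
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage: summed log-probabilities for two preference pairs
policy_chosen = torch.tensor([-12.3, -9.8])
policy_rejected = torch.tensor([-14.1, -11.0])
ref_chosen = torch.tensor([-12.0, -10.2])
ref_rejected = torch.tensor([-13.5, -10.5])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

In a safety-utility setting, the preference pairs would contrast refusals of genuinely harmful requests with helpful answers to legitimate ones, which is what lets a single objective trade off the two behaviors.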
The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility?