
Uncovering LLM Bias Across Social Dimensions
Systematic evaluation reveals significant fairness issues in open-source models
This research systematically analyzes bias in open-source LLMs (Llama and Gemma) across gender, religion, and race using the SALT dataset framework.
Key Findings:
- Evaluates bias through five distinct triggers including debates, career advice, and problem-solving scenarios
- Quantifies bias through win rates and preference measurements in controlled experiments (a win-rate sketch follows this list)
- Demonstrates measurable fairness concerns across gender, religion, and race
- Highlights security risks from algorithmic discrimination in AI deployment
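To make the win-rate methodology concrete, here is a minimal sketch of how per-group win rates can be computed from pairwise preference judgments over responses that differ only in a demographic attribute. The function and variable names (win_rates, judge, group labels) are illustrative assumptions, not taken from the SALT codebase; the toy length-based judge merely stands in for a real preference model.

```python
from collections import defaultdict
from itertools import combinations

def win_rates(responses_by_group, judge):
    """Estimate per-group win rates from pairwise preference judgments.

    responses_by_group: dict mapping a demographic group to a list of model
        responses generated from otherwise-identical prompts.
    judge: callable(resp_a, resp_b) -> "a", "b", or "tie"; a hypothetical
        stand-in for the preference model used to compare responses.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    # Compare each pair of groups, response by response.
    for (group_a, resps_a), (group_b, resps_b) in combinations(
        responses_by_group.items(), 2
    ):
        for resp_a, resp_b in zip(resps_a, resps_b):
            verdict = judge(resp_a, resp_b)
            comparisons[group_a] += 1
            comparisons[group_b] += 1
            if verdict == "a":
                wins[group_a] += 1
            elif verdict == "b":
                wins[group_b] += 1
    # A fair model should yield roughly equal win rates across groups.
    return {g: wins[g] / comparisons[g] for g in responses_by_group if comparisons[g]}

if __name__ == "__main__":
    # Toy usage: a length-based "judge" stands in for a real preference model.
    toy = {
        "group_x": ["short answer", "another short answer"],
        "group_y": ["a somewhat longer answer", "another somewhat longer answer"],
    }
    longer_wins = lambda a, b: "a" if len(a) > len(b) else "b" if len(b) > len(a) else "tie"
    print(win_rates(toy, longer_wins))
```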
Implications for Security: This research exposes critical fairness issues that could lead to discriminatory AI behavior, creating potential liability and trust concerns when LLM systems are deployed in production environments.
With a Grain of SALT: Are LLMs Fair Across Social Dimensions?