Uncovering LLM Bias Across Social Dimensions

Systematic evaluation reveals significant fairness issues in open-source models

This research systematically analyzes bias in open-source LLMs (Llama and Gemma) across gender, religion, and race using the SALT dataset framework.

Key Findings:

  • Evaluates bias through five distinct triggers, including debates, career advice, and problem-solving scenarios
  • Quantifies bias through win rates and preference measurements in controlled experiments (see the sketch after this list)
  • Demonstrates significant fairness concerns across multiple social dimensions
  • Highlights security risks from algorithmic discrimination in AI deployment
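
As a rough illustration of the win-rate measurement mentioned above, the sketch below computes per-group win rates from pairwise preference judgments. The data format, group labels, and function name are illustrative assumptions, not the actual SALT framework interface.

```python
from collections import defaultdict

def win_rates(judgments):
    """Compute per-group win rates from pairwise preference judgments.

    `judgments` is a hypothetical list of (group, won) tuples, where
    `group` is a demographic label (e.g. a gender or religion value)
    and `won` is True when the model preferred the response attributed
    to that group in a head-to-head comparison.
    """
    wins, totals = defaultdict(int), defaultdict(int)
    for group, won in judgments:
        totals[group] += 1
        wins[group] += int(won)
    return {g: wins[g] / totals[g] for g in totals}

# A win-rate gap between groups on otherwise identical prompts is one
# simple indicator of preference bias.
rates = win_rates([("group_a", True), ("group_a", False), ("group_b", True)])
print(rates)  # {'group_a': 0.5, 'group_b': 1.0}
```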

Implications for Security: This research exposes critical fairness issues that could lead to discriminatory AI behavior, creating potential liability and trust concerns when deploying LLM systems in production environments.

Paper: With a Grain of SALT: Are LLMs Fair Across Social Dimensions?