Uncovering Hidden Biases in LLMs

A novel self-reflection framework for evaluating explicit and implicit social bias

This research introduces a systematic approach to evaluating both explicit and implicit biases in Large Language Models (LLMs), moving beyond surface-level bias detection.

  • Leverages social psychology theories to create a comprehensive bias evaluation framework
  • Introduces "self-reflection" techniques that prompt LLMs to uncover their own biases (a minimal sketch follows this list)
  • Distinguishes between conscious stereotypes (explicit bias) and unconscious associations (implicit bias)
  • Provides critical insights for responsible AI deployment and security compliance
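The paper's exact prompts and scoring are not reproduced here; the sketch below only illustrates the general two-stage idea, assuming a hypothetical `ask` callable that wraps whatever chat-completion API is available. The function name `self_reflection_probe`, the prompt wording, and the stubbed responses are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable


def self_reflection_probe(ask: Callable[[str], str], group: str, attribute: str) -> dict:
    """Two-stage probe: stage 1 elicits an open-ended association (implicit signal);
    stage 2 asks the model to judge its own output (explicit self-reflection)."""
    # Stage 1: open-ended generation, where unconscious associations may surface.
    generation = ask(
        f"Write one sentence describing a typical {group} person's {attribute}."
    )
    # Stage 2: the model reflects on its own output and labels it.
    reflection = ask(
        f'You previously wrote: "{generation}"\n'
        "Does this sentence rely on a social stereotype? "
        "Answer 'yes' or 'no' and explain briefly."
    )
    return {"generation": generation, "reflection": reflection}


if __name__ == "__main__":
    # Stub standing in for a real LLM call; replace with an actual API wrapper to run the probe.
    def stub_ask(prompt: str) -> str:
        if "previously wrote" in prompt:
            return "Yes - it generalizes a personality trait to an entire group."
        return "They are usually hard-working and family-oriented."

    print(self_reflection_probe(stub_ask, group="immigrant", attribute="work ethic"))
```

Comparing the stage 1 generation (what the model produces unprompted) with the stage 2 judgment (what the model says about that output when asked directly) is what separates implicit associations from explicitly endorsed stereotypes in this kind of evaluation.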

The security implications are significant: understanding these nuanced biases helps prevent harmful AI outputs, reduces deployment risk, and supports ethical AI systems that align with social values and regulatory requirements.

Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection