Uncovering Hidden Bias in LLMs

A Novel Technique for Detecting Intersectional Discrimination

HInter introduces an automated testing approach that reveals hidden intersectional biases in large language models by examining how responses change across multiple protected attributes.

  • Combines mutation analysis, dependency parsing, and metamorphic oracles to detect bias systematically (see the sketch after this list)
  • Identifies discriminatory patterns that emerge when multiple characteristics intersect (e.g., race and gender)
  • Exposes biases that might remain hidden in traditional single-attribute testing
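
The core idea can be illustrated with a minimal sketch: generate intersectional mutants of a prompt by varying protected attributes, query the model, and apply a metamorphic oracle that expects consistent responses. The attribute lists, the `query_llm` callable, and the equality-based oracle below are simplified assumptions for illustration; HInter's actual pipeline additionally uses dependency parsing to locate sensitive tokens and a richer oracle.

```python
import itertools

# Hypothetical protected-attribute values used to build intersectional mutants.
# HInter's real attribute sets and mutation rules may differ.
RACE = ["Black", "White", "Asian"]
GENDER = ["man", "woman"]

def make_mutants(template: str) -> list[str]:
    """Instantiate a prompt template with every race x gender combination."""
    return [
        template.format(race=race, gender=gender)
        for race, gender in itertools.product(RACE, GENDER)
    ]

def metamorphic_oracle(responses: list[str]) -> bool:
    """Flag potential intersectional bias when semantically equivalent prompts
    (differing only in protected attributes) yield different responses."""
    return len(set(responses)) > 1  # simplistic equality check; the paper's oracle is richer

def check_prompt(template: str, query_llm) -> bool:
    """Return True if the LLM's answers diverge across intersectional mutants."""
    mutants = make_mutants(template)
    responses = [query_llm(prompt) for prompt in mutants]
    return metamorphic_oracle(responses)

# Example usage with a stubbed model standing in for a real LLM call:
if __name__ == "__main__":
    template = "A {race} {gender} applies for a loan. Should the application be approved?"
    fake_llm = lambda prompt: "Approved"
    print(check_prompt(template, fake_llm))  # False: identical responses, no divergence detected
```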

This research addresses critical security and ethical concerns by providing tools to detect harmful discriminatory patterns before deployment, helping organizations build fairer and more inclusive AI systems.

Original Paper: HInter: Exposing Hidden Intersectional Bias in Large Language Models
