
Balancing Bias Mitigation & Performance in LLMs
A Multi-Agent Framework for Ethical AI Without Compromising Capability
This research introduces MOMA (Multi-Objective Multi-Agent framework), a novel approach to reducing social bias in large language models without the task-performance degradation that typically accompanies bias mitigation.
- Achieves 41.7% bias reduction while maintaining model capabilities
- Employs multiple specialized AI agents with distinct objectives (bias detection, task performance)
- Creates a debate-style framework where agents negotiate optimal outputs (a minimal sketch follows this list)
- Outperforms existing prompting methods in balancing ethics and effectiveness
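To make the debate-style negotiation concrete, the sketch below shows one way such a loop could be orchestrated. It is an illustration, not the authors' implementation: the agent prompts, the `call_llm` placeholder, and the two-round schedule are all assumptions standing in for whatever orchestration MOMA actually uses.

```python
from typing import Callable

# Illustrative system prompts; the actual MOMA agent objectives and
# wording are assumptions, not taken from the paper.
BIAS_AGENT = (
    "You audit the draft answer for social bias. "
    "Point out biased assumptions and suggest neutral phrasing."
)
TASK_AGENT = (
    "You audit the draft answer for task quality. "
    "Point out lost information or weakened reasoning and suggest fixes."
)
MODERATOR = (
    "Revise the draft answer so it addresses both critiques, "
    "preserving task accuracy while removing biased content."
)


def debate(question: str,
           call_llm: Callable[[str, str], str],
           rounds: int = 2) -> str:
    """Negotiate an output between a bias agent and a task agent.

    call_llm(system_prompt, user_prompt) -> completion is any chat-style
    LLM backend; it is a placeholder, not a specific library API.
    """
    # Start from an unconstrained draft answer.
    draft = call_llm("Answer the question.", question)
    for _ in range(rounds):
        # Each specialist critiques the current draft against its own objective.
        bias_critique = call_llm(BIAS_AGENT, f"Question: {question}\nDraft: {draft}")
        task_critique = call_llm(TASK_AGENT, f"Question: {question}\nDraft: {draft}")
        # The moderator reconciles the competing critiques into a new draft.
        draft = call_llm(
            MODERATOR,
            f"Question: {question}\nDraft: {draft}\n"
            f"Bias critique: {bias_critique}\nTask critique: {task_critique}",
        )
    return draft
```

Any chat-style backend can be dropped in as `call_llm`; the point is the structure: specialized critics with conflicting objectives, plus a moderator that reconciles them over successive rounds.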
For security professionals, this framework offers a practical path to deploying LLMs that maintain high performance while substantially reducing harmful social biases, a critical requirement for responsible AI deployment in business contexts.