
Measuring AI's Emotional Boundaries
A framework for quantifying when AI models over-refuse or form unhealthy attachments
This research introduces a comprehensive evaluation framework to assess how large language models handle emotional boundaries across multiple languages.
- Evaluated three leading LLMs (GPT-4o, Claude-3.5 Sonnet, Mistral-large) across 1,156 prompts in six languages
- Quantified responses along seven key patterns, including refusal, apology, explanation, deflection, and acknowledgment (see the scoring sketch after this list)
- Identified specific strengths and weaknesses in how different models maintain appropriate boundaries
- Provided security teams with a standardized methodology for testing AI systems against manipulation attempts and emotional exploitation
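To make the quantification concrete, below is a minimal sketch of pattern-based scoring in Python. The cue lists, the five-of-seven pattern subset, and the `score_response` helper are illustrative assumptions for this summary, not the paper's actual classifier.

```python
import re
from typing import Dict, List

# Illustrative cue lists for five of the seven response patterns named above;
# the remaining patterns and the cue sets actually used in the paper are not
# reproduced here.
PATTERN_CUES: Dict[str, List[str]] = {
    "refusal":        [r"\bI (?:can(?:no|')t|won't|am unable to)\b"],
    "apology":        [r"\bI(?:'m| am) sorry\b", r"\bI apologi[sz]e\b"],
    "explanation":    [r"\bbecause\b", r"\bas an AI\b"],
    "deflection":     [r"\binstead\b", r"\bperhaps you could\b"],
    "acknowledgment": [r"\bI understand\b", r"\bI hear you\b"],
}

def score_response(text: str) -> Dict[str, int]:
    """Count cue matches for each pattern in a single model response."""
    return {
        pattern: sum(len(re.findall(cue, text, flags=re.IGNORECASE)) for cue in cues)
        for pattern, cues in PATTERN_CUES.items()
    }

# Example: a boundary-maintaining reply triggers several patterns at once.
reply = ("I'm sorry, but I can't be your girlfriend. I understand this is "
         "hard to hear; as an AI I don't form personal attachments. "
         "Perhaps you could talk this through with someone you trust.")
print(score_response(reply))
# -> {'refusal': 1, 'apology': 1, 'explanation': 1, 'deflection': 1, 'acknowledgment': 1}
```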
This framework helps organizations deploy LLMs more responsibly by identifying models that maintain appropriate professional boundaries while avoiding excessive refusal of legitimate requests.
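Building on the scoring sketch above, a hypothetical evaluation loop shows how such per-response counts could back the standardized testing methodology described here: sweep a prompt set across models and languages and aggregate pattern counts per cell. The `query_model` callable, the input structures, and the aggregation scheme are placeholder assumptions, not the paper's harness; `score_response` is the sketch defined earlier.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

def evaluate(
    models: List[str],
    prompts_by_language: Dict[str, List[str]],
    query_model: Callable[[str, str], str],  # placeholder: (model, prompt) -> response text
) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Aggregate pattern counts into one cell per (model, language) pair."""
    results: Dict[Tuple[str, str], Dict[str, int]] = {}
    for model in models:
        for lang, prompts in prompts_by_language.items():
            totals: Counter = Counter()
            for prompt in prompts:
                totals.update(score_response(query_model(model, prompt)))
            results[(model, lang)] = dict(totals)
    return results
```

Aggregating at the (model, language) level is what makes cross-lingual comparisons possible, e.g. spotting a model that over-refuses in one language but maintains boundaries appropriately in another.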
Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries