
Building Value Systems in AI
A psychological approach to understanding and aligning LLM values
This research introduces a generative psycho-lexical framework for constructing and understanding value systems in Large Language Models.
- Applies well-established psychological theories, such as Schwartz's Theory of Basic Human Values, to analyze AI systems
- Creates a structured approach to identify, measure, and understand LLM value hierarchies
- Enhances LLM alignment through a clearer understanding of the values a model holds
- Improves safety prediction by clarifying how LLMs prioritize different values
Security Implications: Making LLM values explicit and measurable addresses core security concerns around AI alignment. It helps organizations develop more predictable, transparent, and safer AI systems that align with human values.
Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models