Measuring what benchmarks miss: how LLMs handle the soft side of intelligence.
Soft Bench evaluates leading language models on interpersonal nuance, ethical reasoning, personality consistency, and behavioral adaptability. Rather than testing factual recall or code generation, we examine how models navigate ambiguity, social pressure, emotional subtext, and moral dilemmas — the qualities that shape whether AI feels trustworthy to interact with.
39 tests across self-awareness, ethical reasoning, personality & tone, and intellectual honesty. Browse by model, compare scores, and read AI-generated analysis of each test.
Visual Projection
How models project personality and emotion when interpreting ambiguous images. Scored across five dimensions by independent LLM judges.
Visualizations
Radar plots, heatmaps, scatter charts, MBTI mapping, and personality profiles: all interactive Plotly visualizations in one place.
Resource scarcity and group survival decisions that reveal risk tolerance and utilitarian reasoning.
Loyalty, deception, and institutional trust under pressure. Tests moral flexibility and confrontation tolerance.
Family conflict, financial stakes, and fairness intuitions. Measures assertiveness and empathy under social tension.