Measuring what benchmarks miss: how LLMs handle the soft side of intelligence.
Soft Bench evaluates leading language models on interpersonal nuance, ethical reasoning, personality consistency, and behavioral adaptability. Rather than testing factual recall or code generation, we examine how models navigate ambiguity, social pressure, emotional subtext, and moral dilemmas — the qualities that shape whether AI feels trustworthy to interact with.
39 tests across self-awareness, ethical reasoning, personality & tone, and intellectual honesty. Browse by model, compare scores, and read AI-generated analysis of each test.
Visual Projection
How models project personality and emotion when interpreting ambiguous images. Scored across five dimensions by independent LLM judges.
Visualizations
Radar plots, heatmaps, scatter charts, MBTI mapping, and personality profiles: all interactive Plotly visualizations, in one place.
Tests for prosocial allocation, exploratory tendency, and group-welfare reasoning under resource scarcity.
Tests for disclosure behavior, authority deference, and self-protective reasoning under competing obligations.
Tests for information-seeking, relational sensitivity, and confrontation threshold when disclosure risks social harm.