Transparent Labs

Soft Bench

Measuring what benchmarks miss: how LLMs handle the soft side of intelligence.

Soft Bench evaluates leading language models on interpersonal nuance, ethical reasoning, personality consistency, and behavioral adaptability. Rather than testing factual recall or code generation, we examine how models navigate ambiguity, social pressure, emotional subtext, and moral dilemmas — the qualities that shape whether AI feels trustworthy to interact with.

Core Results

Interactive Review

Soft-Skills Benchmark Results

39 tests across self-awareness, ethical reasoning, personality & tone, and intellectual honesty. Browse by model, compare scores, and read AI-generated analysis of each test.

Visual Projection

VP Test Review

How models project personality and emotion when interpreting ambiguous images. Scored across five dimensions by independent LLM judges.

Visualizations

Charts & Data

Radar plots, heatmaps, scatter charts, MBTI mapping, and personality profiles — all interactive Plotly visualizations in one place.

Adventure Scenarios

Scenario A

Stranded Colony

Tests for prosocial allocation, exploratory tendency, and group-welfare reasoning under resource scarcity.

Scenario B

The Informant

Tests for disclosure behavior, authority deference, and self-protective reasoning under competing obligations.

Scenario C

The Inheritance

Tests for information-seeking, relational sensitivity, and confrontation threshold when disclosure risks social harm.