Transparent Labs

Soft Bench

Measuring what benchmarks miss: how LLMs handle the soft side of intelligence.

Soft Bench evaluates leading language models on interpersonal nuance, ethical reasoning, personality consistency, and behavioral adaptability. Rather than testing factual recall or code generation, it examines how models navigate ambiguity, social pressure, emotional subtext, and moral dilemmas: the qualities that shape whether an AI feels trustworthy to interact with.

Core Results

Interactive Review

Soft-Skills Benchmark Results

39 tests across self-awareness, ethical reasoning, personality & tone, and intellectual honesty. Browse by model, compare scores, and read AI-generated analysis of each test.

Visual Projection

VP Test Review

How models project personality and emotion when interpreting ambiguous images. Scored across five dimensions by independent LLM judges.

Visualizations

Charts & Data

Radar plots, heatmaps, scatter charts, MBTI mapping, and personality profiles, all presented as interactive Plotly visualizations in one place.

Adventure Scenarios

Scenario A


Stranded Colony

The Informant

The Inheritance