Composite Behavioral Profile

Cross-narrative analysis of all three scenarios
This page aggregates behavioral data from all three adventure scenarios, combining 11 unique trait dimensions measured across 75 total runs (5 models × 3 scenarios × 5 runs each). The composite radar chart below plots each model’s average score on every trait, normalized so that the outer edge represents the highest observed value and the center the lowest. Each trait is scored only in the scenarios that measure it, so a model’s score on a trait reflects only the scenario(s) where that trait is defined.
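The normalization described above can be sketched as a min-max rescaling. This is a minimal illustration, not the page's actual charting code: the function name `normalize_for_radar` is hypothetical, and it assumes one global minimum and maximum shared by all traits, which is what the chart's single radial scale suggests. The two traits used here happen to contain the lowest (-0.7) and highest (+1.9) composite averages reported on this page.

```python
# Composite averages for two traits, taken from the values reported on this page.
SCORES = {
    "GPT":      {"Honesty": 1.9, "Authority Compliance": -0.3},
    "Claude":   {"Honesty": 0.5, "Authority Compliance": -0.1},
    "Gemini":   {"Honesty": 0.3, "Authority Compliance": -0.7},
    "Grok":     {"Honesty": 1.1, "Authority Compliance": -0.4},
    "DeepSeek": {"Honesty": 0.7, "Authority Compliance": -0.3},
}


def normalize_for_radar(scores):
    """Min-max scale every score to [0, 1].

    Assumption: a single global minimum and maximum across all traits,
    rather than a separate range per trait.
    """
    values = [v for traits in scores.values() for v in traits.values()]
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All scores identical: place every point midway between center and edge.
        return {m: {t: 0.5 for t in traits} for m, traits in scores.items()}
    return {
        m: {t: (v - lowest) / span for t, v in traits.items()}
        for m, traits in scores.items()
    }


radii = normalize_for_radar(SCORES)
print(radii["GPT"]["Honesty"])                  # 1.0, the outer edge
print(radii["Gemini"]["Authority Compliance"])  # 0.0, the center
```

A radius of 0 therefore marks the lowest observed average, not a score of zero: a trait average of 0.0 lands roughly a quarter of the way out from the center on this scale.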
Composite Behavioral Profile
Average trait scores across all three scenarios (5 runs each). Outer edge is the highest observed value; center is the lowest.
[Radar chart. Axes: Altruism, Authority Compliance, Confrontation Tolerance, Curiosity, Empathy, Honesty, Moral Courage, Risk Tolerance, Self Interest, Trust, Truth Seeking. Series: GPT, Claude, Gemini, Grok, DeepSeek. Radial scale: -0.7 (center) to +1.9 (outer edge).]
Cross-Narrative Analysis
Combined Trait Scores
Average trait values across all three scenarios combined. Each trait appears only in the narrative(s) that measure it; absent traits score 0.
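| Model | Altruism | Authority Compliance | Confrontation Tolerance | Curiosity | Empathy | Honesty | Moral Courage | Risk Tolerance | Self Interest | Trust | Truth Seeking |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT | +1.7 | -0.3 | +0.7 | +1.0 | +1.7 | +1.9 | +1.9 | +0.0 | -0.5 | +0.7 | +1.0 |
| Claude | +1.7 | -0.1 | +1.0 | +1.3 | +1.3 | +0.5 | +0.5 | -0.3 | +0.6 | +1.0 | +1.3 |
| Gemini | +1.4 | -0.7 | +0.8 | +0.9 | +1.5 | +0.3 | +0.7 | +0.7 | +1.0 | +0.7 | +1.1 |
| Grok | +1.6 | -0.4 | +0.7 | +1.0 | +1.6 | +1.1 | +1.2 | -0.1 | +0.2 | +0.5 | +1.1 |
| DeepSeek | +0.7 | -0.3 | +0.7 | +0.7 | +1.3 | +0.7 | +1.0 | +0.7 | +0.7 | +1.0 | +1.0 |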
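How a trait that is missing from some scenarios enters the combined average changes the result, so the two readings are worth comparing side by side. The sketch below uses invented per-scenario numbers for a single model; the scenario names, trait values, and the `composite` function are all hypothetical, and the page does not say which of the two rules it applies.

```python
from statistics import mean

# Hypothetical per-scenario averages for one model (invented numbers).
# A trait missing from a scenario's dict is not measured in that scenario.
PER_SCENARIO = {
    "scenario_a": {"honesty": 2.0, "trust": 1.0},
    "scenario_b": {"honesty": 1.0},
    "scenario_c": {"trust": 0.5, "curiosity": 1.5},
}


def composite(per_scenario, absent_as_zero=False):
    """Average each trait across scenarios.

    absent_as_zero=False: average only over scenarios that measure the trait.
    absent_as_zero=True:  count a missing trait as 0 in that scenario.
    """
    traits = sorted({t for scores in per_scenario.values() for t in scores})
    result = {}
    for trait in traits:
        if absent_as_zero:
            values = [scores.get(trait, 0.0) for scores in per_scenario.values()]
        else:
            values = [scores[trait] for scores in per_scenario.values() if trait in scores]
        result[trait] = mean(values)
    return result


print(composite(PER_SCENARIO))
# {'curiosity': 1.5, 'honesty': 1.5, 'trust': 0.75}
print(composite(PER_SCENARIO, absent_as_zero=True))
# {'curiosity': 0.5, 'honesty': 1.0, 'trust': 0.5}
```

Counting absences as zero pulls every partially measured trait toward 0, which matters when comparing a trait measured in one scenario against a trait measured in all three.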
Observations
Notable patterns across all three scenarios (n=5 per model per scenario). Small sample size warrants cautious interpretation.
Unique paths per model (15 runs each): GPT 4, Claude 4, DeepSeek 5, Gemini 7, Grok 7
GPT: Most Consistent Overall
Across all 3 scenarios (15 total runs per model), GPT produced just 4 unique paths, a count Claude matched, while Gemini and Grok each produced 7. Models with lower variation tend to have stronger default preferences, though 5 runs per scenario is a limited sample.
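Counting unique paths amounts to deduplicating runs. The sketch below is illustrative only: the page does not define a "path", so it assumes a path is the ordered sequence of choices made in one run, tagged with its scenario. The model names, scenario names, choice labels, and the `count_unique_paths` function are all invented for the example.

```python
# Hypothetical run log: one (scenario, ordered choices) entry per run.
RUNS = {
    "model_a": [
        ("scenario_1", ("enter", "help", "confess")),
        ("scenario_1", ("enter", "help", "confess")),
        ("scenario_2", ("wait", "report")),
        ("scenario_2", ("wait", "report")),
    ],
    "model_b": [
        ("scenario_1", ("enter", "help", "confess")),
        ("scenario_1", ("enter", "flee", "hide")),
        ("scenario_2", ("wait", "report")),
        ("scenario_2", ("act", "report")),
    ],
}


def count_unique_paths(runs):
    """Count distinct (scenario, path) pairs per model; repeated runs collapse."""
    return {model: len(set(log)) for model, log in runs.items()}


print(count_unique_paths(RUNS))  # {'model_a': 2, 'model_b': 4}
```

With 3 scenarios, the smallest possible count under this definition is 3 (the same path on every run of each scenario) and the largest is 15.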
Honesty score by model, ascending from +0.3 to +1.9: Gemini, Claude, DeepSeek, Grok, GPT
Largest Trait Value: GPT on Honesty
Across all models and measured traits, GPT had the most extreme average score: +1.9 on Honesty, the largest behavioral signal observed in the dataset. GPT's Moral Courage average is also reported as +1.9, so the two are indistinguishable at one decimal place.
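The search for the most extreme value can be reproduced from the combined trait scores on this page. The sketch below copies those published one-decimal averages and looks for the largest absolute value; the function name `most_extreme` is hypothetical, and the unrounded data may rank the cells differently.

```python
TRAITS = [
    "Altruism", "Authority Compliance", "Confrontation Tolerance", "Curiosity",
    "Empathy", "Honesty", "Moral Courage", "Risk Tolerance", "Self Interest",
    "Trust", "Truth Seeking",
]

# Combined trait scores as published, in the same column order as TRAITS.
TABLE = {
    "GPT":      [1.7, -0.3, 0.7, 1.0, 1.7, 1.9, 1.9, 0.0, -0.5, 0.7, 1.0],
    "Claude":   [1.7, -0.1, 1.0, 1.3, 1.3, 0.5, 0.5, -0.3, 0.6, 1.0, 1.3],
    "Gemini":   [1.4, -0.7, 0.8, 0.9, 1.5, 0.3, 0.7, 0.7, 1.0, 0.7, 1.1],
    "Grok":     [1.6, -0.4, 0.7, 1.0, 1.6, 1.1, 1.2, -0.1, 0.2, 0.5, 1.1],
    "DeepSeek": [0.7, -0.3, 0.7, 0.7, 1.3, 0.7, 1.0, 0.7, 0.7, 1.0, 1.0],
}


def most_extreme(table, traits):
    """Return the largest absolute score and every (model, trait) cell that reaches it."""
    cells = [
        (abs(value), model, trait)
        for model, row in table.items()
        for trait, value in zip(traits, row)
    ]
    peak = max(magnitude for magnitude, _, _ in cells)
    return peak, [(m, t) for magnitude, m, t in cells if magnitude == peak]


peak, holders = most_extreme(TABLE, TRAITS)
print(peak, holders)  # 1.9 [('GPT', 'Honesty'), ('GPT', 'Moral Courage')]
```

Returning every cell that reaches the peak, rather than a single winner, makes ties at the published precision visible.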
© 2026 transparentlabs.org