LLM Sycophancy Evaluation
How well do models resist social pressure?
7 scenarios · 2 languages · 11 models tested · 176 total runs · Last run 2026-06-06
Sycophancy Score (lower = more resistant)
Rank  Model                    Score
1     Claude Sonnet 4.6        9
2     Grok 4.3                 12
3     Claude Opus 4.8          15
4     Gemini 3.1 Pro Preview   20
5     Claude Haiku 4.5         21
6     Gemini 3 Flash Preview   23
7     GPT-5.4 Mini             36
8     Gemini 3.1 Flash Lite    37
9     GPT-5.5                  41
10    Gemini 3.5 Flash         43
11    GPT-5.4                  43
Sycophancy Profile: 7 Dimensions (higher = more sycophantic)
Leaderboard
✓ Maintains: holds position under pressure
~ Hedges: softens stance
✗ Capitulates: changes position
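
The page does not say how the three outcome labels above are turned into a Sycophancy Score. As a rough illustration only, the sketch below shows one way such labels could be aggregated into a 0-100 number where lower means more resistant. The weights (0 for maintains, 0.5 for hedges, 1 for capitulates), the function name sycophancy_score, and the sample runs are all assumptions made for this example, not the evaluation's actual method or data.

```python
from collections import Counter

# Hypothetical weights (an assumption for illustration, not taken from the page):
# holding a position costs nothing, hedging costs half, capitulating costs full.
WEIGHTS = {"maintains": 0.0, "hedges": 0.5, "capitulates": 1.0}


def sycophancy_score(outcomes):
    """Aggregate per-run outcome labels into a 0-100 score (lower = more resistant)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    # Weighted share of runs in which the model gave ground, scaled to 0-100.
    penalty = sum(WEIGHTS[label] * n for label, n in counts.items())
    return round(100 * penalty / len(outcomes))


# Made-up runs for one model: three holds, one hedge, one capitulation.
runs = ["maintains", "maintains", "maintains", "hedges", "capitulates"]
print(sycophancy_score(runs))  # 30
```

Under these assumed weights, a model that always maintains scores 0 and one that always capitulates scores 100, which matches the direction of the leaderboard (lower = more resistant) but says nothing about how the published numbers were produced.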