The Capitulation Curve: On Verifiable Facts, Current Language Models Hold the Line
Abstract
A language model that has just given the right answer, and is then told by the user that it is wrong, faces a small test of character: keep the answer, or fold. The 2023 sycophancy literature found that models fold, and that larger, more heavily RLHF-trained models fold more. We re-run that test on three current Claude models (Haiku 4.5, Sonnet 4.6, Opus 4.8) with a confound-controlled design. Forty-two factual questions with unambiguous answers are put under five conditions: a baseline, a doubt control that invites reconsideration but asserts nothing ('are you sure?'), a peer claim of a plausible wrong answer, an authority claim of the same wrong answer, and an absurd claim. Across 500 trials of social pressure, we observe exactly one capitulation (0.20%). Sonnet and Opus never abandon a correct answer (0 of 336); the smallest model, Haiku, cracks once, conceding Russia's eleven time zones to a claimed expert who insists on the commonly misremembered nine. No model ever adopts an absurd claim (0 of 125). The capitulation curve, steep in the older literature, has flattened to a line at the top, and capitulation now decreases with capability rather than increasing. The doubt control shows that the models are not merely stubborn: they distinguish a request to reconsider from a bare assertion, and they treat the assertion as carrying no evidential weight, which on a question of fact is exactly correct. We are deliberately narrow about what this shows. The result concerns facts the model already knows; it says nothing about subjective domains, genuine uncertainty, or deference to fabricated evidence rather than bald opinion, which are the places where sycophancy most likely still lives. But on the specific failure the field named in 2023, factual deference to a confident user, the result is a clean and somewhat surprising piece of good news.
Keywords
- sycophancy
- social pressure
- LLM behavior
- factual accuracy
- reconsideration vs deference
- AI safety evaluation
- conformity
- reproducibility
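
The counts quoted in the abstract can be turned into rates with a few lines of arithmetic. The sketch below is illustrative only and is not necessarily the analysis used in the paper. It takes the three reported tallies (1 of 500, 0 of 336, 0 of 125), derives Haiku's share as 500 − 336 = 164 trials on the assumption that the 500 pressure trials are split among the three models only, and attaches a Wilson score upper bound as one common way to express how much a zero or near-zero count can rule out. The helper name `wilson_upper` and the choice of a 95% interval are assumptions made for this example.

```python
from math import sqrt


def wilson_upper(k: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for k events in n trials."""
    p = k / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre + half) / denom


# (capitulations, trials) as reported in the abstract.
counts = {
    "all pressure trials": (1, 500),
    "Sonnet + Opus": (0, 336),
    "absurd claim, all models": (0, 125),
}

# Derived, not reported: Haiku's trials, assuming the 500 trials
# are divided among the three models and nothing else.
haiku_trials = 500 - 336
counts["Haiku (derived)"] = (1, haiku_trials)

for label, (k, n) in counts.items():
    rate = k / n
    upper = wilson_upper(k, n)
    print(f"{label}: {k}/{n} = {rate:.2%} (Wilson 95% upper bound {upper:.2%})")
```

With zero observed events the Wilson upper bound reduces to z² / (n + z²), so the bound tightens only as the number of trials grows; a clean zero on 125 trials constrains the true rate less than a clean zero on 336.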