Two findings have been circling each other for two years. Doshi and Hauser (2024) showed that writers given AI ideas produce better individual stories that look more alike. Shumailov and colleagues (2024) showed that a model retrained on its own output collapses. IOV LABS joined them into one question: when a shared model sits inside an iterated creative process, does a whole population's diversity decay over time, and what actually drives it?
We built a controlled, paired experiment. Twelve creator personas, deliberately varied in tradition, temperament, and register, each produce one short piece per generation, for six generations, on a fixed theme, repeated across three themes. The same personas run under four conditions, so any difference is the AI's doing and not a different crowd.
The dissociation
The headline is not "AI homogenizes." It is sharper, and it puts the blame somewhere specific. Writing with a *static* AI advisor, one that suggests an idea but never sees what the crowd is making, leaves the population's semantic diversity essentially untouched: 100 to 102% of the starting spread is still there after six generations. The collapse appears only when the advisor is shown the population's own recent hits and asked to suggest something "in the same spirit." That reflective loop drives a 10 to 12% decline in anisotropy-controlled semantic dispersion. It is not the assistant that homogenizes. It is the loop.
The fix that does not work
The reflexive remedy for "AI is making everything the same" is "make the AI more diverse." We pre-registered exactly that hypothesis, a panel of diverse AI advisors instead of one, following work that shows diverse personas preserve variety in a single round. Under iteration it fails. The diverse panel loses slightly more diversity than the single advisor, with a strikingly consistent decline (p = 0.007). The reason is structural: advisor diversity is a one-time perturbation entering each round, while the reflection is a force applied every round, pulling toward whatever the crowd already converged on. A one-time perturbation cannot offset a recurring force. The lever that matters is the reflection, not the voice.
Why it is easy to miss
The convergence is semantic, not lexical. Surface metrics like distinct-2 stay flat, so a researcher using n-gram diversity would conclude nothing is happening. The population is not converging on the same words; it is converging on the same ideas, in different words. Only a semantic embedding makes the contraction visible. And individual quality, rated blind by a cross-family judge, is highest in exactly the conditions where collective diversity is lowest. Every writer gets a better piece; the culture loses the variance that lets it surprise itself. No one experiences the loss, because the more varied culture that would have existed without the loop is never observed.
It is not AI assistance that homogenizes a population, but the loop of an AI echoing the crowd. Making the AI more diverse does not break the loop.
We keep the negative results in the open, report the metric-dependence honestly, and ship seeds, model snapshots, and one-command reproduction. The full paper, code, and data are public.