The Tells: A Measurable Taxonomy of the AI-Generated Design Look, and a Harness to Escape It

Authors: Han Kim
Papers: IOV Labs · open taxonomy & harness · 19pp · 2026-06-02

Abstract

Interfaces produced by generative models are instantly recognizable: an indigo-to-violet gradient, Inter on white, a hero followed by three emoji feature cards, one border-radius, one soft shadow, a headline that says 'build the future of work'. Practitioners spend considerable time and tokens trying to make AI output not look like AI, yet the target is treated as ineffable taste. We argue the opposite: the AI look is a finite, enumerable set of statistical defaults, and is therefore measurable. We contribute (i) a taxonomy of 27 design tells across eight families (color, type, layout, spacing, surface, motion, copy, and AI self-reference), each grounded in the documented mechanism of model convergence and in the published craft of human-crafted interfaces; (ii) a dependency-free static detector that resolves both raw CSS and utility classes and reports a Tell Score in [0,100] (lower is better); and (iii) a harness, a CLI, an MCP server, and a drop-in prompt module, so any team or agent can audit and prevent the look. In a confound-controlled refactor that holds a page's content and structure fixed and changes only the tell-bearing properties, the Tell Score of a canonical AI landing page falls from 77 (grade F) to 0 (grade A); across a six-page corpus the detector separates AI-default from designed pages with no overlap (nearest pair 47 points apart). We close with the epistemics: a discriminator of machine-default is not a judge of beauty; taste is the compression of lived choices that a median cannot hold; and if everyone optimizes the same score we risk a second-order convergence, the same homogenization our companion study finds in iterated creation. The work is grounded in Refactoring UI, Rams, Nielsen, the premium-UI craft of Stripe/Linear/Vercel, Toss's writing principles, and the Anthropic frontend-aesthetics cookbook.
To show that the detector is a discriminator and not a machine that calls everything AI, we render 202 real top-tier sites, learn the empirical distribution of human-crafted design, and recalibrate with a craft-credit model in which real craft offsets cosmetic defaults. After recalibration the 202 sites score at a median of 0 (93% grade A) while AI defaults score 35 to 59, and the detector now audits live URLs. A brand purple is not a tell (Stripe uses 123 and scores 0) and Inter is not a tell (Linear ships it with a real type system). Finally, to turn the negative instrument into a positive one, we render 199 of these sites a second time and read the concrete per-component CSS they ship, in both light and dark, yielding a measured spec catalog: primary-button radius splits between a soft-round 8 to 12px cluster and a full pill, the type scale lands near 64/48/32/16px, dark backgrounds are tinted near-blacks rather than pure black, and accent hues are fully dispersed across sites (the hue is never the tell). A field check folds in two production codebases whose maintainers wrote their own avoid-the-AI-look design manifestos. They independently name the same tells and six more: three that we add as a new family (AI self-reference: the sparkle icon, the 'AI'/model label, and the preview-insert flow), plus the multi-color pill, the micro-type, and the nested box, taking the taxonomy to 27 tells. Code, data, the 202-site corpus, the 199-site spec catalog, figures, and harness are open.
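
As a rough illustration of the idea, and not the released detector, the sketch below scans a source string for a few tells in both raw CSS and Tailwind-style utility classes, sums per-tell weights, subtracts a craft credit, and clamps the result to [0,100]. The tell names, regex patterns, weights, and the `craft_credit` parameter are all assumptions made for this example; the abstract does not specify how the real detector weights its 27 tells or computes craft credit.

```python
import re

# Hypothetical tells and weights, for illustration only. The real taxonomy
# has 27 tells and its weighting is not given in the abstract.
TELLS = {
    "indigo_violet_gradient": (
        30,
        [
            # raw CSS gradient that mentions indigo/violet names or hexes
            r"linear-gradient\([^)]*(?:indigo|violet|#6366f1|#8b5cf6)",
            # utility classes: from-indigo-* followed by to-violet-*/to-purple-*
            r"\bfrom-indigo-\d{2,3}\b[^\"']*\bto-(?:violet|purple)-\d{2,3}\b",
        ],
    ),
    "default_inter": (
        15,
        [
            r"font-family\s*:\s*['\"]?Inter\b",  # raw CSS
            r"\bfont-\[['\"]?Inter",             # arbitrary-value utility class
        ],
    ),
    "soft_shadow": (
        10,
        [
            r"box-shadow\s*:\s*0\s+\d+px\s+\d+px\s+rgba\(0,\s*0,\s*0",  # raw CSS
            r"\bshadow-(?:md|lg)\b",                                    # utility class
        ],
    ),
}


def detect_tells(source: str) -> list[str]:
    """Return the names of tells whose raw-CSS or utility pattern matches."""
    found = []
    for name, (_weight, patterns) in TELLS.items():
        if any(re.search(p, source, re.IGNORECASE) for p in patterns):
            found.append(name)
    return found


def tell_score(source: str, craft_credit: int = 0) -> int:
    """Sum the weights of detected tells, subtract an assumed craft credit,
    and clamp to [0, 100]. Lower is better."""
    raw = sum(TELLS[name][0] for name in detect_tells(source))
    return max(0, min(100, raw - craft_credit))


ai_css = ".hero { background: linear-gradient(90deg, #6366f1, #8b5cf6); font-family: Inter, sans-serif; }"
ai_html = '<div class="bg-gradient-to-r from-indigo-500 to-violet-500 shadow-lg">'
designed_css = ".hero { background: #0b0d12; font-family: 'Söhne', sans-serif; }"

print(tell_score(ai_css), tell_score(ai_html), tell_score(designed_css))
```

In this toy version a page that ships Inter alongside enough other evidence of craft could pass a nonzero `craft_credit` and score 0, which mirrors the claim that Inter alone is not a tell; how the real model measures that credit is left open here.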

Keywords

AI design
generative UI
design taxonomy
AI slop
design tokens
Tailwind
model convergence
design systems
static analysis
reproducibility
