Open benchmark · v1.0

Korean Text Rendering in Text-to-Image Models: A Reproducible Character-Error-Rate Benchmark

Authors
Han Kim
Papers
IOV Labs · open benchmark · 16pp · 2026-05-29

Abstract

Benchmarks for text inside generated images are overwhelmingly English, which conceals the writing systems where models actually fail. We measure one of them directly. Nine text-capable text-to-image models each draw fourteen Korean (Hangul) phrases on an identical plain poster; the rendered text is transcribed by a vision-language model (GPT-4o) and scored by character error rate (CER). Three models (recraft-v4-pro, seedream-5, and nano-banana-pro) render every prompt perfectly (CER 0.000, 14/14), and the remaining models fall along a clear quality gradient. At the bottom, imagen-4 cannot write Hangul at all: it produces plausible-looking, Korean-shaped gibberish on every prompt (0/14, mean CER 1.33), turning 커피 한 잔 ("a cup of coffee") into 소동석 고려아는 아라해안. The central finding is that strong English text rendering does not transfer to Korean, and that this failure is invisible to an English-only benchmark. The harness is open, runs with one command, resumes from saved results, and is easily extended with new prompts and models.
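For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate can be computed, not the benchmark's own implementation. It assumes the standard definition (Levenshtein edit distance divided by reference length), counts precomposed Hangul syllables as single characters after NFC normalization, and keeps spaces; the abstract does not specify these choices, so the harness may differ. The names `levenshtein`, `cer`, and `summarize` are hypothetical.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length.

    Assumption: both strings are NFC-normalized so each Hangul syllable
    is one code point; whitespace is kept as-is.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def summarize(scores: list[float]) -> tuple[float, int]:
    """Return (mean CER, number of prompts rendered perfectly)."""
    perfect = sum(1 for s in scores if s == 0.0)
    return sum(scores) / len(scores), perfect
```

Because inserted characters count as errors, CER is not capped at 1: a transcription much longer than the reference scores above 1, which is how a mean CER such as 1.33 can arise. For example, under the assumptions above, `cer("커피", "가나다라마")` is 2.5 (two substitutions plus three insertions over a two-character reference).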

