Most benchmarks for text inside generated images are written in English, which hides the writing systems where models actually break. IOV LABS measured one of them directly. Nine text-capable image models were each asked to draw 14 Korean phrases on a plain poster, and the rendered text was read back by GPT-4o and scored by character error rate. The result is a clear ranking, and one blunt failure.

What was measured
Each model received the same instruction for every prompt: a white poster with the target Hangul as the only text, black sans-serif lettering. The phrases ranged from a simple greeting to hard cases such as the clusters in 값을 and 맑음, the tense consonants in 떡볶이, a full sentence, place names, and digits mixed with Korean. GPT-4o transcribed whatever each model drew, and character error rate scored it against the target, with whitespace ignored. Zero means a perfect render.
The result
Three models rendered every prompt perfectly: recraft-v4-pro, seedream-5 and nano-banana-pro, all at zero error across 14 prompts. gpt-image-2 and recraft-v4 followed closely. The cheaper and older models slipped on the harder strings, with ideogram-v3 managing only 5 of 14.

The blunt failure
The standout is imagen-4, which scored 0 of 14 with a mean error above one. It did not merely make typos, it produced plausible-looking Korean-shaped gibberish: 커피 한 잔 came back as 소동석 고려아는 아라해안, and 맑음 as 옹반재다. A model can be excellent at English text and still be unable to write Hangul at all, which is exactly the gap an English-only benchmark would never surface.

Strong English text rendering does not transfer to Korean. You only see it if you measure it.
Open and reproducible
The benchmark is public and runs with one command given a fal.ai and an OpenAI key. The harness resumes from saved results, so re-running only retries failed cells, and the prompt and model lists are easy to extend. It is the first concrete experiment from the lab's research note on verifiable quality for generative media, where Korean text rendering was flagged as an open field.
