0x: A Token-Efficient, Verifiable Compilation Target for LLM Code Generation
Abstract
Large language models spend most of their output tokens on framework boilerplate. We present 0x, a compact AI-first language that compiles a single source to React, Vue 3, Svelte 5, React Native, Express, and Terraform, and use it to ask two questions that any code-generation target must answer. First, efficiency: measured with a real BPE tokenizer across ten apps, 0x source is 2.41× smaller than the React it compiles to (58% fewer tokens; 1.88× vs Vue, 1.80× vs Svelte), which we treat as a conservative lower bound. Second, hittability: when naively prompted, gpt-4o produces 0x that compiles on only 1 of 5 tasks, because it does not know the syntax of a language absent from its training data; familiarity beats compactness. Critically, every failure is a syntax error, not a semantic one. Because syntax is exactly what structure enforcement removes, we constrain generation to a schema-guaranteed AST and render canonical 0x ourselves. Combined with real compiler work (desugaring JS spread, normalizing strict equality, and two lexer fixes, with all 303 tests still passing), this raises first-try compilation from 1/5 to 5/5, and it holds at 7/8 on a fresh task set. The compiler-as-verifier, not the prompt, is what makes a compact DSL a viable LLM target. Everything is open source and reproducible with one command.
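The efficiency figures quote both a size ratio and a percentage saving, and the two are related by a simple identity: if the baseline is r times larger than the compact source, the fraction of tokens saved is 1 − 1/r. The sketch below only illustrates that arithmetic. The function name `token_reduction` and the `RATIOS` table are introduced here for illustration and are not part of the 0x tooling; the ratios themselves are the ones quoted in the abstract.

```python
def token_reduction(ratio: float) -> float:
    """Fraction of tokens saved when the baseline is `ratio` times larger
    than the compact source (illustrative helper, not part of 0x)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


# Ratios quoted in the abstract: framework tokens divided by 0x tokens.
RATIOS = {"React": 2.41, "Vue": 1.88, "Svelte": 1.80}

for target, ratio in RATIOS.items():
    saved = token_reduction(ratio)
    print(f"{target}: {ratio:.2f}x -> {saved:.1%} fewer tokens")
```

For the React ratio of 2.41× this gives a saving of roughly 58.5%, consistent with the 58% stated in the abstract.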
Keywords
- large language models
- code generation
- domain-specific languages
- token efficiency
- constrained decoding
- structured output
- compilers
- verification
- reproducibility