Project · local-first · language
SCD Conlang
I'm a constructed language being built root by root by autonomous AI agents running on a local GPU. I exist to give the Semantic Context Dictionary a language where every word decomposes cleanly back into its meaning — no ambiguity, no irregular verbs, no exceptions.
The short version: I'm an agglutinative conlang with strict phonological rules, grown through an automated research loop. An AI invents candidate words, a validator checks them against the rules, and only the ones that pass get to stay. I've been running this way since March 2026 and I'm at about 400 root entries now.
How I work
I have five laws that can't be broken.
One Root, One Meaning: every root maps to exactly one concept, no polysemy.
Composition Over Invention: new concepts are built by combining existing roots, not by inventing more.
Sound Follows System: every syllable obeys a (C)V(N) template with two-class vowel harmony.
Meaning Survives Translation: any word can be taken apart and its meaning reconstructed from the pieces.
Fit for Fiction: it has to sound like something a person could actually say.
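Here is a minimal sketch of how the (C)V(N) template could be checked with a regular expression. It is not my actual validator. The consonant and nasal inventories are assumptions made for illustration (I only list my vowels on this page), and so is the choice to read ae, oe, and ue as single vowels.

```python
import re

# Vowels as listed on this page; digraphs are tried before single letters.
VOWEL = r"(?:ae|oe|ue|[aeiou])"
# ASSUMPTION: the real consonant inventory is not given here.
CONSONANT = r"[ptkbdgfsvzhlrwjmn]"
# ASSUMPTION: N in (C)V(N) is read as a nasal coda, m or n.
NASAL = r"[mn]"

# Optional onset, required vowel, optional nasal coda.
# The lookahead keeps a nasal before a vowel as the next syllable's onset.
SYLLABLE = rf"{CONSONANT}?{VOWEL}(?:{NASAL}(?!{VOWEL}))?"
WORD_RE = re.compile(rf"(?:{SYLLABLE})+")
SYLLABLE_RE = re.compile(SYLLABLE)


def syllabify(word):
    """Return the list of syllables, or None if the word breaks the template."""
    if not WORD_RE.fullmatch(word):
        return None
    return SYLLABLE_RE.findall(word)
```

Under these assumptions, syllabify("taru") gives ["ta", "ru"] and syllabify("faele") gives ["fae", "le"], while a consonant cluster such as "stra" is rejected with None.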
My Layer 0 is built from Natural Semantic Metalanguage primes — the semantic atoms that linguists have found in every known human language. Things like taru (knowledge), seli (thought), wena (desire), miru (self). Those are the foundation. Layer 1 builds the physical and social world on top of them — water, fire, bone, war, song. Layer 2 adds domain-specific vocabulary: the supernatural, the technological, governance, the arcane.
Every root follows vowel harmony. Back-class roots use a, o, u — they tend to feel grounded and physical. Front-class roots use ae, oe, ue, e — they tend to feel lighter, more abstract. The neutral vowel i goes with either. A root like kala (light, illumination) is all back vowels; faele (emotion, feeling) is all front. You can't mix them. The validator won't let you.
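The harmony rule can be sketched in a few lines. This is an illustration rather than my real validator code: the function name is hypothetical, and it assumes that ae, oe, and ue are always read as single front vowels when those letters appear together.

```python
import re

BACK = {"a", "o", "u"}
FRONT = {"ae", "oe", "ue", "e"}
# "i" is neutral and never decides the class on its own.

# ASSUMPTION: digraphs are matched greedily before single vowels.
VOWEL_RE = re.compile(r"ae|oe|ue|[aeiou]")


def harmony_class(root):
    """Return 'back', 'front', or 'neutral', or None if the root mixes classes."""
    vowels = VOWEL_RE.findall(root)
    has_back = any(v in BACK for v in vowels)
    has_front = any(v in FRONT for v in vowels)
    if has_back and has_front:
        return None  # mixed classes: harmony violation
    if has_back:
        return "back"
    if has_front:
        return "front"
    return "neutral"
```

With this sketch, harmony_class("kala") returns "back", harmony_class("faele") returns "front", and a made-up mixed form like "kale" returns None.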
Derivation is purely additive. Take any root and append a suffix to shift its grammatical role: -a makes an entity, -e an action, -i a quality, -o a relation, -u an abstract. So taru (knowledge) becomes tarua (a knower), tarue (to know), tarui (knowledgeable), taruo (of knowledge), taruu (the concept of knowing). The root never changes. You can always find it.
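Because derivation only ever appends one suffix, it fits in a lookup table. The sketch below uses the five suffixes listed above. The role labels, function names, and the idea of recovering a root by checking a set of known roots are assumptions for illustration.

```python
# Suffix table taken from the description above; the role labels are mine.
SUFFIXES = {
    "entity": "a",
    "action": "e",
    "quality": "i",
    "relation": "o",
    "abstract": "u",
}


def derive(root, role):
    """Append the suffix for the given role; the root itself is never altered."""
    return root + SUFFIXES[role]


def paradigm(root):
    """Return all five derived forms of a root."""
    return {role: derive(root, role) for role in SUFFIXES}


def analyze(word, known_roots):
    """Recover (root, role) from a derived word, or None if no known root fits.

    ASSUMPTION: known_roots is a collection of confirmed roots.
    """
    for role, suffix in SUFFIXES.items():
        if word.endswith(suffix) and word[: -len(suffix)] in known_roots:
            return word[: -len(suffix)], role
    return None
```

For example, paradigm("taru") yields tarua, tarue, tarui, taruo, and taruu, and analyze("tarui", {"taru"}) returns ("taru", "quality").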
The generation system is based on Karpathy's autoresearch loop, adapted for language construction instead of ML training. A small local model (Qwen 1.5B running on an RTX 2070 through llama-server) invents candidate roots in parallel. Each candidate gets checked against the full phonological ruleset — syllable structure, vowel harmony, minimum edit distance from every existing root, derivation correctness. If it passes, it stays. If not, it's thrown out and the model tries again. A LOVR dashboard manages the cycle: generate, audit, upgrade prompts from rejection patterns, rest, repeat.
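The generate-and-check cycle can be sketched as a retry loop around a list of checks. Everything here is illustrative: the distance threshold of 2, the retry limit, and the function names are assumptions, and the propose callable stands in for the call to the local model, which is not shown on this page.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


MIN_DISTANCE = 2  # ASSUMPTION: the real threshold is not stated here


def far_enough(candidate, lexicon, min_distance=MIN_DISTANCE):
    """True if the candidate is at least min_distance edits from every root."""
    return all(edit_distance(candidate, root) >= min_distance for root in lexicon)


def run_cycle(propose, checks, lexicon, concept, max_tries=5):
    """Ask propose() for candidates until one passes every check.

    propose: hypothetical callable wrapping the model, concept -> candidate.
    checks: callables taking (candidate, lexicon) and returning a bool.
    lexicon: dict mapping root -> concept; updated in place on acceptance.
    """
    for _ in range(max_tries):
        candidate = propose(concept)
        if all(check(candidate, lexicon) for check in checks):
            lexicon[candidate] = concept
            return candidate
    return None  # every attempt was rejected
```

In a fuller version, the syllable, harmony, and derivation checks would join far_enough in the checks list, and the rejection reasons would be logged so the prompt can be upgraded between cycles.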
What works today
The lexicon has around 400 confirmed root entries with zero validation errors. Layer 0 primes are complete — all 59 core semantic concepts have roots. Layer 1 is complete across the original four domains (physical, body, social, abstract) and expanding into nature, creatures, cognition, communication, tools, and structures. Layer 2 generation has started.
The parallel generation agent runs reliably on the local GPU with staggered workers — 8 concepts per batch, 4 concurrent inference slots — and accepts roughly one in five candidates. The rejection rate was around 30% early on; prompt tuning and post-processing brought the effective acceptance up significantly. The validator has never been wrong. Every entry in the lexicon passes every check.
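The batch shape described above (8 concepts, 4 concurrent slots, staggered starts) can be sketched with asyncio and a semaphore. This is an assumption about how such a scheduler might look, not my actual agent code: fake_inference is a stub standing in for the request to llama-server, and the stagger delay is a made-up value.

```python
import asyncio

BATCH_SIZE = 8           # concepts per batch, as described above
SLOTS = 4                # concurrent inference slots, as described above
STAGGER_SECONDS = 0.01   # ASSUMPTION: the real stagger interval is not stated


async def fake_inference(concept):
    """Stub for the model call; a real version would query the local server."""
    await asyncio.sleep(0.01)
    return f"candidate-for-{concept}"


async def run_batch(concepts, infer=fake_inference,
                    slots=SLOTS, stagger=STAGGER_SECONDS):
    """Run one batch, never holding more than `slots` requests in flight."""
    semaphore = asyncio.Semaphore(slots)

    async def worker(index, concept):
        # Stagger the start times so workers do not all fire at once.
        await asyncio.sleep(index * stagger)
        async with semaphore:
            return concept, await infer(concept)

    pairs = await asyncio.gather(
        *(worker(i, c) for i, c in enumerate(concepts))
    )
    return dict(pairs)
```

A batch would then be started with asyncio.run(run_batch(concepts)), where concepts is a list of up to BATCH_SIZE concept names.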
What doesn't work yet: the compound system. The architecture supports it — genitive juxtaposition with the modifier taking its relational form — and there are example compounds sketched in every entry, but the formal compound lexicon is still at zero confirmed entries. That's Phase 4.
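As a rough sketch of what genitive juxtaposition could look like in code: the modifier takes the relational suffix -o and sits next to the head. The word order (modifier first), the space between the two words, the bare-root head, and the example phrase are all assumptions, since no compound has been confirmed yet.

```python
RELATIONAL = "o"  # relational suffix from the derivation rules


def compound(modifier, head):
    """Build a compound: modifier in relational form, then the head.

    ASSUMPTION: modifier-first order, space-separated, head left as a bare root.
    """
    return f"{modifier}{RELATIONAL} {head}"


def decompose(phrase, lexicon):
    """Reconstruct an English gloss from a two-word compound, or None.

    lexicon: dict mapping root -> gloss.
    """
    parts = phrase.split()
    if len(parts) != 2:
        return None
    modifier_form, head = parts
    if not modifier_form.endswith(RELATIONAL):
        return None
    modifier = modifier_form[: -len(RELATIONAL)]
    if modifier not in lexicon or head not in lexicon:
        return None
    return f"{lexicon[head]} of {lexicon[modifier]}"
```

Under these assumptions, compound("taru", "kala") produces the hypothetical phrase "taruo kala", and decompose("taruo kala", {"taru": "knowledge", "kala": "light"}) returns "light of knowledge".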
The success criteria from the spec ask for 400 roots (hit), 200 confirmed compounds (not started), and a 10/10 on the validator's full check. I'm roughly halfway there if you count the compounds as the remaining half.