The Whitelist Approach to LLM-Native DSLs
or: How to stop worrying and trust AI-generated specifications
Picture this: You’re the security guard at an exclusive event. Your job is to keep troublemakers out.
You have two options.
Option A: Let everyone in, then patrol constantly watching for problems. Someone might start a fight. Someone might steal. Someone might break things. You’re reactive, exhausted, and never quite sure you’ve caught everything.
Option B: Check everyone at the door against a guest list. Once inside, the venue only has things guests are supposed to access anyway. You can relax. Most problems simply can’t happen.
This is the difference between blacklist and whitelist security models. And it’s the core of why most teams are solving LLM safety the wrong way.
The Question Everyone Gets Wrong
When engineering teams start working with LLMs, they inevitably ask: “How do we stop the AI from doing dangerous things?”
That’s a blacklist question. It assumes the AI has access to everything, and your job is to block the bad stuff. Scan for dangerous patterns. Blacklist risky operations. Build sandboxes. Parse ASTs. Add rate limits.
Here’s what years of building data integration systems taught me: blacklisting doesn’t scale.
Code is infinitely expressive. There’s always another way to do something dangerous. Another edge case. Another hallucinated syntax error. Another clever exploit. You’re playing defense against an opponent with infinite creativity.
The better question is: “What’s the minimum set of safe things the AI needs to do its job?”
That’s a whitelist question. And it changes everything.
The Power of Constraints
Here’s an insight that feels counterintuitive at first: Most business problems don’t need infinite expressiveness.
Think about data mapping. You’re transforming hotel reservation data from format A to format B. Do you need file system access? No. Network calls? No. The ability to spawn processes? Definitely not.
You need to:
- Look up values
- Transform arrays
- Handle missing data
- Do some basic math
- Maybe format a date
That’s it. Maybe a dozen operations, total.
The same is true for most domain-specific tasks. Workflow orchestration doesn’t need arbitrary code execution. Business rules don’t need reflection. API composition doesn’t need eval().
The insight: When you remove everything that’s not essential, what remains isn’t weaker. It’s safer, more reliable, and paradoxically more powerful—because both humans and AI can reason about it clearly.
This is the foundation of whitelist thinking. Instead of starting with a Turing-complete language and trying to block dangerous parts, start with nothing and add only what you need.
The result? A language that’s structurally incapable of being dangerous.
Five Principles for LLM-Native DSLs
Working with hotel property management systems—where we need to transform wildly different data formats into a unified schema—revealed patterns that work consistently well with both LLMs and humans.
1. Structured Over Strings
String-based code is a minefield. One missing quote. One unclosed bracket. One syntax error and everything breaks.
LLMs have seen billions of JSON examples. They rarely mess up the structure. But ask them to generate template strings with nested quotes and escaped characters? Error rates skyrocket.
The principle: Build your DSL using JSON, YAML, or S-expressions—formats where syntax errors are nearly impossible. Let the format handle the syntax rules automatically.
This isn’t just about LLMs. Structured formats are easier to validate, diff in git, and store in databases. When a product manager can look at a specification and actually understand it, you’ve designed something that works for everyone.
Why it matters: Syntax errors drop from ~50% to near zero. Validation becomes automatic. Debugging becomes trivial—just trace the tree.
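To make the contrast concrete, here’s a sketch of the same fallback lookup written both ways. The $ref and $coalesce operations appear later in this post; $value is an invented literal node for illustration.

```typescript
// Template form: one unbalanced quote or bracket and nothing parses.
const templateForm = '{{ reservation.guest["first_name"] || "Unknown" }}';

// Structured form: the format carries the syntax. If it parses as JSON,
// it is well-formed; the only thing left to get wrong is the semantics.
const structuredForm = {
  $coalesce: [
    { $ref: "reservation.guest.first_name" },
    { $value: "Unknown" },
  ],
};
```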
2. Composition Over Configuration
The UNIX philosophy: small tools that do one thing well and combine infinitely.
Most DSLs start with noble intentions—a handful of primitives—then slowly accumulate special cases. “We need to sum arrays… oh, and average them… oh, and median… oh, and weighted averages…”
Before you know it, you have 200 operations and the LLM can’t remember which one to use when.
The principle: Keep your primitive set ruthlessly small (10-20 operations). Make them compose naturally. Complex behavior emerges from combination, not enumeration.
An LLM that knows 15 operations can combine them in thousands of ways. An LLM that knows 200 operations will pick the wrong one half the time.
Why it matters: LLMs understand composition because that’s how they build responses—combining simpler components into complex outputs. Work with their strengths.
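As a sketch of what composition buys you: instead of shipping an $average primitive, let an average fall out of smaller pieces. Every operation name here except $ref and $map is invented for illustration.

```typescript
// Average nightly rate, composed from hypothetical $divide, $sum, and
// $size primitives rather than a dedicated $average operation.
const averageNightlyRate = {
  $divide: [
    {
      $sum: {
        $map: {
          over: { $ref: "reservation.nights" },
          expr: { $ref: "$item.rate" },
        },
      },
    },
    { $size: { $ref: "reservation.nights" } },
  ],
};
```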
3. Validation-First Architecture
Imagine a world where invalid specifications simply cannot execute. Not “execute and then error,” but “checked at the door before anything runs.”
This is your security boundary. Validate the entire specification structurally before executing a single operation. Use schemas to make validation automatic and exhaustive.
The principle: Invalid specifications never reach the evaluator. When validation succeeds, you have a guarantee: this spec is structurally sound. It might compute the wrong answer, but it won’t crash, leak data, or corrupt state.
Why it matters: Validation errors become your feedback loop for improving LLM outputs. Security becomes a compile-time property, not a runtime concern.
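Here’s a minimal sketch of validation-first execution using JSON Schema via Ajv. The schema shape is invented and deliberately tiny; a real one would enumerate every operation exhaustively.

```typescript
import Ajv from "ajv";

// Illustrative schema: a spec node is exactly one whitelisted operation.
const specSchema = {
  type: "object",
  properties: {
    $ref: { type: "string" },
    $coalesce: { type: "array" },
    // ...one entry per whitelisted operation...
  },
  additionalProperties: false, // unknown operations are rejected at the door
  minProperties: 1,
  maxProperties: 1,
};

const ajv = new Ajv();
const validateSpec = ajv.compile(specSchema);

declare function evaluate(spec: unknown, input: unknown): unknown; // the trusted interpreter

export function run(spec: unknown, input: unknown): unknown {
  if (!validateSpec(spec)) {
    // Invalid specs never reach the evaluator. These errors double as LLM feedback.
    throw new Error(`Invalid spec: ${ajv.errorsText(validateSpec.errors)}`);
  }
  return evaluate(spec, input);
}
```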
4. No Hidden State
Hidden state is the silent killer of debuggability. When a value can appear “from anywhere”—global variables, imported modules, environment settings—tracing problems becomes detective work.
LLMs are terrible detectives. They don’t have your context. They can’t see your environment variables.
The principle: Make all data flow explicit. Every value should be traceable to its source by reading the specification. If you need variables, make them lexically scoped and visibly declared.
Why it matters: Specifications become self-documenting. A product manager can understand what data goes where without knowing anything about your runtime environment. An LLM can reason about data flow without needing context.
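A sketch of what “visibly declared” can look like in a spec. The $let, $multiply, and $vars names are invented for this illustration.

```typescript
// Variables are declared inside the spec and scoped lexically, so every
// value is traceable by reading the document alone.
const spec = {
  $let: {
    vars: { taxRate: { $ref: "hotel.tax_rate" } }, // declared here, visible here
    in: {
      total: {
        $multiply: [{ $ref: "reservation.price" }, { $ref: "$vars.taxRate" }],
      },
    },
  },
};
```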
5. Monotonic Complexity
Ever asked an LLM to add a simple feature to a specification, and it rewrote the entire thing from scratch? Often wrong. Always frustrating.
This happens when adding complexity requires restructuring. When error handling means wrapping everything in try-catch blocks.
The principle: Design your DSL so adding features means adding nodes, not restructuring existing ones. A 10% more complex task should produce a 10% more complex specification.
Why it matters: LLMs excel at local edits but struggle with global restructuring. When your DSL supports monotonic growth, the LLM can build solutions incrementally rather than guessing the full complexity upfront.
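For example, in the same invented notation: adding null safety to a field should be a local wrap, not a rewrite.

```typescript
// Before: a plain lookup.
const before = { guestName: { $ref: "reservation.guest.name" } };

// After: the existing node is wrapped in place. Nothing else in the spec
// moves, so the LLM only has to make a local edit.
const after = {
  guestName: {
    $coalesce: [{ $ref: "reservation.guest.name" }, { $value: "Unknown guest" }],
  },
};
```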
Case Study: FluxMap
A weekend project that validated years of conviction
The Problem Space
Hotel property management systems are a beautiful nightmare of data inconsistency.
You have Mews, Guestline, Opera, Cloudbeds—50+ systems, each with completely different schemas for the same conceptual model. One calls it BookRef, another calls it Id, a third calls it ReservationNumber. One stores guest data inline, another references a separate customers table. One uses PreArrival for status, another uses Confirmed.
We needed to transform all of these into a unified schema. Consistently, safely, at scale.
The traditional approaches all had obvious problems:
- Hand-written mappers: 500 lines of TypeScript per integration, each subtly different, impossible to maintain
- Code generation: Works until you need to debug or modify the generated code
- Template strings: Security review nightmares, syntax errors everywhere
The Solution
FluxMap is structured JSON all the way down. Every transformation is a tree of composable operations.
The language has about 18 core operations: look up values, traverse paths safely, transform arrays, handle conditionals, do basic math, format dates.
That’s it.
No file I/O. No network calls. No arbitrary function execution. No escape hatches.
The entire language is structurally incapable of doing anything beyond reading the input JSON and computing the output JSON.
This isn’t a sandbox with walls you might escape. It’s a room with only one door and no windows.
Each operation is pure—same inputs always produce same outputs—and explicitly declares its data sources. Want to reference a field? Use a $ref operation. Want to transform an array? Use $map. Want a fallback if a value is missing? Use $coalesce.
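Here’s an illustration of what a fragment might look like. The node shapes and field names are assumptions for this post, not FluxMap’s actual schema.

```typescript
const reservationSpec = {
  bookingRef: {
    // Normalize the three naming conventions mentioned earlier into one field.
    $coalesce: [
      { $ref: "BookRef" },
      { $ref: "Id" },
      { $ref: "ReservationNumber" },
    ],
  },
  guests: {
    $map: {
      over: { $ref: "Guests" },
      expr: { name: { $ref: "$item.FullName" } },
    },
  },
};
```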
The five principles in action:
- Structured: Pure JSON, syntax errors impossible
- Composable: 18 primitives combine into thousands of patterns
- Validated: Schema checks before anything executes
- Explicit: Every value traces to its source
- Additive: Adding complexity doesn’t require restructuring
Security by Design
Here’s what’s remarkable: FluxMap specifications don’t require security review.
Not because we have great sandboxing (we don’t have any). Not because we scan for dangerous patterns (we don’t do that either). But because the language itself can’t express dangerous operations.
Compare this to code generation, where you need:
- Runtime sandboxing
- Resource limits
- Import whitelisting
- AST scanning for dangerous patterns
- Memory and CPU quotas
- Probably a lawyer
And even then, you’re never quite sure you caught everything. There’s always one more edge case, one more clever exploit.
With FluxMap? Validate the JSON structure, execute the spec. Done. If it validates, it’s safe.
The worst an attacker—or a confused LLM—could do is compute an incorrect answer. They can’t read files, make network calls, spawn processes, consume infinite memory, or leak data. Those operations simply don’t exist in the language.
Real-World Complexity
FluxMap configs can handle genuinely complex transformations:
- Cross-referencing data across multiple arrays
- Nested lookups with scoped variables
- Enum mapping from PMS-specific values
- Aggregations and calculations
- Conditional logic
- Date formatting
- Null safety with fallbacks
A Mews PMS mapping, for example, needs to build an index of customers by ID, extract reservations, cross-reference to find bookers, map status enums, aggregate payment amounts, and handle missing data gracefully.
In TypeScript, this would be 200+ lines of imperative code with loops, error handling, and manual null checks.
In FluxMap? A declarative tree that composes the 18 primitives.
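To give a flavor of the shape, here’s a sketch of the cross-referencing part. The $index and $lookup names are invented stand-ins, not the real Mews spec.

```typescript
const mewsFragment = {
  $let: {
    vars: {
      // Build an index of customers by ID once, up front.
      customersById: {
        $index: { over: { $ref: "Customers" }, by: { $ref: "$item.Id" } },
      },
    },
    in: {
      reservations: {
        $map: {
          over: { $ref: "Reservations" },
          expr: {
            // Cross-reference each reservation to its booker.
            booker: {
              $lookup: {
                in: { $ref: "$vars.customersById" },
                key: { $ref: "$item.CustomerId" },
              },
            },
          },
        },
      },
    },
  },
};
```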
More importantly: A product manager can review it and understand what data is being transformed. Try that with generated code.
The entire spec is readable by non-engineers and generatable by LLMs. That combination—human-readable AND machine-generatable—is the sweet spot.
Beyond Data Mapping
The same principles apply everywhere you need LLM-generated specifications.
Workflow Orchestration
LLMs can generate workflow specs reliably, and the worst they can do is create an inefficient workflow. They can’t leak credentials, spawn processes, or DoS your infrastructure.
Business Rules
Product managers can write (and review) pricing logic. LLMs can generate it from natural language. Finance can audit it. Everyone understands it because the operations match business concepts, not programming constructs.
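For instance, a sketch of a pricing rule in the same spirit, with $if, $gte, $multiply, and the field names all invented:

```typescript
// "Stays of 7+ nights get a 10% discount": readable by a PM, auditable as data.
const discountRule = {
  $if: {
    cond: { $gte: [{ $ref: "booking.nights" }, { $value: 7 }] },
    then: { $multiply: [{ $ref: "booking.subtotal" }, { $value: 0.9 }] },
    else: { $ref: "booking.subtotal" },
  },
};
```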
API Compositions
With a constrained DSL, it’s safe to let LLMs orchestrate your API calls without worrying about them reaching internal endpoints or leaking data through URL parameters.
Form Validation
Dynamic form validation that’s safe to store in a database and evaluate in the browser.
How to Build Your Own
If you’re considering an LLM-native DSL for your domain, here’s the thought process:
Start With Constraints
Don’t ask “what should my language do?” Ask “what should my language prevent?”
List every dangerous thing that could happen in your domain. File access? Network calls? Infinite loops? Memory exhaustion? Database writes? Those are your forbidden operations.
Design a language where they simply don’t exist.
Find Your Primitives
What are the 10-15 atomic operations your domain actually needs? Not “nice to have.” Actually needs.
For data mapping: lookup, traverse, transform, aggregate, conditional. That’s basically it. Everything else is sugar.
For workflows: trigger, wait, check, branch, call.
For business rules: compare, calculate, lookup, decide.
Keep this set ruthlessly small. Resist the urge to add convenience operations. Composition will handle complexity better than enumeration.
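One way to keep the set honest: make the whitelist a closed union type, so adding a primitive is a deliberate, reviewable change. The operation names below are illustrative for a data-mapping domain.

```typescript
// The entire language, enumerated. If an operation isn't in this union,
// it doesn't exist.
type Op =
  | { $ref: string }
  | { $value: string | number | boolean | null }
  | { $map: { over: Op; expr: Op } }
  | { $coalesce: Op[] }
  | { $if: { cond: Op; then: Op; else: Op } };
```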
Choose Structure Over Syntax
Pick a format where syntax errors are nearly impossible. JSON if you’re building tree-like specifications. YAML if configuration is central. S-expressions if you want maximum composability.
Avoid anything that requires parsing or has ambiguous grammar. The format should handle syntax rules automatically.
Validate Everything Upfront
Build a schema. Make it exhaustive. Validate the entire spec before executing anything.
This isn’t just about safety—it’s about feedback. When an LLM generates an invalid spec, the validation errors teach it how to fix it. This is your iteration loop.
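A sketch of that loop, assuming the compiled validateSpec from the earlier Ajv example and a hypothetical generateSpec wrapper around whatever LLM you use:

```typescript
import Ajv, { ValidateFunction } from "ajv";

declare const ajv: Ajv;
declare const validateSpec: ValidateFunction; // compiled from your spec schema
declare function generateSpec(prompt: string): Promise<unknown>; // hypothetical LLM call

async function generateValidSpec(task: string, maxAttempts = 3): Promise<unknown> {
  let prompt = task;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const spec = await generateSpec(prompt);
    if (validateSpec(spec)) return spec; // structurally sound, safe to execute
    // Feed the schema errors back; the LLM repairs structure it can see.
    prompt = `${task}\nYour previous spec failed validation:\n${ajv.errorsText(validateSpec.errors)}`;
  }
  throw new Error("Could not produce a valid spec");
}
```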
Start With an Interpreter
Don’t optimize yet. Build a reference interpreter that’s obviously correct. Make it deterministic. Log everything. Keep all state explicit.
You can compile, optimize, and JIT later. But your interpreter is your source of truth—the spec for what “correct” means.
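A sketch of that interpreter’s core shape, showing two of the operations (this is not FluxMap’s actual evaluator):

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Pure and recursive: the same spec and input always yield the same output.
function evaluate(spec: Json, input: Json): Json {
  // Scalars and arrays of literals pass through unchanged.
  if (spec === null || typeof spec !== "object" || Array.isArray(spec)) return spec;

  if ("$ref" in spec) {
    // Walk a dot path into the input; anything missing becomes null.
    return String(spec.$ref).split(".").reduce<Json>(
      (acc, key) =>
        acc !== null && typeof acc === "object" && !Array.isArray(acc)
          ? acc[key] ?? null
          : null,
      input,
    );
  }

  if ("$coalesce" in spec && Array.isArray(spec.$coalesce)) {
    for (const alternative of spec.$coalesce) {
      const value = evaluate(alternative, input);
      if (value !== null) return value;
    }
    return null;
  }

  // Plain objects with no $-operation act as output templates.
  if (!Object.keys(spec).some((k) => k.startsWith("$"))) {
    return Object.fromEntries(
      Object.entries(spec).map(([k, v]) => [k, evaluate(v, input)]),
    );
  }

  // The whitelist is the whole language: unknown operations cannot run.
  throw new Error(`Unknown operation: ${Object.keys(spec).join(", ")}`);
}
```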
The Specs-as-Data Advantage
Here’s an unexpected benefit: when your DSL is structured data rather than code, you can treat specifications like any other data in your system.
FluxMap specs can be:
- Stored in a database alongside other records
- Versioned in git with meaningful diffs
- Cached and distributed across services
- Analyzed statically for optimization hints
- Fingerprinted for change detection
- Compiled to intermediate representations
You can build a compiler that desugars convenience operations, interns repeated strings, analyzes static paths, and generates optimized evaluation plans—all while keeping the source spec readable and portable.
This isn’t possible with code generation. Generated code is an artifact you need to manage, version, deploy. DSL specs are data you can query, transform, and distribute like any other data in your system.
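As one concrete illustration: because a spec is just JSON, change detection is a canonicalize-and-hash away. This sketch uses Node’s crypto module; FluxMap’s actual scheme may differ.

```typescript
import { createHash } from "node:crypto";

// Sort object keys recursively so semantically identical specs
// serialize to identical strings.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([key, child]) => [key, canonicalize(child)]),
    );
  }
  return value;
}

export function fingerprint(spec: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify(canonicalize(spec)))
    .digest("hex");
}
```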
What We Learned
Small is Beautiful
We started with 10 core operations and resisted adding more. We landed at 18 primitives that handle everything we need, with sugar operations layered on top for convenience without bloating the core.
Every time we wanted to add a new operation, we asked: “Can this be composed from existing operations?” Usually yes. The few times we added something, it was genuinely primitive.
Consistency Beats Cleverness
Every operation follows the same pattern. No special syntax for special cases. No shortcuts that save three characters but break the mental model.
LLMs love consistency. So do humans. When every operation is structured the same way, you learn the pattern once and apply it everywhere.
Humans Collaborate With AI Better Than We Expected
Engineers don’t write FluxMap specs from scratch. They ask the LLM to generate one, then tweak it. This is faster than writing it manually and more reliable than fully automated generation.
The DSL became the collaboration surface between human expertise and AI capability. The LLM gets you 90% of the way there; human judgment handles the nuanced edge cases.
Non-Engineers Can Participate
Product managers review FluxMap specs before they go to production. They understand what’s being transformed, can spot logic errors, can suggest improvements.
This never happened with TypeScript mappers. Code is a barrier. Declarative specs are documentation.
The Bigger Picture
For the past decade, we’ve been building better code generators: ORMs that generate queries, GraphQL that generates types, OpenAPI that generates clients. Each one optimizes for expressiveness—making it easier to write more powerful code faster.
LLMs felt like the natural next step: natural language → code. The ultimate generator.
But this optimizes for the wrong thing.
The real bottleneck isn’t “can we generate code?” It’s “can we trust what was generated?”
Every line of generated code is a potential bug, security issue, or maintenance burden. More expressiveness means more surface area for problems.
The insight: Stop trying to generate safe code. Design languages where unsafe code is impossible to write.
This isn’t about limiting power. It’s about channeling it. Most business problems don’t need Turing-complete languages. They need the right set of operations, composed well, constrained appropriately.
Data mapping doesn’t need file I/O. Workflows don’t need arbitrary code execution. Business rules don’t need reflection.
By removing what’s unnecessary, you make what remains safer, more reliable, and paradoxically more powerful—because humans and AI can both reason about it clearly.
What This Means for You
Look at your system. Where are you generating code? Where are you using template strings? Where do you have humans writing repetitive logic that differs only in data values?
Those are opportunities for LLM-native DSLs.
Ask yourself:
- What operations does this domain actually need?
- What operations are actively dangerous?
- Could I design a language where dangerous operations don’t exist?
- Would LLMs be able to generate specifications reliably?
- Would non-engineers be able to review them?
If the answers point toward a constrained DSL, you’ve found an opportunity. The hardest part isn’t building it (FluxMap took a weekend). The hardest part is committing to the constraint: resisting the urge to add “just one more feature” that breaks the safety model.
The Path Forward
The future of AI-integrated engineering isn’t teaching machines to write better code. It’s designing better abstraction layers that both humans and machines can work with reliably.
FluxMap proves this for data mapping. Your domain likely has similar opportunities. Workflow orchestration, business rules, validation logic, API compositions, transformation pipelines—anywhere you have logic that’s repetitive but varies in specifics.
The question isn’t whether LLMs can generate safe specifications.
It’s whether you’ve designed specifications that are safe to generate.
The answer starts with a language where danger doesn’t exist. Everything else follows from that.
We’re currently testing FluxMap in production with our hotel integrations platform. The repository is at github.com/ChatlynCom/rnd.project-iris (internal).
The code is less interesting than the approach. Take the approach, apply it anywhere you need reliable AI-generated specifications, and build something that’s safe by design rather than secured by effort.
Written from a skunkworks division where we get to build the approaches others might think are too radical—until we validate them in production.