Why Prompt Engineering Doesn’t Work in Legal & RegTech — and What Surveill Does Instead
Jun 3, 2025

There’s a popular belief in the AI space that with the right prompt, you can make a large language model (LLM) do just about anything. While that might hold in creative or conversational applications, it falls apart in high-stakes domains like regulatory compliance.
In legal and RegTech contexts, accuracy isn’t optional. And yet, in open-ended prompt-driven systems, hallucination rates can hover around 60%—meaning that more than half the time, the answer is wrong, made up, or misleading.
So why doesn’t prompt engineering work here—and what makes Surveill different?
The Problem with Prompt Engineering in Compliance
Prompt engineering is the practice of crafting precise instructions to get the “right” answer out of a general-purpose AI model. In theory, the more specific the prompt, the better the output.
But in legal and regulatory review, this model breaks down for three reasons:
The law is not a creative task
Unlike image generation or creative writing, legal analysis has a right answer grounded in statute, rule, or precedent. You don’t want “creative” interpretations of FINRA 2210 or the SEC Marketing Rule.
LLMs lack memory and context at scale
They can’t easily retain thousands of pages of policy, prior approvals, or enforcement trends. Even with good prompting, the model tends to lose track of the rules and revert to generic or incorrect assumptions.
Prompting can’t enforce compliance logic
You can’t “prompt” your way into consistent application of your firm’s unique risk tolerance, disclosure format, or review standards. Those have to be programmed, not just asked nicely (see the sketch after this list).
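To make the contrast concrete, here is a minimal sketch of what “programmed, not prompted” can look like. The names, the keyword heuristic, and the rule citation mapping are hypothetical illustrations, not Surveill’s actual API; the point is that the check is deterministic, so the same input always produces the same verdict, and no cleverly worded prompt can override it.

```python
# Minimal sketch: compliance logic enforced in code rather than in a prompt.
# All names (RuleCheck, check_performance_disclosure, the keyword list)
# are illustrative assumptions, not Surveill's actual API.
from dataclasses import dataclass

@dataclass
class RuleCheck:
    rule_id: str       # e.g., a FINRA 2210 provision
    description: str
    passed: bool

REQUIRED_DISCLOSURE = "past performance is not indicative of future results"

def check_performance_disclosure(text: str) -> RuleCheck:
    """Deterministic check: a performance claim must carry the disclosure.

    Unlike a prompt, this cannot be talked out of the requirement;
    identical inputs always yield identical verdicts.
    """
    lowered = text.lower()
    mentions_performance = any(
        kw in lowered for kw in ("return", "performance", "outperform")
    )
    has_disclosure = REQUIRED_DISCLOSURE in lowered
    return RuleCheck(
        rule_id="FINRA 2210(d)(1)",
        description="Performance claims require the standard disclosure",
        passed=(not mentions_performance) or has_disclosure,
    )
```

A real system would use far richer claim detection than a keyword list, but even this toy version shows the difference: the requirement lives in code, where it can be tested, versioned, and audited.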
Surveill’s Solution: Guardrails, Not Guesswork
Surveill doesn’t rely on prompting alone. Instead, it’s built like a regulated system—with risk checks, validation steps, and control logic at every layer.
Think of it like algorithmic trading: no matter how confident the model is, risk management logic will step in if a trade (or review) breaches boundaries. Surveill works the same way.
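A minimal sketch of that trading-style gate, under assumed names (Verdict, risk_gate, RISK_THRESHOLD are illustrative, not Surveill’s implementation):

```python
# Minimal sketch of a guardrail gate, analogous to a trading risk check.
# Hypothetical names; not Surveill's actual implementation.
from enum import Enum

class Verdict(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate"
    REJECT = "reject"

RISK_THRESHOLD = 0.7  # boundary set by the firm's policy, not by the model

def risk_gate(model_verdict: Verdict, risk_score: float) -> Verdict:
    """Control logic gets the final say, whatever the model reports.

    Like a trading system blocking an order that breaches risk limits,
    any review that crosses the threshold is escalated to a human,
    regardless of the model's confidence in its own answer.
    """
    if risk_score >= RISK_THRESHOLD:
        return Verdict.ESCALATE
    return model_verdict
```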
We’ve embedded half a dozen independent guardrails into the system to ensure accuracy, consistency, and defensibility, including:
Rule Anchoring
Every review is tied back to a specific rule set (e.g., FINRA 2210, SEC 206(4)-1), not just inferred by the model.
Policy Overlay
Each client’s internal policies are layered into the logic, so their interpretation of gray areas is consistently applied.
Risk Scoring
High-risk outputs trigger escalation or demand human review, no matter how confident the AI is.
Disclosure Validators
Surveill checks not just for the presence of required disclosures but also their formatting, prominence, and proximity (a proximity sketch follows this list).
Memory of Prior Decisions
If a phrase or format was flagged in one campaign, it will be flagged again: no more selective memory.
Audit Trail Enforcement
Every comment, change, and flag is documented in a way that’s regulator- and exam-ready.
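As one illustration of the disclosure validator idea, here is a minimal proximity check. The function name and the character-distance heuristic are assumptions for illustration only; prominence and formatting checks are not shown.

```python
# Minimal sketch of a proximity check, one slice of what a disclosure
# validator does. Names and the distance heuristic are hypothetical.
def validate_disclosure_proximity(text: str, claim: str, disclosure: str,
                                  max_distance: int = 500) -> bool:
    """Require the disclosure to appear near the claim it qualifies.

    Presence alone is not enough: a disclosure buried far from the
    claim it qualifies fails the proximity requirement.
    """
    lowered = text.lower()
    claim_pos = lowered.find(claim.lower())
    disc_pos = lowered.find(disclosure.lower())
    if claim_pos == -1:
        return True   # no claim made, nothing to qualify
    if disc_pos == -1:
        return False  # claim present but disclosure missing entirely
    return abs(disc_pos - claim_pos) <= max_distance
```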
Results That Matter
Because of this guardrail-first architecture, Surveill achieves over 90% consistency and accuracy in its reviews—far beyond what’s possible with standalone prompting. The output isn’t just helpful—it’s reliable, repeatable, and safe to build workflows around.
Final Thought
Prompt engineering may be good enough for brainstorming or answering trivia, but in compliance, it’s a liability. In regulated industries, what firms need isn’t clever prompting—they need control, transparency, and trust.
That’s what Surveill delivers.