Who's checking what your AI says to customers?

Enterprise chatbots in live customer interactions hallucinate 15 to 27% of the time, according to 2026 benchmarks compiled by AI research platforms. Specialized domains fare no better: Stanford HAI found that legal AI tools still hallucinate more than 17% of the time on complex queries. For comparison, the best general-purpose model today, measured across broad tasks, produces fabricated facts in 1.8% of cases.

At 1,000 customer interactions per month, even 1.8% means 18 wrong answers. A German court ruled in May that those answers count as your company's own words.
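To make that arithmetic reusable for your own volumes, here is a minimal Python sketch. The function name and the monthly volume are illustrative assumptions. The three rates are the ones quoted above, and the result is an expected count, not a prediction for any particular month.

```python
def expected_wrong_answers(interactions: int, hallucination_rate: float) -> int:
    """Expected number of wrong answers, rounded to the nearest whole answer."""
    if interactions < 0:
        raise ValueError("interactions must not be negative")
    if not 0.0 <= hallucination_rate <= 1.0:
        raise ValueError("hallucination_rate must be a fraction between 0 and 1")
    return round(interactions * hallucination_rate)


# Assumed volume: the 1,000 interactions per month used in the text.
MONTHLY_INTERACTIONS = 1_000

# Rates quoted in the article, expressed as fractions.
SCENARIOS = [
    ("best general-purpose model (1.8%)", 0.018),
    ("enterprise chatbot, low end (15%)", 0.15),
    ("enterprise chatbot, high end (27%)", 0.27),
]

for label, rate in SCENARIOS:
    wrong = expected_wrong_answers(MONTHLY_INTERACTIONS, rate)
    print(f"{label}: about {wrong} wrong answers per month")
```

Swap in your own monthly volume to see what the same rates mean for your business.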

How often does AI output actually go wrong?

More often than most business owners expect. The Charlotin AI Hallucination Cases Database now lists 1,348 documented court cases worldwide in which AI hallucinations played a role. US courts alone have seen 915 cases, including lawyers who filed fabricated case law and medical records that contained incorrect diagnoses.

Here's an example that puts the scale in context: in early 2025, KPMG published a report on the benefits of AI. Of the 45 references cited in that report, only five turned out to be accurate. Five out of forty-five. The report was pulled. One of the Big Four accounting firms, with compliance specialists in the thousands, published AI output without adequate verification.

The global business cost of AI hallucinations reached an estimated $67.4 billion in 2024, projected to exceed $112 billion in 2025 as adoption accelerates. Employees spend an average of 4.3 hours per week verifying AI outputs, at an estimated annual cost of $14,200 per employee for hallucination verification and mitigation.

The distribution of errors matters here. On everyday tasks, summarizing a meeting or rephrasing a note, AI rarely makes mistakes. But the moment you're dealing with specific facts, names, figures, or regulations, the error rate jumps. Precisely when your customer is counting on the information.

What did the Munich court actually decide?

The Landgericht München (case 26 O 869/26) ruled in May against Google. Google's AI Overviews had wrongly associated two publishers with fraud and dubious practices. The judge drew a distinction that applies to any business: where a search engine lists sources, AI generates independent new statements. Those statements, the court held, are the company's own words.

"The AI creates statements that don't even appear in the search results, and those therefore count as the defendant's own statements."

Landgericht München, case 26 O 869/26, May 28, 2026

Google lost. But the reasoning extends well beyond search engines. Any business running a chatbot, an AI assistant, or automated customer emails sits in the same legal position. What your AI says, you say.

The EU AI Act's transparency obligation under Article 50 reinforces this: from August 2, 2026, businesses across the EU must disclose when content is AI-generated. That deadline is six weeks away.

Which AI tasks actually need oversight?

Not all AI output carries the same risk. The mistake many businesses make is applying the same policy to everything: either all AI output goes out unchecked, or it's collectively distrusted. A more practical model uses three risk tiers.

Low risk: internal output. Meeting summaries, brainstorming sessions, draft texts that get edited before anyone sees them. No oversight needed. The output doesn't reach a customer, and an error has no consequences outside your own team.

Medium risk: customer-facing, no legal weight. Customer service emails, product descriptions, social media posts, newsletters. The output reaches a customer, but an error is inconvenient rather than catastrophic. A weekly spot-check works here: review ten random outputs per week and adjust the AI's instructions when you notice patterns.

High risk: money, law, health. Legal advice, financial calculations, medical information, quotes with price guarantees, complaint handling. AI output that contains errors here can cost you money or create liability. Human approval before anything goes out. Always.
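The three tiers can be written down as a small lookup, sketched here in Python. The names `RiskTier`, `OVERSIGHT`, and `classify` are illustrative, not taken from any library. One assumption goes beyond the text: anything touching money, law, or health is treated as high risk even when it is internal.

```python
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Oversight rule per tier, as described in the article.
OVERSIGHT = {
    RiskTier.LOW: "no review",
    RiskTier.MEDIUM: "weekly spot-check of ten random outputs",
    RiskTier.HIGH: "human approval before anything goes out",
}


def classify(customer_facing: bool, touches_money_law_or_health: bool) -> RiskTier:
    """Map an AI task to a risk tier.

    Assumption: money, law, or health always wins, even for internal output.
    """
    if touches_money_law_or_health:
        return RiskTier.HIGH
    if customer_facing:
        return RiskTier.MEDIUM
    return RiskTier.LOW


# Example: a customer service email with no legal weight.
tier = classify(customer_facing=True, touches_money_law_or_health=False)
print(tier.value, "->", OVERSIGHT[tier])
```

Writing the rule as a lookup forces you to make one decision per task instead of one blanket policy for everything.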

Think of the difference between cruise control and a self-driving car. Cruise control keeps your hands on the wheel and your eyes on the road. Full autonomy means trusting the technology completely. Most businesses need cruise control, not full autonomy.

Five checks before your AI speaks for your business

For every AI output in the medium- or high-risk category, run through these five questions. Two minutes per message prevents the errors that take hours to fix.

1. Are the facts correct?

Scan the AI output for concrete claims: numbers, names, dates, product specifications, prices. Check each fact separately. AI models are exceptionally good at producing text that sounds accurate but isn't. A chatbot telling a customer your return policy is 30 days when it's 14 creates an expectation you'll have to honor, or a legal liability, as the Munich court confirmed.

2. Does it match your own policy?

AI doesn't know your house rules unless you explicitly feed them in. Check whether the output conflicts with your terms, pricing commitments, or brand promises. An AI assistant giving tax advice on behalf of an accounting firm without a disclaimer is a liability risk. Documenting your AI guidelines upfront pins down those boundaries before outputs go live.

3. Would you put it on company letterhead?

The letterhead test is a surprisingly effective quick filter. Read the AI output and picture it on your official stationery, with your name on it. Would you send it to a customer as-is? No? Rewrite it. The law firm Sullivan & Cromwell filed a brief in April containing 28 fabricated citations, all AI-generated. That test would have caught it.

4. Can you trace the source?

If the AI makes a claim ("our product saves an average of 30%"), can you verify where that figure comes from? AI models tend to generate statistics that sound plausible but aren't based on anything. If you can't trace a claim to a real source, cut it.

5. Does the customer know it's from AI?

Under the EU AI Act Article 50, businesses across the EU must disclose when content is AI-generated from August 2, 2026. That covers chatbots, automated emails, and generated product descriptions. Start now: it builds trust, and it's a legal requirement in six weeks.
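If you want the five questions in a review tool rather than on paper, a minimal Python sketch could look like this. The field names are illustrative shorthand for the five checks above. The rule that one failed check blocks sending is an assumption about how strict you want to be.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReviewChecklist:
    """One reviewer's answers to the five questions for a single AI output."""
    facts_correct: bool           # 1. Are the facts correct?
    matches_policy: bool          # 2. Does it match your own policy?
    passes_letterhead_test: bool  # 3. Would you put it on company letterhead?
    sources_traceable: bool       # 4. Can you trace the source?
    ai_disclosed: bool            # 5. Does the customer know it's from AI?


def failed_checks(checklist: ReviewChecklist) -> list[str]:
    """Return the names of every check answered with 'no'."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def ready_to_send(checklist: ReviewChecklist) -> bool:
    """Assumed rule: a single failed check is enough to hold the message back."""
    return not failed_checks(checklist)


# Example: the text reads well, but one statistic cannot be traced to a source.
review = ReviewChecklist(
    facts_correct=True,
    matches_policy=True,
    passes_letterhead_test=True,
    sources_traceable=False,
    ai_disclosed=True,
)
print(ready_to_send(review), failed_checks(review))
```

Recording which check failed, not just whether the message passed, gives you the raw material for adjusting the AI's instructions later.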

What does the EU AI Act require?

The EU AI Act carries seven separate compliance deadlines. Article 50's transparency obligation is the first one directly relevant to businesses running customer-facing AI. From August 2, 2026, you must indicate when a customer is communicating with an AI system.

In practice, that can be a label ("This response was generated with AI assistance"), a banner in your chat widget, or a footnote in an automated email. The law doesn't specify how you disclose, only that you do.
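As an illustration of the label option, the Python sketch below appends a disclosure line to an outgoing message. The function name is hypothetical and the wording is the example label from the paragraph above. Whether a given wording and placement satisfies Article 50 is a question for your legal counsel. The code only shows the mechanics.

```python
# Example wording from the article; treat the exact text as a placeholder.
AI_DISCLOSURE = "This response was generated with AI assistance."


def with_disclosure(message: str, label: str = AI_DISCLOSURE) -> str:
    """Append the disclosure label to a message unless it is already present."""
    body = message.rstrip()
    if label in body:
        return body  # avoid stacking labels when a message is processed twice
    return f"{body}\n\n{label}"


reply = "Your order ships tomorrow.\n"
print(with_disclosure(reply))
```

Doing this in one shared function, rather than in each channel's template, makes it harder for a new channel to go live without the label.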

The heavier obligations for high-risk AI systems, such as AI used in job screening or credit assessment, follow in December 2027 after the Omnibus delay. But the transparency requirement lands in weeks. The European Data Protection Board has indicated that enforcement will be a priority.

Worth noting: transparency isn't only a legal issue. Research consistently shows that customers who know they're interacting with AI are more forgiving of mistakes than customers who believed they were talking to a human. Being upfront reduces reputational damage down the line.

How do you build a review routine that scales?

The five checks above work for occasional output. If your chatbot handles hundreds of conversations daily, you can't manually review every response. Three principles keep the oversight manageable.

Divide oversight by risk tier. The three-level model above determines how much attention each output gets. Low-risk output passes automatically. Medium-risk output gets spot-checked. High-risk output always involves a human.

Use a second AI model as a reviewer. Have a second model review the first model's output for factual claims and policy conflicts. Anthropic research shows that a model specifically instructed to find errors outperforms the model that produced the original text. The results are consistent, even if the setup seems circular. Grounding helps too: retrieval-augmented generation (RAG), in which the model answers from documents you supply rather than from memory, reduces hallucination rates by 30 to 70% in summarization tasks.

Run a weekly audit. Pull ten random chatbot responses or generated emails each week and review them manually. You catch errors the system misses, and you build a database of recurring mistakes that lets you sharpen your AI's instructions. After a few weeks, you know exactly where the weak spots are.
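The weekly audit is easy to automate up to the point where a human reads the samples. The Python sketch below draws the sample and tallies the reviewer's findings. The function names, the issue labels, and the threshold of two occurrences for "recurring" are all assumptions for illustration.

```python
import random
from collections import Counter
from typing import Optional


def weekly_sample(outputs: list[str], k: int = 10, seed: Optional[int] = None) -> list[str]:
    """Pick up to k outputs at random, without repeats, for manual review."""
    rng = random.Random(seed)  # pass a seed only if you need a reproducible draw
    return rng.sample(outputs, min(k, len(outputs)))


def recurring_issues(findings: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Count the reviewer's issue labels and keep those seen at least min_count times."""
    counts = Counter(findings)
    return [(issue, n) for issue, n in counts.most_common() if n >= min_count]


# Hypothetical week: 200 logged chatbot responses, ten pulled for review.
logged = [f"response {i}" for i in range(200)]
to_review = weekly_sample(logged)
print(len(to_review), "responses to review")

# Hypothetical labels a reviewer wrote down while reading the sample.
notes = ["wrong return period", "invented statistic", "wrong return period"]
print(recurring_issues(notes))
```

The tally is what turns a spot-check into a feedback loop: issues that keep recurring point to instructions that need rewriting.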

Three steps you can take this week

You don't need a full quality system immediately. Three steps that together take an afternoon give you clear visibility into where your risks are.

Step 1: Build an inventory. List every place where AI output reaches a customer. Your website chatbot, automatic replies in your support ticket system, AI-generated product descriptions, automated emails. At many businesses, this list is longer than expected. According to TheAIDaily's AI customer service statistics, AI now handles a majority of first-contact customer interactions at companies that have deployed it, yet formal oversight policies remain rare.

Step 2: Assign each channel a risk level. Use the three tiers: internal is low risk, customer-facing with no legal weight is medium, and anything touching money, law, or health is high. Write it down. Knowing where the risk sits matters as much as knowing what AI actually costs your team, and both are worth tracking formally.

Step 3: Add an approval step for high-risk channels. Have the chatbot escalate sensitive questions to a human. Route automated quotes through a colleague before they go out. It costs fifteen minutes a day and prevents the one mistake that takes weeks to fix.
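The three steps fit in a spreadsheet, but a short Python sketch shows the logic: list the channels, assign each a tier, and flag high-risk channels that still lack an approval step. The `Channel` class and the tier assigned to each example channel are illustrative assumptions. Your own inventory will differ.

```python
from dataclasses import dataclass

VALID_TIERS = ("low", "medium", "high")


@dataclass(frozen=True)
class Channel:
    """One place where AI output is produced (Step 1) with its risk tier (Step 2)."""
    name: str
    tier: str
    has_human_approval: bool = False

    def __post_init__(self) -> None:
        if self.tier not in VALID_TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")


def missing_approval(inventory: list[Channel]) -> list[str]:
    """Step 3: high-risk channels that do not yet route through a human."""
    return [c.name for c in inventory if c.tier == "high" and not c.has_human_approval]


# Hypothetical inventory; the tier per channel is an assumption for illustration.
inventory = [
    Channel("website chatbot", "medium"),
    Channel("support ticket auto-replies", "medium"),
    Channel("AI-generated product descriptions", "medium"),
    Channel("automated quotes", "high"),
    Channel("meeting summaries", "low"),
]

print(missing_approval(inventory))
```

Rerun the check whenever a new AI feature ships, so the inventory stays current.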

AI adoption is accelerating across Europe and globally. The businesses that build reliable quality control first won't be seen as the most cautious. They'll be seen as the most trustworthy. In a market where 62% of consumers prefer interacting with a chatbot over waiting on hold, that trust is a competitive advantage that compounds over time. According to data from TheAIDaily's shadow AI research, 44% of businesses already share company data with AI tools, often without central oversight of what goes out. The review routine you put in place now is exactly what closes that gap.

Written by Michael Groeneweg, AI consultant at Digital Impact and founder of UnicornAI.nl

Michael is an AI consultant at Digital Impact in Rotterdam and the founder of UnicornAI.nl, where he builds AI solutions and SaaS integrations for businesses. An entrepreneur for ten years, he has spent the last few refusing to touch anything that doesn't have AI woven into it, at work and at home, to the mild dismay of the people around him. His travels have turned into a running experiment in what AI can and can't do from a cafe terrace in Lisbon or a train station in Tokyo. He obsessively tests new tools, builds solutions for clients, and believes nobody should buy the hype, but nobody can keep pretending AI doesn't change everything either. Loves good coffee, long flights, and people who build with AI instead of just talking about it.

Written by a human, with AI assisting research and editing. More on our method in the AI disclosure.