M365 Show with Mirko Peters - Microsoft 365 Digital Workplace Daily

Your Fabric Data Model Is Lying To Copilot

Opening: The AI That Hallucinates Because You Taught It To

Copilot isn’t confused. It’s obedient. That cheerful paragraph it just wrote about your company’s nonexistent “stellar Q4 surge”? That wasn’t a glitch—it’s gospel according to your own badly wired data.

This is the “garbage in, confident out” effect—Microsoft Fabric’s polite way of saying, you trained your liar yourself. Copilot will happily hallucinate patterns because your tables whispered sweet inconsistencies into its prompt context.

Here’s what’s happening: you’ve got duplicate joins, missing semantics, and half-baked Medallion layers masquerading as truth. Then you call Copilot and ask for insights. It doesn’t reason; it rearranges. Fabric feeds it malformed metadata, and Copilot returns a lucid dream dressed as analysis.

Today I’ll show you why that happens, where your data model betrayed you, and how to rebuild it so Copilot stops inventing stories. By the end, you’ll have AI that’s accurate, explainable, and, at long last, trustworthy.

Section 1: The Illusion of Intelligence — Why Copilot Lies

People expect Copilot to know things. It doesn’t. It pattern‑matches from your metadata, context, and the brittle sense of “relationships” you’ve defined inside Fabric. You think you’re talking to intelligence; you’re actually talking to reflection. Give it ambiguity, and it mirrors that ambiguity straight back, only shinier.

Here’s the real problem. Most Fabric implementations treat schema design as an afterthought—fact tables joined on the wrong key, measures written inconsistently, descriptions missing entirely. Copilot reads this chaos like a child reading an unpunctuated sentence: it just guesses where the meaning should go. The result sounds coherent but may be critically wrong.

Say your Gold layer contains “Revenue” from one source and “Total Sales” from another, both unstandardized. Copilot sees similar column names and, in its infinite politeness, fuses them. You ask, “What was revenue last quarter?” It merges measures with mismatched granularity, produces an average across incompatible scales, and presents it to you with full confidence. The chart looks professional; the math is fiction.

The illusion comes from tone. Natural language feels like understanding, but Copilot’s natural responses only mask statistical mimicry. When you ask a question, the model doesn’t validate facts; it retrieves patterns—probable joins, plausible columns, digestible text. Without strict data lineage or semantic governance, it invents what it can’t infer. It is, in effect, your schema with stage presence.

Fabric compounds this illusion. Because data agents in Fabric pass context through metadata, any gaps in relationships—missing foreign keys, untagged dimensions, or ambiguous measure names—are treated as optional hints rather than mandates. The model fills those voids through pattern completion, not logic. You meant “join sales by region and date”? It might read “join sales to anything that smells geographic.” And the SQL it generates obligingly cooperates with that nonsense.

Users fall for it because the interface democratizes request syntax. You type a sentence. It returns a visual. You assume comprehension, but the model operates in statistical fog. The fewer constraints you define, the friendlier its lies become.

The key mental shift is this: Copilot is not an oracle. It has no epistemology, no concept of truth, only mirrors built from your metadata. It converts your data model into a linguistic probability space. Every structural flaw becomes a semantic hallucination. Where your schema is inconsistent, the AI hallucinates consistency that does not exist.

And the tragedy is predictable: executives make decisions based on fiction that feels validated because it came from Microsoft Fabric. If your Gold layer wobbles under inconsistent transformations, Copilot amplifies that wobble into confident storytelling. The model’s eloquence disguises your pipeline’s rot.

Think of Copilot as a reflection engine. Its intelligence begins and ends with the quality of your schema. If your joins are crooked, your lineage broken, or your semantics unclear, it reflects uncertainty as certainty. That’s why the cure begins not with prompt engineering but with architectural hygiene.

So if Copilot’s only as truthful as your architecture, let’s dissect where the rot begins.

Section 2: The Medallion Myth — When Bronze Pollutes Gold

Every data engineer recites the Medallion Architecture like scripture: Bronze, Silver, Gold. Raw, refined, reliable. In theory, it’s a pilgrimage from chaos to clarity—each layer scrubbing ambiguity until the data earns its halo of truth. In practice? Most people build a theme park slide where raw inconsistency takes an express ride from Bronze straight into Gold with nothing cleaned in between.

Let’s start at the bottom. Bronze is your landing zone—parquet files, CSVs, IoT ingestion, the fossil record of your organization. It’s not supposed to be pretty, just fully captured. Yet people forget: Bronze is a quarantine, not an active ingredient. When that raw muck “seeps upward”—through lazy shortcuts, direct queries, or missing transformation logic—you’re giving Copilot untreated noise as context. Yes, it will hallucinate. It has good reason: you handed it a dream journal and asked for an audit.

Silver is meant to refine that sludge. This is where duplicates die, schemas align, data types match, and universal keys finally agree on what a “customer” is. But look through most Fabric setups, and Silver is a half-hearted apology—quick joins, brittle lookups, undocumented conversions. The excuse is always the same: “We’ll fix it in Gold.” That’s equivalent to fixing grammar by publishing the dictionary late.

By the time you hit Gold, the illusion of trust sets in. Everything in Gold looks analytical—clean tables, business-friendly names, dashboards glowing with confidence. But underneath, you’ve stacked mismatched conversions, unsynchronized timestamps, and ID collisions traced all the way back to Bronze. Fabric’s metadata traces those relationships automatically, and guess which relationships Copilot relies on when interpreting natural language? All of them. So when lineage lies, the model inherits deceit.

Here’s a real-world scenario. You have transactional data from two booking systems. Both feed into Bronze with slightly different key formats: one uses a numeric trip ID, another mixes letters. In Silver, someone merged them through an inner join on truncated substrings to “standardize.” Technically, you have unified data; semantically, you’ve just created phantom matches. Now Copilot confidently computes “average trip revenue,” which includes transactions from entirely different contexts. It’s precise nonsense: accurate syntax, fabricated semantics.

This is the Medallion Myth—the idea that having layers automatically delivers purity. Layers are only as truthful as the discipline within them. Bronze should expose raw entropy. Silver must enforce decontamination. Gold has to represent certified business logic—no manual overrides, no “temporary fixes.” Break that chain, and you replace refinement with recursive pollution.

Copilot, of course, knows none of this. It takes whatever the Fabric model proclaims as lineage and assumes causality. If a column in Gold references a hybrid of three inconsistent sources, the AI sees a single concept. Ask, “Why did sales spike in March?” It cheerfully generates SQL that aggregates across every record labeled “March,” across regions, currencies, time zones—because you never told Silver to enforce those boundaries. The AI isn’t lying; it’s translating your collective negligence into fluent fiction.

This is why data provenance isn’t optional metadata—it’s Copilot’s GPS. Each transformation, each join, each measure definition is a breadcrumb trail leading back to your source-of-truth. Fabric tracks lineage visually, but lineage without validation is like a map drawn in pencil. The AI reads those fuzzy lines as gospel.

So, enforce validation. Between Bronze and Silver, run automated schema tests—do IDs align, are nulls handled, are types consistent? Between Silver and Gold, deploy join audits: verify one-to-one expectations, monitor aggregation drift, and check column-level lineage continuity. These aren’t bureaucratic rituals; they are survival tools for AI accuracy. When Copilot’s query runs through layers you’ve verified, it inherits discipline instead of disorder.
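
To make those checks concrete, here is a minimal sketch of layer-boundary validation that could run in a Fabric notebook. Every name in it (silver.trips, gold.fact_trips, trip_id, the fare columns) is a hypothetical placeholder, so treat it as the shape of the test rather than a drop-in implementation.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Fabric notebook the session already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Hypothetical lakehouse tables -- substitute your own.
silver = spark.read.table("silver.trips")
gold = spark.read.table("gold.fact_trips")

failures = []

# 1. Key hygiene: no null or duplicate business keys leaving Silver.
null_keys = silver.filter(F.col("trip_id").isNull()).count()
if null_keys:
    failures.append(f"{null_keys} Silver rows have a null trip_id")

dupes = silver.groupBy("trip_id").count().filter(F.col("count") > 1).count()
if dupes:
    failures.append(f"{dupes} trip_ids are duplicated in Silver")

# 2. Join audit: Gold should not gain or lose rows relative to Silver.
drift = gold.count() - silver.count()
if drift != 0:
    failures.append(f"row-count drift of {drift} between Silver and Gold")

# 3. Aggregation drift: the certified measure must reconcile across layers.
silver_total = silver.agg(F.sum("fare_amount")).first()[0]
gold_total = gold.agg(F.sum("total_fare_amount")).first()[0]
if silver_total != gold_total:
    failures.append(f"fare totals diverge: Silver={silver_total}, Gold={gold_total}")

if failures:
    raise AssertionError("Medallion validation failed:\n" + "\n".join(failures))
```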

The irony is delicious. You wanted Copilot to automate analysis, yet the foundation it depends on still requires old-fashioned hygiene. Garbage in, confident out. Until you treat architecture as moral philosophy—refinement as obligation, not suggestion—you’ll never have truthful AI.

Even with pristine layers, Copilot can still stumble, because knowing what data exists doesn’t mean knowing what it means. A perfect pipeline can feed a semantically empty model. Which brings us to the missing translator between numbers and meaning—the semantic layer, the brain your data forgot to build.

Section 3: The Missing Brain — Semantic Layers and Context Deficit

This is where most Fabric implementations lose their minds—literally. The semantic layer is the brain of your data model, but many organizations treat it like decorative trim. They think if tables exist, meaning follows automatically. Wrong. Tables are memory; semantics are comprehension. Without that layer, Copilot is reading numbers like a tourist reading street signs in another language—phonetically, confidently, and utterly without context.

Let’s define it properly. The semantic model in Fabric tells Copilot what your data means, not just what it’s called. It’s the dictionary that translates column labels into business logic. “Revenue” becomes “the sum of sales excluding refunds.” “Customer” becomes “unique buyer ID over fiscal year boundaries.” Without it, Copilot treats these words as merely coincidental tokens. You might ask for “total sales” and get gross receipts one day, net revenue the next. It’s not fickle; it’s linguistically adrift.

The truth? A semantic model acts like a bilingual interpreter between human language and DAX or SQL logic. It aligns intent with computation. When it’s missing, Copilot must guess. And large language models are notoriously proud guessers. They assemble probable joins, not accurate ones, and they trust column names far more than your business definitions. In one model, “Bookings” refers to reservations. In another, it counts completed trips. To Copilot, those are synonyms. You just asked for “bookings by quarter,” and it served you both—efficiently wrong.

Compare two architectures. In a bare Data Warehouse, Copilot peers directly into tables. It sees “Date,” “Amount,” “Type.” Nice nouns, zero context. It then pattern-matches: “sales” equals “amount,” “by region” equals whatever field smells like geography. In a Semantic Model, however, those fields have been civilized. “SalesAmount” includes a DAX measure defining scope, currency, and aggregation logic. “Region” has relationships pinned explicitly. So when you ask that same question, Copilot doesn’t grope in the dark; it walks a lit path you paved.

Fabric’s 2025 toolbox even gives you weapons for this: Direct Lake for low-latency modeling, DAX for measure logic, synonyms for linguistic flexibility, and calculation groups for consistency. Think of synonyms as elastic vocabulary—if users say “net income,” but your measure is “profit,” you don’t rename the column; you teach Copilot translation. Calculation groups then enforce grammar—ensuring time intelligence or currency conversions apply uniformly. Together, they form the language model’s grounding system.

But very few people bother to wire these properly. They dump tables into Fabric and assume Copilot’s “AI” will interpret nuance. Spoiler: it will not. The model doesn’t infer your KPIs from moral conviction; it infers them from metadata tags. If you haven’t labeled measures, described relationships, or specified calculation order, Copilot improvises. Improvisation is charming at jazz clubs, disastrous in enterprise analytics.

So how do you enforce semantics? Start by auditing your model vocabulary. Every measure needs a consistent definition—stored centrally, referenced universally. Resist the urge to rename columns per report; alias them in the semantic layer instead. Build calculation groups for all common logic—year-to-date totals, rolling averages, conversion rates. Then add synonyms so Copilot’s natural language engine doesn’t invent its own correlations.
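
Here is one way to make that audit mechanical: a small, self-contained sketch that treats the central dictionary as data and flags anything Copilot would have to guess about. The measure names, DAX expressions, and synonyms below are invented examples, not definitions from any real model.

```python
# A self-contained sketch of a model-vocabulary audit. The measure names,
# DAX expressions, and synonyms are invented examples, not real definitions.

central_dictionary = {
    "Total Sales": {
        "expression": "SUM(Sales[SalesAmount]) - SUM(Sales[RefundAmount])",
        "description": "Net sales excluding refunds, in reporting currency.",
        "synonyms": ["revenue", "net income", "gross receipts"],
    },
    "Active Customers": {
        "expression": "DISTINCTCOUNT(Sales[CustomerID])",
        "description": "Unique buyers within the filtered fiscal period.",
        "synonyms": ["buyers", "customer count"],
    },
}

def audit_vocabulary(measures: dict) -> list[str]:
    """Flag anything Copilot would otherwise have to guess about."""
    problems = []
    for name, meta in measures.items():
        if not meta.get("expression"):
            problems.append(f"'{name}' has no central definition")
        if not meta.get("description"):
            problems.append(f"'{name}' has no business description")
        if not meta.get("synonyms"):
            problems.append(f"'{name}' has no synonyms for natural-language questions")
    return problems

for issue in audit_vocabulary(central_dictionary):
    print("VOCABULARY GAP:", issue)
```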

Now, connect semantics with Copilot’s instruction architecture. Fabric offers two levels of control: data source instructions and agent instructions. Data source instructions clarify schema specifics—joins, measure references, preferred names—while agent instructions govern tone, format, and constraint. These interact through grounding prompts that depend entirely on semantic alignment. If your semantic model defines “Total Sales” clearly, data source instructions can enforce that definition when Copilot writes SQL. The agent instruction ensures the response stays concise, tabular, or summarized. Combine them, and you replace chaos with protocol.

Without semantics, every Copilot answer is an improvisation. With semantics, each response becomes evidence. You’re not tuning prompts—you’re teaching logic. Once you grant Copilot a consistent dictionary, its accuracy spikes because uncertainty has nowhere left to hide.

Now that Copilot understands the language, we’ll teach it discipline—the behavioral rules that keep it honest.

Section 4: Discipline for Machines — Data and Agent Instructions

Copilot may finally understand your vocabulary, but it still needs manners. Semantics teach meaning; instructions teach behavior. And in Microsoft Fabric, those behaviors are shaped through two distinct leashes: data source instructions and agent instructions. One decides where the dog may walk; the other trains it not to chew the furniture.

Start with data source instructions—the schema-specific commandments. These live closest to the data and tell Copilot exactly how to interpret your model. Remember the earlier chaos of joins made to anything that “smells geographic”? Data source instructions are where you fix that. Suppose your model blends trip data with temperature and rainfall feeds: you specify that weather data must join on pickup geography ID and date ID. This removes guessing from the model and replaces it with obligation. Fabric reads those declarations whenever Copilot drafts SQL or DAX, ensuring every query routes through the right keys. Think of them as the relational constitution—immutable legal text defining how entities may connect.

These instructions also handle synonyms and measure definitions. Suppose your analysts use “total sales,” “fare amount,” and “gross receipts” interchangeably. To humans, those all mean roughly the same thing. To Copilot, they’re distinct variables fighting for attention. Under data source instructions, you align them: “Whenever total sales is mentioned, use total_fare_amount.” Now your AI stops playing semantic roulette and begins answering consistently.
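
For illustration, a data source instruction set along those lines might read something like the block below. The wording is free text you write yourself; the column and measure names (pickup_geography_id, total_fare_amount, and so on) are assumptions carried over from the running example, not anything Fabric prescribes.

```text
Data source instructions (illustrative wording; names are examples):

1. When joining trip data to weather data, always join on both
   pickup_geography_id AND date_id. Never join on date alone.
2. Whenever the user says "total sales", "fare amount", or "gross receipts",
   use the measure total_fare_amount.
3. Never aggregate across currencies; filter to a single currency_code
   unless the user explicitly asks for a conversion.
```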

They can even define bucketing logic for categorical analyses. When a user asks “sales by temperature,” you can predefine ranges—below 32°F is “freezing,” 32 to 53 “cold,” 53 to 72 “mild,” and so on. Without that, Copilot returns an endless spread of raw numeric values, drowning the user in fine-grained trivia. By embedding these buckets directly in your instructions, you get predictable aggregation and reproducible logic.
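
The same buckets can also be expressed as deterministic code, either to test the instruction or to pre-compute a bucket column in Silver. This is a sketch with illustrative thresholds; encode whatever boundaries your business actually signs off on.

```python
# Hypothetical bucketing logic mirroring the instruction text above.
# Thresholds are illustrative; use the ones your business agrees on.

def temperature_bucket(temp_f: float) -> str:
    """Map a raw Fahrenheit reading to the bucket Copilot should aggregate by."""
    if temp_f < 32:
        return "freezing"
    elif temp_f < 53:
        return "cold"
    elif temp_f < 72:
        return "mild"
    else:
        return "warm"

# Deterministic buckets mean "sales by temperature" always groups the same way.
assert temperature_bucket(20) == "freezing"
assert temperature_bucket(60) == "mild"
```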

What you’re really doing here is converting latent metadata into explicit rules. Fabric’s data agent parses these before constructing queries, treating them as immutable ground truth. Missing them is why earlier queries produced warped results—it generated perfectly valid SQL for the wrong question. With correctly written data instructions, Copilot’s queries stay anchored in business rules rather than linguistic probability.

Now let’s move up a layer to agent instructions—the etiquette school for LLM behavior. While data source instructions tell Copilot what the data means, agent instructions dictate how it should respond. They define presentation, style, and analytical scope. You can instruct: “Always summarize with a concise table,” or “When multiple sources exist, clarify lineage before aggregation.” Essentially, you transform Copilot from an overeager intern into a disciplined analyst.

Consider the “top five rows” rule from data‑agent tuning. Instead of returning thousands of records, you tell it: “Render only top five unless user requests otherwise.” The AI enforces brevity, sparing the average manager from scrolling through a data novel. Another useful directive: “Always provide a summary at the top of each response.” This acts like meta‑commentary—Copilot explains what it’s showing, letting users verify that measures, joins, and filters match expectations. It turns invisible reasoning into visible confirmation, crucial for trust.
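
An agent instruction set implementing those rules might look roughly like this. Again, the wording is illustrative rather than a Fabric default; adapt the numbers and phrasing to your own reporting culture.

```text
Agent instructions (illustrative wording, not Fabric defaults):

1. Begin every response with a one-sentence summary of what was measured,
   which tables were joined, and which filters were applied.
2. Return only the top five rows unless the user explicitly requests more.
3. Present numeric answers as a concise table; avoid narrative speculation.
4. If a question could map to more than one measure, ask for clarification
   instead of choosing one.
```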

Notice the interplay: data instructions define the facts, agent instructions define the voice. Together, they form Copilot’s ethical code. The data agent combines these into a composite prompt every time it answers a question. The language model doesn’t freestyle; it conditions itself through your rules.

Now, governance enters the scene. These instructions can’t live in anarchic individual copies. If each Fabric user customizes synonyms to personal taste, you’ve reintroduced the Babel you spent months eliminating. You’ll get five definitions for “profit” and zero agreement during quarterly reviews. Centralize instruction sets—publish them through Fabric’s management workspace, validate them through peer review, and lock them behind version control. Flexibility without governance is rebellion disguised as innovation.

Of course, even these rules decay if you never verify them. The intelligent approach is to treat instruction quality like code quality. After each edit, test Copilot’s behavior. Ask the same question twice after clearing cache—see if it repeats the same SQL. Examine the generated queries; ensure they reflect enforced joins and correct measures. If outputs drift, revisit your instructions. Even Microsoft’s own engineers admit this tuning is “more art than science,” which is flattering shorthand for “constant debugging.”

Here’s where example queries rescue you. They’re Fabric’s answer to “guardrails,” allowing you to hard‑code canonical SQL fragments for common analytical intents. Say you know that “total revenue by temperature bucket” must always perform a dual join on geography and date. You record that as an example query. When users ask any variation of that phrase, Copilot bypasses improvisation and reuses your exact logic. It’s the AI equivalent of issuing pre‑approved forms—filled correctly every time, incapable of creative sabotage.
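
A lightweight way to manage that library is to keep the canonical SQL in source control and register each fragment as an example query. The sketch below assumes hypothetical Gold tables (gold.fact_trips, gold.dim_weather, gold.dim_date) and shows the dual join on geography and date hard-coded, so every phrasing of the question reuses it.

```python
# A sketch of an example-query library kept under version control and
# registered with the data agent. Table and column names are hypothetical.

EXAMPLE_QUERIES = {
    "total revenue by temperature bucket": """
        SELECT w.temperature_bucket,
               SUM(t.total_fare_amount) AS total_revenue
        FROM gold.fact_trips AS t
        JOIN gold.dim_weather AS w
          ON t.pickup_geography_id = w.geography_id
         AND t.date_id = w.date_id          -- the dual join, hard-coded
        GROUP BY w.temperature_bucket
    """,
    "monthly active customers": """
        SELECT d.year_month,
               COUNT(DISTINCT t.customer_id) AS active_customers
        FROM gold.fact_trips AS t
        JOIN gold.dim_date AS d
          ON t.date_id = d.date_id
        GROUP BY d.year_month
    """,
}
```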

Over time, you can build a library of these example queries, transforming Copilot from a generalist into a domain expert. Think of it as muscle memory. The AI gradually learns what “correct” feels like through repetition and constraint rather than free‑form genius.

Still, the cleverest instructions can’t redeem architectural sin. If your data pipeline wobbles beneath inconsistent Medallion layers or missing semantics, all these guardrails amount to stern speeches delivered on a sinking ship. Instructions constrain language; they can’t invent truth. That’s why tuning is a closed loop—architecture defines reliability, semantics define meaning, and instructions define obedience.

The pragmatic workflow looks like this: build clean Medallion layers, define a precise semantic model, encode those definitions into data source instructions, temper Copilot’s personality through agent instructions, then test relentlessly. Every anomaly you detect early spares you an executive’s angry email later.

So yes—discipline matters. It’s not glamorous. It’s not “AI magic.” It’s rule enforcement, cache clearing, consistency testing, and the occasional self‑esteem check when your AI still insists that total sales equal gross fare plus refunds plus miracles. But once the discipline sticks, Copilot becomes eerily reliable—its hallucinations drop, its joins behave, and your dashboards stop gaslighting you.

You can fix the rules inside Fabric, but Copilot’s honesty still depends on architecture beyond Fabric—the systemic governance that polices truth across your entire data estate. And that’s exactly where we’re heading next.

Section 5: The Systemic Cure — Building for Verifiable Truth

Let’s zoom out. After all the tuning, synonyms, and AI obedience training, there’s still one brutal fact: Copilot’s truth depends on your ecosystem’s integrity. You can train it, scold it, and reward it with clear joins, but if the architecture surrounding Fabric drips data rot, the AI will still politely lie. What you need is a systemic cure—governance, lineage enforcement, and architectural discipline that make lying physically impossible.

At the architectural level, Fabric succeeds only when everything above and below it shares the same moral compass of data hygiene. That begins with schema enforcement. Every dataset in your lakehouse and warehouse needs defined keys, datatypes, and relationships—immutable contracts describing how entities interlock. A drift in one table shouldn’t quietly ripple upward to your Gold layer. Instead, schema enforcement catches it like a code linter, halting ingestion until definitions align. In practice, this doesn’t slow development; it prevents silent corruption before it metastasizes into Copilot’s answers.
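
In practice, a schema contract can be as small as the sketch below: declare the expected columns and types, compare them against the landing table, and stop the pipeline on drift. The contract values and the bronze.trips_landing table are hypothetical placeholders.

```python
# A minimal sketch of a schema contract check at ingestion time.
# The contract and table name are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

CONTRACT = {  # column -> expected Spark type
    "trip_id": "string",
    "pickup_geography_id": "int",
    "date_id": "int",
    "total_fare_amount": "decimal(18,2)",
}

incoming = spark.read.table("bronze.trips_landing")
actual = {f.name: f.dataType.simpleString() for f in incoming.schema.fields}

drift = {
    col: (expected, actual.get(col))
    for col, expected in CONTRACT.items()
    if actual.get(col) != expected
}

if drift:
    # Halt ingestion instead of letting the drift ripple into Gold.
    raise ValueError(f"Schema contract violated: {drift}")
```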

Next, introduce semantic versioning. Treat your data models like software. Each time you modify a definition—rename a column, change a measure formula, or adjust a relationship—increment a version. Fabric doesn’t automatically recognize the philosophical impact of “revenue = gross minus discounts” becoming “revenue = net after tax.” Humans must. Versioning preserves lineage history, enabling you to trace which Copilot answers relied on outdated semantics. Without it, you’ll never audit when a lie entered circulation.
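
Here is a minimal sketch of what that version history can look like, kept in source control next to the model definition. The fields and the example "Revenue" change are illustrative; the point is that every change of meaning gets a version, a date, and a breaking flag you can audit against.

```python
# A sketch of semantic versioning for measure definitions; values are examples.

from dataclasses import dataclass
from datetime import date

@dataclass
class MeasureVersion:
    measure: str
    version: str          # bump on every change of definition
    definition: str
    effective_from: date
    breaking: bool        # True when downstream answers change meaning
    notes: str = ""

history = [
    MeasureVersion("Revenue", "1.0.0", "gross minus discounts",
                   date(2024, 1, 1), breaking=False),
    MeasureVersion("Revenue", "2.0.0", "net after tax",
                   date(2025, 4, 1), breaking=True,
                   notes="Answers produced before this date used the old meaning."),
]

def definition_on(measure: str, when: date) -> str:
    """Which definition was in force when a given Copilot answer was generated?"""
    versions = [v for v in history if v.measure == measure and v.effective_from <= when]
    return max(versions, key=lambda v: v.effective_from).definition

print(definition_on("Revenue", date(2025, 2, 15)))  # -> "gross minus discounts"
```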

Now, tie enforcement to documentation. Microsoft Purview or the OneLake data catalog isn’t wallpaper—it’s the institutional memory that prevents schema amnesia. Certified datasets in these catalogs act as the only permissible sources for Copilot and Power BI. Think of them as notarized entities: legally binding representations of truth. If a dataset isn’t certified, Copilot shouldn’t touch it. By integrating Purview’s metadata with Fabric’s data agents, you ensure that every AI‑driven query cites verifiable provenance. When someone asks, “Where did this number come from?” you can actually answer.

For context continuity across systems, enable Mirroring and Digital Twin modeling. Mirroring replicates external sources—like SQL Server, Cosmos DB, or PostgreSQL—into Fabric in real time, preserving schema and freshness. It’s Fabric’s immune system against data latency and mismatch. Digital Twin modeling, newly introduced, takes this further: it constructs a live digital counterpart of physical or business environments. Copilot can then query these twins, understanding relationships such as “factory output affects transport delays.” This keeps its reasoning grounded in synchronized reality, not stale extracts.

But even the best architecture drifts over time. That’s why you conduct periodic truth audits—ritual checkups where AI outputs meet source‑of‑truth validation. You pick representative queries, rerun them manually or through stored SQL, and compare results. Any divergence signals architectural entropy: corrupted joins, drifted measures, or expired lineage. Think of it as a polygraph test for Copilot. Done quarterly, it lets you recalibrate instruction sets and semantics before mistrust escalates into boardroom embarrassment.
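
A truth audit can be as plain as the sketch below: re-run a certified reference query alongside the SQL captured from the data agent's own output, then compare the results within a tolerance. The query text, table names, and tolerance are assumptions for illustration.

```python
# A sketch of a quarterly truth audit; SQL, names, and tolerance are examples.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

REFERENCE_SQL = """
    SELECT SUM(total_fare_amount) AS revenue
    FROM gold.fact_trips
    WHERE date_id BETWEEN 20250101 AND 20250331
"""

# SQL captured from the data agent's generated query for the same question.
COPILOT_SQL = """
    SELECT SUM(total_fare_amount) AS revenue
    FROM gold.fact_trips
    WHERE date_id BETWEEN 20250101 AND 20250331
"""

reference = spark.sql(REFERENCE_SQL).first()["revenue"]
observed = spark.sql(COPILOT_SQL).first()["revenue"]

tolerance = 0.001  # allow rounding noise, nothing more
if reference is None or observed is None or abs(observed - reference) > abs(reference) * tolerance:
    print(f"AUDIT FAILURE: reference={reference}, copilot={observed}")
else:
    print("Audit passed: the agent's answer reconciles with the certified source.")
```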

Critically, all layers of your analytics stack—BI, Copilot, Fabric workspace—must draw from a single semantic source. Shadow datasets are the enemy. Every time an ambitious department clones a model to “move faster,” it creates another dialect of truth. Suddenly, Power BI, Copilot, and Excel Copilot are speaking three slightly different languages about the same revenue number. The cure is enforced unity. Centralize semantics in one model, expose it read‑only to downstream consumers, and lock policy compliance to dataset certification. If Johnny in Sales wants a custom metric, he negotiates through governance, not side channels.

Finally, close the learning loop. Governance isn’t static paperwork—it’s an ongoing dialogue between human analysts and AI agents. Humans tune, Copilot executes, humans verify, and Copilot improves through updated context. Each iteration teaches the system what truthful execution looks like. Over time, this feedback becomes invisible infrastructure—the scaffolding of accuracy.

When you achieve this maturity, something extraordinary happens. Copilot stops feeling like an unpredictable oracle and starts behaving like a junior analyst—fast, consistent, deferential to policy. The hallucinations evaporate, not because AI suddenly became honest, but because dishonesty no longer fits the structure. Truth became the path of least resistance.

So the systemic cure is not glamour; it’s governance. Schema enforcement keeps the bones straight, semantic versioning remembers history, Purview certification guards integrity, and single‑source semantics preserve coherence. Mirroring ties reality to reflection; truth audits keep both synchronized. Between them, Copilot doesn’t need to invent—it simply reports.

Once you’ve enforced truth structurally, you’ve earned the right to automate insight. And that brings us to the final lesson: teaching machines to tell the truth by design, not persuasion.

Conclusion: Teaching Copilot to Tell the Truth

Copilot was never untrustworthy—it was obedient. It answered faithfully to whatever chaos you encoded. Once you give it disciplined architecture, consistent semantics, and explicit instruction, it stops hallucinating not out of morality, but inevitability.

Here’s the enduring equation: clean Medallion layers plus semantic discipline plus governed instructions equal verifiable AI truth. Eliminate ambiguity at the source, document meaning precisely, test outputs regularly, and Copilot becomes a mirror polished to reflect only facts.

If your Fabric estate still whispers contradictions, this is your call to audit it—enforce schema contracts, certify datasets, version your semantics, and run quarterly truth audits. Then watch as Copilot transforms from confident improviser to calculable colleague.

Accuracy isn’t magic; it’s maintenance. Build structure so tight that lies physically can’t pass through. Do that, and every future prompt yields not stories, but statements.

If this saved you hours—or potential embarrassment—repay the effort: subscribe to the M365 Show. We specialize in making Microsoft’s AI actually reliable. The next tutorial dives even deeper into Copilot accuracy engineering. Efficiency belongs to the disciplined. Stay tuned.
