Your tangled web of tables isn’t a data model—it’s digital spaghetti. No wonder DAX feels like you’re solving a sudoku puzzle after twelve beers. The good news: cleaning it up pays off fast. With the right design, your visuals respond to filters in seconds, your DAX stops fighting you, and your model finally looks like something you’d want to show your boss.
The trick is a star schema. That means one or more fact tables in the center holding your measures and events, surrounded by dimension tables—the who, what, when, and where. Relationships define the roles, and the engine is built to optimize that structure.
You don’t need a PhD in data warehousing; you just need to untangle the chaos into this simple pattern. For more deep dives, hit m365.show—you’ll want it for your next model.
Now, why does your current report crawl like a floppy drive in 1995 the moment you add a filter? Let’s get into that.
The Digital Spaghetti Problem
Welcome to the heart of the problem: the Digital Spaghetti model. You know the type—a giant flat table packed with every column anyone ever thought was useful. Customer names, job titles, phone numbers, sales amounts, discount codes, the works—all jammed together. It feels fine at first because you can throw visuals at it and see numbers appear. But once you stack slicers, cross filters, and extra pages, the whole thing bogs down. That’s not bad luck, and it’s not Fabric throwing a tantrum. It’s the wrong design.
Think of it like a city built without streets. Every building stacked on top of each other in one giant pile. Sure, you can live there if you’re willing to climb over roofs and windows, but try driving across it efficiently—gridlock. A flattened model does the same thing: it clumps facts and context in the same space, so every query has to crawl through duplicate information before getting to the answer.
Microsoft’s own documentation is clear on this point. The VertiPaq engine behind Power BI is optimized for one specific design: dimensions store descriptive context such as customers, dates, or regions, and facts store numeric events like sales, clicks, or costs. When you collapse everything into one giant fact-like table, you force the engine to re-do work on every query. SQLBI calls out exactly why this fails: DAX’s auto-exist behavior can produce incorrect results when combinations of attributes are missing from the data, because the relationships that should carry those filters simply don’t exist. In practice, this means your report isn’t just sluggish—it can also be misleading.
A large share of real-world performance problems trace back to this exact modeling choice. Not formulas. Not your GPU. Just chaotic schema design. Flattened models force inefficient query patterns: text values get repeated thousands of times, DAX has to de-duplicate attributes over and over, and filter propagation breaks when dimension logic is buried inside fact rows. That’s why your calculations feel heavy—they’re retracing steps the star schema would handle automatically.
Now, here’s a quick 30-second check to see if you’re stuck in Digital Spaghetti:
First: open your fact table. If you see descriptive text like customer names or region values repeated tens of thousands of times, you’ve got spaghetti.
Second: look at your slicers. If 90% of them are built directly from giant fact table columns instead of small lookup tables, that’s spaghetti too.
Third: ask yourself if you’ve got fact columns that double as static attributes—like a “salon group” typed into transaction rows—even when no visits exist. That right there is spaghetti. One “yes” on these checks doesn’t doom your model, but if you hit all three, you’re running in the wrong direction.
The fix doesn’t happen by blaming DAX. The formulas aren’t broken. What’s broken is the road they’re driving on. When attributes live in fact rows, the engine burns time scanning duplicated text for every query. Star schemas solve this by splitting out those attributes into clean, slim dimension tables. One join, one filter, clean result. No detective work required.
This is why experts keep hammering the same advice: expose attributes through dimensions, hide columns in fact tables, and respect the separation between context and numbers. It isn’t academic nitpicking—it’s the design that prevents your report from collapsing in front of the VP. Get the model shape right, and suddenly the engine works with you instead of against you.
Bottom line: what looks like a harmless shortcut—a single huge table—creates a brittle, sluggish model that makes everything harder. Recognizing that the problem is structural is the first real win. Once you see the spaghetti for what it is, draining it becomes the logical next move.
And draining it starts with a sort: deciding what belongs in facts and what belongs in dimensions. That single choice—the first clean cut—is what shifts you from chaos to clarity.
Facts vs Dimensions: The First Sorting Hat
So here’s where the Sorting Hat comes in: deciding what goes into facts and what belongs in dimensions. It might feel like a simple split, but it’s the first real test of whether your model is going to work or implode. Facts are the measurements—how many, how much, how often. Dimensions are the descriptions—the who, what, when, and where. Keep those roles clean, and suddenly filters know exactly where to go instead of trying to squeeze through gridlock.
The general rule is blunt: dimensions describe, facts measure. A fact table is just measurable stuff—transactions, sales amounts, counts of visits. Dimensions hold your lookups: Customers, Products, Dates, Regions. If you jam them all into one table, you get nothing but duplicated values, heavy filtering, and DAX errors that feel like gremlins.
Take relationships: every one-to-many relationship in Power BI tells you who’s who. The “one” side is always the dimension. The “many” side is always the fact. That simple distinction saves you from guessing. Dimensions provide the clean list, facts reference them. If your so-called dimension sits on the “many” end, it’s not a dimension—it’s another fact with identity issues. And if your would-be dimension doesn’t have a unique column? Fine. Build one. Power Query has “Add Index Column.” That’s your surrogate key. No excuses, no drama—just give the engine something unique to latch onto and move on.
What happens if you don’t respect that split? SQLBI has a classic example: a beauty salon dataset. At first, people dumped salon group information straight into the Visits fact table. Looked convenient—until slicing by gender or job title produced missing totals. Why? Because auto-exist logic in DAX couldn’t handle the missing combinations: some groups had no visit rows at all, so they never showed up in the fact table, and filters silently dropped their numbers. The fix is obvious once you see it: build real dimension tables for Gender, Job, and Salon, then adjust the measure to operate on those. Suddenly the totals matched reality, filters worked, and ghost results disappeared. That’s the power of getting the fact/dimension boundary right.
Another pitfall: stuffing descriptive text straight into your fact table because “it’s already there.” For example, Region repeated half a million times for every transaction. That’s not a lookup—it’s spam. Every time you slice on Region, the model wastes cycles mashing those rows down into a unique list. Instead, throw Region into a dimension table, store each region once, and let the join handle the rest. That’s cleaner, faster, and accurate.
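If it helps to picture the move, here is a minimal Power Query sketch, assuming a hypothetical Sales query with a Region text column. The query and column names are placeholders, not anything from your actual model.

```m
// Build the Region dimension once: distinct values plus a surrogate key.
let
    Source       = Sales,                                        // reference the existing fact query
    RegionsOnly  = Table.SelectColumns(Source, {"Region"}),      // keep just the descriptive column
    Deduplicated = Table.Distinct(RegionsOnly),                  // each region now appears exactly once
    DimRegion    = Table.AddIndexColumn(Deduplicated, "RegionKey", 1, 1, Int64.Type)  // surrogate key
in
    DimRegion
```

Swapping that key back into the fact is the other half of the job; there is a sketch of that merge in the next section.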
Best practice here is non-negotiable: hide descriptive columns in the fact table and expose attributes only through the dimension tables. You will thank yourself later when your report actually slices cleanly. Slicers should point to dimensions, not bloated fact text fields. Get lazy, and you’ll be back to watching spinners while DAX cries in the background.
If you want a mental image: facts are the receipts, dimensions are the catalog. Receipts don’t carry full product names, addresses, or job titles—they just reference IDs and amounts. The catalog—your dimension—stores the product info once and for all. Mix them up and you’re basically stapling the entire IKEA manual onto every receipt, over and over. That’s what kills performance.
Even Microsoft’s docs repeat this like gospel: dimensions are the single source of truth for lookups. When you follow that, a slicer on Customer or Region works the way you expect—once and cleanly across all related facts. It works not because DAX woke up smarter, but because the schema is finally aligned with how the engine is built to behave.
So the Sorting Hat rule is simple. Facts: your sales, visits, or other measurable events. Dimensions: your customers, products, dates, and regions. Keep them in their lanes. If the “one” side of the relationship can’t stand uniquely, give it a surrogate key. Then hide every descriptive column in your facts and let dimensions carry them. It sounds strict, but the payoff is filters that work, models that load fast, and measures that stop tripping over themselves.
Now that we’ve sorted the cast into facts and dimensions, there’s a twist waiting. Microsoft insists you treat each side differently: slim facts, chunky dimensions. Sounds like a paradox. But there’s a reason for it—and that’s where we’re heading next.
Normalize the Fact, Flatten the Dimension
Normalize the fact, flatten the dimension. That’s the rule Microsoft keeps drilling into us, and once you see it in practice, it makes sense. Facts are meant to stay lean and normalized. Dimensions? They’re meant to carry all the descriptive weight in one flattened place. Get that wrong, and your filters stall out while memory usage balloons.
Start with fact tables. These are your transaction logs—each row an actual event: a purchase, a return, a shipment. What belongs inside? Keys that link to dimensions, plus numeric measures you can aggregate. That’s it. Every time you toss in descriptive fields—like customer names, product categories, or vendor addresses—you’re bloating the table. Think about a sales table with 100 million rows. If you stick the phrase “Blue Running Shoe, Men’s” in there, congratulations, you’ve just written it 100 million times. That’s not modeling—that’s landfill. All that repetition steals storage, slows queries, and forces the engine to grind through useless text.
So the move is normalization. Pull that descriptive data out. Replace it with surrogate keys, then park the real attributes in a dimension table where they only live once. Power Query even gives you a direct button for this: Add Index Column. That index becomes the surrogate key for your dimension. Then you merge that key into the fact table so the relationship is one-to-many, clean and reliable. That’s Microsoft’s own guidance, and it’s the backbone of why facts behave when they’re normalized.
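Here is the second half as a hedged sketch, picking up the hypothetical Sales and DimRegion queries from the earlier example; again, the names are illustrative only.

```m
// Swap the repeated text column for the surrogate key on the fact side.
let
    Source   = Sales,
    Joined   = Table.NestedJoin(Source, {"Region"}, DimRegion, {"Region"}, "Dim", JoinKind.LeftOuter),
    WithKey  = Table.ExpandTableColumn(Joined, "Dim", {"RegionKey"}),   // pull in only the key
    SlimFact = Table.RemoveColumns(WithKey, {"Region"})                 // drop the duplicated text
in
    SlimFact
```

In the model, Sales[RegionKey] sits on the "many" side of a one-to-many relationship to DimRegion[RegionKey], and the descriptive text lives once on the "one" side.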
There’s another rule you can’t skip: pick a grain and stick to it. Grain is the detail level of your fact—per transaction, per order line, or per daily rollup. Mixing grains in one table is like throwing metric and imperial units together. Suddenly totals miss, averages skew, and you’re debugging “wrong” numbers forever. Decide the grain when you design the fact, then make sure every row follows it. The result is consistent, predictable queries that won’t surprise you mid-demo.
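One cheap way to keep yourself honest is a throwaway DAX measure. This is a hedged sketch, assuming a hypothetical Sales fact declared at order-line grain with OrderID and LineNumber columns.

```dax
-- Anything above zero means multiple rows share the same grain keys,
-- i.e. the table is mixing grains or carrying duplicates.
Grain Violations =
COUNTROWS ( Sales )
    - COUNTROWS ( SUMMARIZE ( Sales, Sales[OrderID], Sales[LineNumber] ) )
```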
Dimensions take the opposite treatment: flatten them. In relational databases, you’d normalize dimensions into “snowflakes”—a product table that links to a subcategory table, which links to a category table, which finally links to a department table. That’s how someone gets tenure as a data architect. But in Power BI, that design turns your model into molasses. Why? Because every filter has to drag itself up chains of multiple tables, which increases model size, complicates queries, and forces report authors to leap across three or four lookups just to grab one attribute.
Flattening dimensions fixes that. Denormalize the hierarchy into a single table. A Product dimension includes the product name, brand, subcategory, category—all in one place. A Customer dimension carries region, state, and age group side by side. Reporting becomes simpler, because slicers grab attributes directly without climbing through links. From an author perspective, it’s one clean table instead of five confusing ones. From a performance perspective, it’s fewer joins and faster filter propagation.
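In Power Query terms, flattening is just a couple of merges. A minimal sketch, assuming hypothetical Product, Subcategory, and Category queries linked by SubcategoryKey and CategoryKey:

```m
// Collapse the snowflake into one wide Product dimension.
let
    Source  = Product,
    WithSub = Table.ExpandTableColumn(
                  Table.NestedJoin(Source, {"SubcategoryKey"}, Subcategory, {"SubcategoryKey"}, "Sub"),
                  "Sub", {"Subcategory", "CategoryKey"}),
    WithCat = Table.ExpandTableColumn(
                  Table.NestedJoin(WithSub, {"CategoryKey"}, Category, {"CategoryKey"}, "Cat"),
                  "Cat", {"Category"}),
    DimProduct = Table.RemoveColumns(WithCat, {"SubcategoryKey", "CategoryKey"})
in
    DimProduct
```

One table, every attribute a slicer needs, and the snowflake stays upstream where it belongs.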
That doesn’t mean snowflakes never exist, but treat them like last-resort exceptions. Maybe you’ve inherited a taxonomy that changes daily, or you’ve got shared reference data governed by another system. Fine, snowflake it. But know the trade-off: more joins, more relationships, slower filters, bigger model. Unless governance forces your hand, flatten dimensions and keep life simple.
The receipts-versus-catalog example is still the easiest way to picture this. Your facts are receipts—just IDs, quantities, and amounts, never full descriptions repeated over and over. Your dimensions are the catalog. Each product is listed with its details once, and everything points back. That balance keeps storage light, queries fast, and reports intuitive. Let receipts stay skinny. Let the catalog be thorough.
When you apply this split consistently—normalize the facts, flatten the dimensions—you line up your model with how VertiPaq and DAX were designed to work. Query scans are smaller, slicers resolve instantly, and authors don’t waste half their time hunting keys across lookup chains. That’s not style points. That’s raw performance delivered by schema discipline.
And once your facts and dimensions are behaving, the next challenge becomes obvious. Your numeric measures make sense, your categories filter cleanly, but time itself is still a mess. Without fixing that piece, you can’t even count correctly across months or years.
The Sacred Date Table
Which brings us straight to the most overlooked table in every Power BI model: the Sacred Date Table. People love skipping it because “we already have an OrderDate column.” That shortcut is a trap. A single raw column in a fact table isn’t enough. It leads to inactive relationships, clunky DAX workarounds, and broken time intelligence. If you’ve ever written a period‑to‑date calculation that gave you nonsense, odds are you tried to cheat without a proper date dimension.
Business logic lives on the calendar. Orders, invoices, shipments, churn—it all ties back to time. Microsoft’s docs don’t mince words here: build a dedicated date table. Not optional, not “nice to have.” Required. And building one is dead simple if you know the rules. Start with a continuous range that covers every date your data could touch—from the earliest transactions you care about to the latest forecasted horizon. Don’t patch holes. If you miss a gap, you’ll wonder why a chart skips March like the month never existed.
Next: add a surrogate key so this becomes a real dimension, not just another column. You can use Power Query’s “Add Index Column,” or go with a formatted yyyymmdd key. Either way, give the model a clean “one” side for relationships. That’s the glue that makes star schema queries predictable, instead of wobbling around inactive joins.
Then, stock your date table with every attribute your reports ever slice by. Year, Quarter, Month, Day, MonthName, maybe even flags like IsWeekend or IsWorkingDay. If you rely on raw dates alone, you’ll drag calculations into places they don’t belong. Authoring visuals becomes far easier when those attributes are baked into the dimension. Want a simple slicer on Month Name? It’s already there, spelled out, no hacks required.
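If you would rather not build it from memory, here is a minimal Power Query sketch of that date dimension. The 2018–2026 range is purely illustrative; stretch it to cover whatever your facts and forecasts can touch.

```m
let
    StartDate = #date(2018, 1, 1),
    EndDate   = #date(2026, 12, 31),
    DayCount  = Duration.Days(EndDate - StartDate) + 1,
    Dates     = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),   // continuous, no gaps
    AsTable   = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed     = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
    WithKey   = Table.AddColumn(Typed, "DateKey",
                    each Date.Year([Date]) * 10000 + Date.Month([Date]) * 100 + Date.Day([Date]), Int64.Type),
    WithYear  = Table.AddColumn(WithKey, "Year", each Date.Year([Date]), Int64.Type),
    WithQtr   = Table.AddColumn(WithYear, "Quarter", each "Q" & Number.ToText(Date.QuarterOfYear([Date])), type text),
    WithMonth = Table.AddColumn(WithQtr, "MonthName", each Date.ToText([Date], "MMMM"), type text),
    WithFlags = Table.AddColumn(WithMonth, "IsWeekend", each Date.DayOfWeek([Date], Day.Monday) >= 5, type logical)
in
    WithFlags
```

Mark it as a date table in the model, and those attributes are available to every slicer without another hack.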
Now let’s talk about roles. Facts usually don’t come with one date column—they come with a dozen. OrderDate, ShipDate, DueDate, PaymentDate. Trying to funnel all of those through one “Date” table is the fastest way into USERELATIONSHIP hell. Every time you want to slice on shipping versus ordering, you’re forced to juggle DAX syntax nobody wants to debug. The clean fix is role‑playing date tables. Duplicate the same dimension for each role: one as Order Date, one as Ship Date, one as Delivery Date. Each relationship stays active. Each copy gets clearly named columns: ShipYear, OrderYear, InvoiceYear. Suddenly your slicers work cleanly, and your authors stop swearing at the relationship view.
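For the record, this is the boilerplate the single-table route forces on you. A hedged sketch, assuming a hypothetical Sales[ShipDateKey] column and an inactive relationship to 'Date'[DateKey]:

```dax
-- One of these per date role, per measure, when one shared date table does all the work.
Sales by Ship Date =
CALCULATE (
    SUM ( Sales[Amount] ),
    USERELATIONSHIP ( Sales[ShipDateKey], 'Date'[DateKey] )
)
```

With role-playing copies and active relationships, the plain SUM just works and this wrapper disappears.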
“How many duplicates are we talking?” Just a handful. And no, you shouldn’t worry about storage. Date dimensions are tiny compared to fact tables. Duplicating a few thousand rows of calendar data doesn’t touch the space cost of a hundred million transaction rows. It’s cheap insurance that keeps your measures simple and your model sane. Power Query referencing queries are the preferred way to do it: build the master once, then spin off role‑playing copies. If you must, you can also generate them with calculated tables, but referencing queries keep it organized and efficient.
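A role-playing copy is almost embarrassingly small. A minimal sketch, assuming a hypothetical master query named DimDate with Year and MonthName columns:

```m
// "Ship Date" as a referencing query: reuse the master, rename per role.
let
    Source  = DimDate,   // reference to the master date query
    Renamed = Table.RenameColumns(Source, {{"Year", "ShipYear"}, {"MonthName", "ShipMonth"}})
in
    Renamed
```

Repeat for Order Date and Delivery Date, and every role keeps an active relationship with clearly labeled columns.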
Do it right once and it pays dividends forever. Year‑to‑Date actually computes as Year‑to‑Date. Same‑Period‑Last‑Year isn’t a coin toss. Period‑over‑period comparisons stop breaking mid‑presentation. You don’t need a page of exotic DAX to tell filter context which column you meant—it just works. All because the calendar finally got the respect it deserves.
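Two hedged examples of what “it just works” looks like, assuming a hypothetical Sales[Amount] column and an 'Order Date' role-playing table marked as a date table:

```dax
Sales YTD =
TOTALYTD ( SUM ( Sales[Amount] ), 'Order Date'[Date] )

Sales SPLY =
CALCULATE ( SUM ( Sales[Amount] ), SAMEPERIODLASTYEAR ( 'Order Date'[Date] ) )
```

No USERELATIONSHIP, no inactive joins, no pages of workaround DAX.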
A date table isn’t glamorous, but it’s the backbone of every reliable report. Skip it, and no amount of clever measures will save you. Build it with a proper range, a surrogate key, and clean role‑playing copies, and you’ll never babysit inactive relationships again.
And once time is under control, the next source of user pain stares you right in the face. Your slicers. Because nothing kills confidence faster than exposing raw blanks, cryptic codes, or “flag_1” fields that only a database admin could love.
Beating Blanks, Flags, and Cryptic Codes
Blanks, flags, and cryptic codes—this is the part no one brags about fixing, but everyone notices when you don’t. You can spend weeks designing the perfect star schema and still watch the whole thing lose credibility if the slicers greet users with “flag_1” instead of something real. At that point, the issue isn’t performance, it’s trust. And people stop trusting fast when reports look like they were built for robots.
The mess comes from the source systems. Old ERPs and CRMs love storing logic as flags, abbreviations, or random codes some DBA thought was brilliant in 1998. Add in missing values—nulls for entire customer segments or blank sales channels—and suddenly your report is littered with confusion. The mistake is letting it flow straight through into visuals. If a VP clicks a drop‑down and sees “0” or “flag_2,” they’re gone. They’ll nod politely and then export raw data to Excel, which means everything you built gets sidelined.
The fix is not yelling at your users until they memorize the code table. It’s cleaning the data before it ever hits the model. Power Query is where this gets solved. Three steps, in order. First, replace nulls with something meaningful. Use “Unknown,” “Other,” or “Not Provided.” That gives gaps a proper label so people understand it’s missing, not broken. Second, decode every system flag into a business‑friendly description. Turn “flag_1” into “Customer Active = Yes.” Turn “M” into “Male.” Turn “F” into “Female.” If you have codes nobody even remembers anymore, document them once and translate them permanently. Third, hide or remove the raw technical columns after you’ve built the replacements. “Cust_Flag” and “Sales_Ind” belong in data plumbing, not front‑end slicers.
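Here is what those three steps can look like in Power Query. A hedged sketch, assuming a hypothetical DimCustomer query with a nullable Segment column and a Cust_Flag column holding “flag_1” and “flag_2”.

```m
let
    Source      = DimCustomer,
    NoBlanks    = Table.ReplaceValue(Source, null, "Unknown", Replacer.ReplaceValue, {"Segment"}),  // step 1: label the gaps
    Decoded     = Table.AddColumn(NoBlanks, "Customer Status",                                      // step 2: decode the flag
                      each if [Cust_Flag] = "flag_1" then "Active" else "Inactive", type text),
    Presentable = Table.RemoveColumns(Decoded, {"Cust_Flag"})                                       // step 3: hide the plumbing
in
    Presentable
```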
This is one of those rare cases where “cosmetic cleanup” is actually risk management. Leave complex codes visible and you guarantee mis‑clicks and wrong assumptions. The HR horror story is classic: some systems use “M” and “F” for gender. One person reads it correctly as Male and Female, another misreads “F” as Friday, and now you’ve got a headcount chart that looks like half your staff vanished on weekends. One small misinterpretation, the board questions the data, and suddenly every insight gets second‑guessed. Making labels human is how you stop that chain reaction.
But what about when you’ve got dozens of tiny flags? Customer Current, Customer Preferred, Newsletter Opt‑In, Loyalty Member, Account Suspended—you name it. Dumping each of those into the fact table turns it into a junk drawer. Every flag repeated across millions of rows, bloating storage and killing clarity. The better pattern here is what’s called a “junk dimension.” Instead of leaving those fragments scattered in facts, you bundle them together. Power Query can generate the full Cartesian product of the valid flag combinations. Give it a surrogate key with “Add Index Column,” then swap those raw flags in your fact table for the single surrogate. Result: facts stay lean, the dimension holds all the descriptive logic, and your slicers suddenly present clean, business‑language options without cluttering the model. This reduces noise, improves performance, and makes maintenance almost effortless.
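Here is the shape of that junk dimension, as a hedged sketch with two hypothetical flags; extend the same pattern for each extra flag, then merge FlagKey into the fact and drop the raw flag columns there.

```m
// Cartesian product of the valid flag values, keyed with an index column.
let
    ActiveValues = Table.FromColumns({{"Yes", "No"}}, {"Customer Active"}),
    OptInValues  = Table.FromColumns({{"Yes", "No"}}, {"Newsletter Opt-In"}),
    CrossJoined  = Table.ExpandTableColumn(
                       Table.AddColumn(ActiveValues, "Combo", each OptInValues),
                       "Combo", {"Newsletter Opt-In"}),
    JunkDim      = Table.AddIndexColumn(CrossJoined, "FlagKey", 1, 1, Int64.Type)
in
    JunkDim
```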
When people say fix it once, this is the case study. Shape it right in Power Query, and downstream reports enforce the new clarity automatically. Every new visual, every slicer, every page—human labels by default, no extra clean‑up required. Compare that to firefighting the same data confusion every time an analyst builds a new dashboard. Do the heavy lifting once upstream.
None of these steps are optional. Replace the blanks, decode the flags, label attributes in plain business terms, and hide the system plumbing. This isn’t polish. It’s the difference between a report users believe and one they quietly bypass. When slicers match the language of the business, people stop questioning the structure and start making decisions.
And if you want to shortcut the trial‑and‑error of figuring this out, there’s a free checklist waiting at m365.show. It lays out these exact transforms so you don’t miss them, and MVPs walk through the live fixes at M365.Show if you’d rather watch instead of read.
Once you’ve beaten the blanks, flags, and cryptic codes into shape, the bigger picture shows itself. Clean models aren’t theory—they’re the guardrail between straightforward, scalable DAX and the chaos that grinds reports to a halt when the execs are watching.
Conclusion
So here’s the wrap-up. Star schema isn’t decoration, it’s the backbone. Why? Because it cleanly separates filtering and grouping in dimensions from summarizing in facts—the exact structure Vertipaq and DAX were built for. That’s why reports run faster and modeling actually stays sane.
If you only remember three things from this video, make them these: one, identify facts versus dimensions; two, normalize facts and flatten dimensions; three, give yourself a proper date table and clean up flags before they hit a slicer. Nail those, and your model will carry you for years.
Want the checklist? Subscribe to the newsletter at m365.show. And make sure to follow the M365.Show LinkedIn page for livestreams with MVPs who’ve actually fixed this stuff in the wild. Ah, and subscribe to the podcast now!