If you’ve ever watched a simple query crawl in Microsoft Fabric—while your cloud bill climbs—you’re in the right place. Today, we’ll rewind to that moment of frustration and show you a real transformation: before and after Lakehouse optimization using partitioning, caching, and Delta Lake file management.
What actually fixes sluggish performance, and what should you be doing differently right now? Let’s break down exactly how tuning these settings can speed up your analytics—and put those wasted costs back in your budget.
When Good Queries Go Bad: The Pain of a Slow Lakehouse
If you’ve ever watched a simple Power BI dashboard lag for what feels like forever, even though your Lakehouse ticks every supposed best-practice box on paper, you’re not alone. Most admins and data engineers have faced this painful moment. You pull up what should be a plain, routine report, and that spinning wheel just won’t quit. The longer you wait, the more awkward the silence gets, especially if you’re sitting across the table from business leaders who expect crisp, on-demand answers. You can have all the right connectors, a shiny Fabric workspace, and every account set up perfectly, yet when that dashboard grinds to a halt, everyone looks at the data team like, “Didn’t we buy this to make things faster?”
It’s a scenario that plays out in organizations of all shapes and sizes. Picture a live business review, where decision-makers toss out new questions and expect instant insights. Instead, the meeting crawls along because every fresh question hits a wall: your team nervously refreshes dashboards, and the queries just drag. Someone tries to defend the delays: “It worked fine yesterday,” or “Maybe it’s a network thing?” But by that point the moment is lost, and confidence in your Lakehouse, supposedly built to deliver analytics at scale, takes a hit. Underneath it all, you know the answer isn’t more RAM, a faster network, or praying to the Microsoft gods. Something deeper in the setup is off.
That’s a frustrating reality check for a lot of folks using Microsoft Fabric. You did all the homework—read the documentation, clicked through the intro guides, even went through official tutorials step-by-step. You’re investing not just budget but time and credibility into this thing. Yet the speed is crawling and your usage costs keep climbing with every refresh. The business side starts asking pointed questions about resource usage. “We’re using the cloud, shouldn’t it be faster?” That budget for extra analytics headcount? It’s slowly evaporating into compute charges for reports no one is happy with.
The most common trap looks deceptively simple: a basic sales report, running over a Lakehouse table that looks organized but isn’t set up to scale. Maybe it’s partitioned by the default ‘date’ column; maybe it’s just one big file. You press refresh, and suddenly the query engine has to read through a mountain of irrelevant data to get to the two weeks of sales your VP actually cares about. The table was probably ingested with one-size-fits-all defaults. Maybe it’s built from flat files all crammed into the same folder, or, worse, it grows a fresh pile of small files with every new ETL load. None of these mistakes jumps out at you when you’re small, but once you hit any kind of scale, reports that should finish in seconds take minutes, or never finish at all.
It’s a pattern that keeps showing up: teams stick with out-of-the-box settings, avoid rethinking ingestion, and put off cleaning up their data structure because everything sort of “works” during initial testing. Add some growth and a few extra users and suddenly you’re just spinning wheels. Microsoft’s analytics team actually studied this and found something wild—up to 80% of Lakehouse performance issues came from the same handful of missteps baked in right at setup. We’re talking about stuff that never gets fixed because it’s invisible until real usage hits. The result? Your Lakehouse is like a race car stuck with the parking brake on. You’re throwing money at bigger engines—more compute, maybe even another consultant to run some scripts—but you still aren’t getting anywhere near the performance you expected.
What if you could pinpoint just three moves that unlock double-digit speed improvements—not by throwing more hardware or budget at the problem, but by tuning features you already have? Imagine the difference: instead of burning through your cloud budget on slow queries, you see instant refreshes and happy users. Those cost spikes flatten out, and surprise tickets about slow dashboards get a lot less frequent. For a lot of Lakehouse admins, that leap in performance is completely within their reach—all it takes is knowing where the slowdowns start, and how to fix them without tearing everything apart.
So yes, the pain is real—those slow dashboards, awkward meetings, and climbing costs add up fast. But there’s nothing mysterious or magical about the solution. Over the next few minutes, we’ll break down exactly what’s holding your Fabric Lakehouse back and how to flip the script, step by step. Because whether it’s partitioning strategies, caching, or the files themselves, there’s a fix for every common bottleneck if you know where to look.
Let’s get into it and tackle the very first pitfall that trips up almost every Fabric Lakehouse out there—partitioning strategies that sabotage you right from the start.
Partitioning Pitfalls: The Hidden Cost of One-Size-Fits-All
You might think you’ve got partitioning handled—set up some keys, run your ingestion, and move on. But Fabric has a way of exposing every assumption about data splits. Most admins default to partitioning by dates or regions, making a guess based on how tables were modeled or what made sense for last year’s reporting structure. There’s comfort in picking a field like ‘date’ and letting the system slice up everything for you—especially if you’re used to working on-prem or with more traditional warehouse tools. The real problem starts when that habit collides with the way your teams actually use the data.
Fabric treats partitions differently under the hood. A partition can help, but when it’s picked for convenience rather than workload, it often becomes a liability. Imagine a sales table split into one folder per year, stretching all the way from 2005 to today. But here’s the catch: 90% of your queries filter not by year, but by ‘product category’. So when someone asks to see all sales for a single product, the engine has to load every year, every partition, just to find the relevant records. What should’ve been a targeted scan turns into a painfully broad search. That’s not just wasted time—it’s an avoidable cost every single month.
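To make that concrete, here is a minimal PySpark sketch of the mismatch. The table and column names are hypothetical, and it assumes a Fabric notebook where Spark and Delta are already available; treat it as an illustration of partition pruning, not a prescription for your schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # Fabric notebooks provide this session already

# Tiny stand-in for a real sales feed (hypothetical columns).
df_sales = spark.createDataFrame(
    [("2023-06-01", "Bikes", 120.0), ("2024-02-15", "Helmets", 45.0)],
    ["order_date", "product_category", "amount"],
).withColumn("year", F.year(F.to_date("order_date")))

# Partitioned by 'year': one folder per year on disk.
(
    df_sales.write.format("delta")
    .mode("overwrite")
    .partitionBy("year")
    .saveAsTable("sales_by_year")
)

# Prunes: the filter hits the partition column, so only the 2024 folder is read.
spark.table("sales_by_year").filter(F.col("year") == 2024).show()

# Does not prune: 'product_category' is not a partition column, so every
# year's files are opened before the filter can be applied.
spark.table("sales_by_year").filter(F.col("product_category") == "Bikes").show()
```

If most of your real queries look like the second one, the ‘year’ layout is working against you, no matter how tidy it looks in storage.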
There’s another side to this story, too. It’s easy to go overboard on partitioning, especially when there’s a fear of slow jobs or overloaded clusters. Say you try to cover every angle and create much finer partitions: maybe one per day, per region, per product category, per sales channel—so many combinations that you end up with tens of thousands of tiny files scattered in your Lakehouse. Each one takes up storage space, each one adds an overhead for the engine to scan, validate, and open. Instead of narrowing the work, you’ve just multiplied it. Fabric’s storage costs start to creep, and what used to be a single-table read becomes a coordination nightmare of hunting down fragmented rows.
That’s the story most teams live out at least once: under-partition and you force massive, expensive scans; over-partition and you drown in metadata, dragging everything down to a crawl. Finding the sweet spot isn’t just nice to have; it’s essential for sustainable performance and predictable costs. The difference can be huge. Poorly chosen partition keys can double query times and push costs up by as much as 60%. It adds up fast, especially for organizations with growing data and regular refreshes. Storage isn’t free, and neither is the time your team spends fixing slowdowns that never should have existed in the first place.
Here’s the real kicker: default partitioning patterns might look familiar, but they’re not built for your actual business questions. Microsoft’s own data architects put it bluntly—‘default’ partitioning is usually the wrong answer for custom workloads. Their point is simple: if you always split by date because the documentation suggests it, you’re almost guaranteed to hit performance walls when your queries don’t follow that same logic. The documentation can get you started, but it won’t know how your business users think, or what questions come up in real meetings.
Not long ago, I worked with a client whose team inherited a Lakehouse built entirely around partitions by ‘region’. Every table was split that way, top to bottom. It seemed logical—the business had always organized sales targets by region. But in practice, nearly all their dashboards filtered and aggregated by ‘order_type’. Regional splits meant that every refresh and ad hoc query had to grab pieces from all over the place, scan unnecessary files, and reassemble everything on the fly. Performance dropped off a cliff anytime there was a spike in questions. The fix? They rebuilt partitions around ‘order_type’ instead. Night and day difference. Not only did the refresh window shrink to half its original length, but their query costs stabilized and predictable performance returned. The only thing they changed was how data lined up with what end users actually asked for in real life.
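The mechanics of that kind of rebuild are simpler than the payoff suggests. Here is a sketch with hypothetical table and column names, assuming you can afford a one-time full rewrite during a quiet window:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Fabric notebooks provide this session already

# Rebuild the table around the column users actually filter on.
# This is a full table rewrite, so schedule it outside business hours.
df = spark.table("sales")

(
    df.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")   # required when changing partition columns in place
    .partitionBy("order_type")           # align the on-disk layout with real query filters
    .saveAsTable("sales")
)
```

Delta’s snapshot isolation is what makes reading and overwriting the same table in one job workable; the cost is the one-time rewrite, which is usually a fair trade when every query after it gets to prune.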
If there’s one lesson from all this, it’s that partition keys should never be on autopilot. The best choice always comes from your workload and query patterns, not the data model itself or what makes a table look neat on disk. Before you launch that next ETL pipeline, ask when and how the table is queried—not just how it will be written. Sit with your business analysts, check historical query logs, and figure out which columns they filter on. That’s where the gains are hiding. A partition key that matches those filters will do more for speed, cost, and sanity than any amount of hardware upgrades.
It’s not enough to stop at partitioning, though. Even perfectly chosen keys can only get you so far if the data behind them is being pulled and processed inefficiently. That’s why, after cleaning up partitioning strategies, smart teams look to the next big lever: caching. If you’ve ever wondered why some Lakehouses still drag even when partitions line up perfectly, it’s usually because their hottest data isn’t ready in memory when it counts. Caching, when tuned right, keeps Lakehouses snappy and responsive even during peak workflows. Let’s get into how you can put memory to work, not just storage, and see what happens when caching is treated like an active part of your Lakehouse toolkit.
Cache or Crash: Why Memory Matters More Than You Think
Caching seems simple—click it on, let Fabric handle the rest, and expect queries to fly. But then your supposedly “optimized” Lakehouse still drags on those important dashboards, leaving you staring at a loading icon instead of insights. It's a problem most Fabric admins bump into sooner or later: everything’s enabled, but performance is stuck. What gives? Underneath the surface, Fabric has multiple caching layers in play—like result set cache, warehouse cache, and even individual file-level caching—but each behaves a little differently. The result is that some layers quietly do their job while others just eat up memory without ever helping the queries you actually care about. If you don’t control what goes where, your prized data gets treated the same as yesterday's quick test uploads that no one looks at twice.
Let’s take a step back and talk about why, even with everything technically “cached,” speed drops off for real workloads. The big issue is that default settings favor fairness over what’s actually hot, so every table gets a generic, even share of memory. Your most important data, the tables everyone needs at 9AM Monday, often gets pushed behind random, older chunks the engine just happened to fetch first. It’s like storing your favorite screwdriver at the bottom of a cluttered toolbox: you always end up fishing around, moving useless stuff aside, even though you know exactly what you want. The wasted time isn’t from hardware limits; it’s from bad prioritization.
There’s a story I see play out again and again: the monthly board review, a Power BI dashboard that pulls from detailed Lakehouse tables. The data isn’t massive, but every exec is expecting real-time refreshes. Yet, every cycle, the same key tables are slow to load—meanwhile, logs show that five other, rarely queried tables are sitting snug in RAM taking up cache space, just because they happened to be loaded earlier. The report’s source tables are nowhere near the front of the cache line, so every refresh chugs along, reading from disk instead of memory. The strange part is that if you run that same query right after, it’s quick—because now that data is finally in cache… until something else boots it out again. All this happens behind the curtain, and unless you actively manage what’s hot, you’ll be riding this performance rollercoaster for every major workflow.
The right approach is tuning cache for the data that actually matters. Microsoft-backed studies have shown that just by targeting your “hot” tables—meaning the ones used most by your reports or dashboards—you can see a 70% or greater drop in query times. That’s not a small tweak; that’s the difference between reports everyone complains about and ones they actually use without thinking twice. It’s never just about enabling caching and calling it a day. It means reviewing query history, monitoring which tables see the most load, and building a plan that gets those top performers locked in for the busiest hours. Sometimes that’s done via manual pinning—telling Fabric exactly which data stays resident in RAM. Other times, it’s fine to let automation handle less critical or rarely accessed data, letting Fabric’s own predictive logic make the call.
But automation also has its downsides. Auto-caching works well for straightforward, stable workloads where patterns rarely change. If your access patterns jump around or peak at odd times—say, end of month or during big product launches—default logic can miss the mark badly. That’s when a manual “cache pin” becomes worth every second it takes. Pinning is basically a power move: you tell Fabric, “this table never leaves cache during business hours, no matter what.” You might only need it for a handful of tables out of dozens, but those will be the ones driving every critical dashboard refresh. It’s the difference between spending all day fielding complaints and having users who barely notice a hiccup.
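What “pinning” looks like in practice depends on which engine you’re tuning. On the Spark side of Fabric, the closest standard lever is Spark SQL’s CACHE TABLE, sketched below with hypothetical table names; treat it as an approximation of pinning hot data for a peak window, not a Fabric-specific pinning API.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Fabric notebooks provide this session already

# Warm the hot tables before the Monday 9AM rush (hypothetical names).
hot_tables = ["sales_summary", "orders_current_quarter"]

for name in hot_tables:
    spark.sql(f"CACHE TABLE {name}")   # eager by default: loads the table into Spark's cache now

# ...run the heavy dashboard or refresh queries here...

# Release the memory once the peak window is over so other workloads aren't starved.
for name in hot_tables:
    spark.sql(f"UNCACHE TABLE {name}")
```

The point isn’t the two loops; it’s that warming and releasing cache deliberately, on your schedule, beats hoping the automatic eviction logic happens to agree with your Monday morning priorities.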
There are plenty of cases out there where this simple shift—manual cache intervention—has changed the game. One team I worked with recently struggled for months with random, unpredictable slowdowns—usually right when everyone was running their big Monday morning analysis. They finally dug into caching patterns and found their key report tables ranked low on the auto-cache algorithm. By manually promoting those, they cut their compute costs by nearly a third and, even more importantly, stopped the weekly flood of “why is this taking so long?” tickets. Compute stabilized, budgets recovered, people stopped complaining, and suddenly, IT had extra time for improvement projects instead of damage control.
Another misconception is that more cache always equals more speed. In reality, over-caching irrelevant data just fills up expensive memory with stuff that never gets queried. Fabric memory is not unlimited (and it isn’t free). Spend too much of it on cold storage, and your real-time workloads are left scraping disk. That’s why cache management in Fabric is an ongoing task, not something you flip on and ignore. The right strategy means reviewing reports, keeping up with new usage patterns as the business evolves, and tweaking priorities to match what’s actually happening—not just what you planned for at rollout. This regular tuning pays dividends: faster dashboards, steadier budgets, and a smoother user experience across the board.
But we haven’t talked about the underlying files yet. Even with partitions tuned and everything perfectly cached, your Lakehouse can still get bogged down if those files are too fragmented or bulked up with junk. When it comes to file optimization and compaction, all the caching in the world won’t rescue you from a mess of small or bloated files. That’s where switching gears to the right file format—and knowing how to manage it—becomes the next major lever for performance.
Delta Lake Secrets: Why Compaction Is Your Secret Weapon
If you’ve ever felt like your Lakehouse suddenly picked up a limp as your data grew, you’re not imagining it. Even with partitioning and caching perfectly dialed in, performance can quietly erode—and the cause hides in the files themselves. That’s the challenge with Delta Lake tables in Microsoft Fabric: every operation you run—whether it’s loading daily sales, running a batch update, or fixing an import edge case—adds more files. At first, spreading data into multiple parquet files looks clean. But over time, all those tiny slices pile up. What started as a handful of manageable files morphs into hundreds, then thousands, of discrete little blocks scattered across storage. It’s basically a digital junk drawer—one with a habit of slowing you down when you least expect it.
Here’s where things often go sideways. Each fresh batch, incremental update, or upsert brings its own new set of files. The Lakehouse doesn’t “replace” old ones, it just adds to the stack. Fast forward a few weeks, and your production sales table, which might have started out neat and tidy with 50 files, is now sitting at 5,000. At this scale, what used to be a quick scan has turned into a slog. Every query forces the engine to hunt through thousands of fragments. Instead of pulling data from a couple of well-organized chunks, it spends resources coordinating across a mess of tiny files. That overhead is real. Query times double, then triple—not because your data got so much bigger, but because the engine now has to work harder just to organize the reading process.
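You don’t have to guess how far the sprawl has gone; the Delta transaction log will tell you. A quick check, assuming a hypothetical table name in a Fabric notebook:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Fabric notebook session

# DESCRIBE DETAIL reports file count and total size straight from the Delta transaction log.
detail = spark.sql("DESCRIBE DETAIL sales").select("numFiles", "sizeInBytes").first()

num_files = detail["numFiles"] or 0
avg_mb = detail["sizeInBytes"] / max(num_files, 1) / (1024 * 1024)
print(f"{num_files} files, roughly {avg_mb:.1f} MB per file on average")
```

Thousands of files averaging a few megabytes each is the classic signature of the problem described above.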
I see this pattern in mature Lakehouse projects all the time. Teams spend weeks tuning partitions and cache, only to miss the hidden latency caused by file sprawl. And once those file counts hit a tipping point, almost everything that touches the table starts to slow down. The cost is more than just time: every unnecessary file means more IOPS, more metadata operations, and ultimately more money burned both on compute and cloud storage. Left unchecked, it turns even simple reporting into a grind.
Microsoft’s engineers have numbers to back this up. Their analysis finds that compaction—grouping small files together into larger ones—can cut read latency by as much as 80%. That means not only faster dashboards and analytics, but a drop in the unpredictable cloud bills that come from excessive file handling. Compaction isn’t just nice to have; it’s a core requirement once your tables start to accumulate regular updates and small incremental loads. Less fragmentation per query means less confusion for the read engine, translating into tangible results for users every time.
But the benefits don’t stop at query speed. Storage overhead shrinks once the superseded fragments left behind by compaction are vacuumed away. Your backup routines get quicker, since there’s less metadata to carry. Even your data engineers notice fewer “mystery” slowdowns, since most of those come straight from reading too many files at once. Compute usage shrinks, freeing up resources for real workloads instead of endless file juggling. It’s a change you see in cloud bills and user tickets at the same time.
The snag is that users often try to automate compaction on a calendar—daily, weekly, first of the month, whatever’s easiest to ship in a pipeline job. But the real world is rarely that neat. Triggers based strictly on time ignore what’s actually happening inside your tables. Sometimes your Lakehouse gets a rush of writes in a single day, other times hardly anything at all. Microsoft recommends compaction based on data conditions—not on a clock—by tracking file sizes and growth patterns instead. This keeps the compaction process lean, avoiding wasted effort compacting when it’s not needed and jumping in quickly when tiny files start to pile up. You save cycles, avoid redundant operations, and keep your Lakehouse smooth with minimum fuss.
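Here is what condition-based compaction can look like in a scheduled Fabric notebook, reusing the DESCRIBE DETAIL check from earlier. The table name and thresholds are purely illustrative; tune them to your own tables and load patterns.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Fabric notebook session

TABLE = "sales"          # hypothetical table name
MAX_FILES = 500          # illustrative threshold: too many files overall
MIN_AVG_FILE_MB = 64     # illustrative threshold: files too small on average

detail = spark.sql(f"DESCRIBE DETAIL {TABLE}").select("numFiles", "sizeInBytes").first()
num_files = detail["numFiles"] or 0
avg_mb = detail["sizeInBytes"] / max(num_files, 1) / (1024 * 1024)

if num_files > MAX_FILES or avg_mb < MIN_AVG_FILE_MB:
    # OPTIMIZE rewrites the small files into larger ones; readers keep working
    # against the previous snapshot while the compaction transaction runs.
    spark.sql(f"OPTIMIZE {TABLE}")
    # VACUUM then removes files no longer referenced, once the retention period has passed.
    spark.sql(f"VACUUM {TABLE}")
else:
    print(f"{TABLE}: {num_files} files, avg {avg_mb:.1f} MB, no compaction needed")
```

Run on a schedule, this still fires only when the table actually needs it, which is exactly the behavior a calendar-based trigger can’t give you.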
A good real-world example comes from a finance team I worked with. They had a large Delta Lake table refreshed in small batches every hour, 24/7. The file count quietly ballooned from a few hundred to several thousand within weeks. Their Power BI refreshes, once a routine two-minute task, crept to ten or more. The problem wasn’t compute or RAM—it was file sprawl. Once compaction was automated using logic that watched file counts and average size, the reporting time dropped from ten minutes back to under two. No hardware upgrade, no schema change—just cleaning up the files nobody noticed growing in the background.
Optimizing compaction is the final major lever for Fabric Lakehouse performance, and it doesn’t require risky, disruptive changes. It’s about understanding table activity, knowing when bloat is building, and acting with purpose—not just following a calendar or hoping the engine keeps up. Nail your file management, and suddenly, your partitioning and cache investments actually pay off. Ignore it, and you’ll never get the speed or cost savings promised on paper.
Now, as the Lakehouse world gets faster and bigger, the real opportunity sits in linking all these levers together—moving from piecemeal fixes to a setup where partitioning, caching, and file compaction actually reinforce each other. That’s where the biggest performance payoff shows up, and it’s the next obvious step for teams tired of living with slow queries and ballooning costs.
Conclusion
You’re not stuck with slow queries just because you picked Microsoft Fabric. Most performance issues tie back to a few overlooked parts of the setup—partitioning chosen for convenience, cache left on autopilot, and all those un-compacted Delta files quietly building up. Speed isn’t about luck or buying more compute; it’s about weaving together these levers so they work as a system. Before you rack up more cost blaming the engine, look at the table patterns, hot data, and file health. Start by picking one area to test. Once you see results, let everyone know what actually changed for you.