Opening Hook & Teaching Promise
Somewhere right now, a data analyst is heroically exporting a hundred‑megabyte CSV from Microsoft Fabric—again. Because apparently, the twenty‑first century still runs on spreadsheets and weekend refresh rituals. Fascinating. The irony is that Fabric already solved this, but most people are too busy rescuing their own data to notice.
Here’s the reality nobody says out loud: most Fabric projects burn more compute on refresh cycles than their old Power BI workspaces ever did. Why? Because everyone keeps using Dataflows Gen 2 as if it were still Power BI’s little sidecar. Spoiler alert: it’s not. You’re stitching together a full‑scale data engineering environment while pretending you’re building dashboards.
Dataflows Gen 2 aren’t just “new dataflows.” They are pipelines wearing polite Power Query clothing. They can stage raw data, transform it across domains, and serve it straight into Direct Lake models. But if you treat them like glorified imports, you pay for movement twice: once pulling from the source, then again refreshing every dependent dataset. Double the compute, half the sanity.
Here’s the deal. Every Fabric dataflow architecture fits one of three valid patterns—each tuned for a purpose, each with distinct cost and scaling behavior. One saves you money. One scales like a proper enterprise backbone. And one belongs in the recycle bin with your winter 2021 CSV exports.
Stick around. By the end of this, you’ll know exactly how to design your dataflows so that compute bills drop, refreshes shrink, and governance stops looking like duct‑taped chaos. Let’s dissect why Fabric deployments quietly bleed money and how choosing the right pattern fixes it.
Section 1 – The Core Misunderstanding: Why Most Fabric Projects Bleed Money
The classic mistake goes like this: someone says, “Oh, Dataflows—that’s the ETL layer, right?” Incorrect. That was Power BI logic. In Fabric, the economic model flipped. Compute—not storage—is the metered resource. Every refresh triggers a full orchestration of compute; every repeated import multiplies that cost.
Power BI’s import model trained people badly. In that world, storage was finite, compute was hidden, and refresh was effectively free unless you hit capacity limits. Fabric, by contrast, charges you per activity. Refreshing a dataflow isn’t just copying data; it spins up distributed compute, populates staging storage, writes delta files, and tears it all down again. Do that across multiple workspaces? Congratulations, you’ve built a self‑inflicted cloud mining operation.
Here’s where things compound. Most teams organize Fabric exactly like their Power BI workspace folders—marketing here, finance there, operations somewhere else—each with its own little ingestion pipeline. Then those pipelines all pull the same data from the same ERP system. That’s multiple concurrent refreshes performing identical work, hammering your capacity pool, all for identical bronze data. Duplicate ingestion equals duplicate cost, and no amount of slicer optimization will save you.
Fabric’s design assumes a shared lakehouse model: one storage pool feeding many consumers. In that model, data should land once, in a standardized layer, and everyone else references it. But when you replicate ingestion per workspace, you destroy that efficiency. Instead of consolidating lineage, you spawn parallel copies with no relationship to each other. Storage looks fine—the files are cheap—but compute usage skyrockets.
Dataflows Gen 2 were refactored specifically to fix this. They support staging directly to delta tables, they understand lineage natively, and they can reference previous outputs without re‑processing them. Think of Gen 2 not as Power Query’s cousin but as Fabric’s front door for structured ingestion. It builds lineage graphs and propagates dependencies so you can chain transformations without re‑loading the same source again and again. But that only helps if you architect them coherently.
Once you grasp how compute multiplies, the path forward is obvious: architect dataflows for reuse. One ingestion, many consumers. One transformation, many dependents. Which raises the crucial question—out of the infinite ways you could wire this, why are there exactly three architectures that make sense? Because every Fabric deployment lives on a triangle of cost, governance, and performance. Miss one corner, and you start overpaying.
So, before we touch a single connector or delta path, we’re going to define those three blueprints: Staging for shared ingestion, Transform for business logic, and Serve for consumption. Master them, and you stop funding Microsoft’s next datacenter through needless refresh cycles. Ready? Let’s start with the bronze layer—the pattern that saves you money before you even transform a single row.
Section 2 – Architecture #1: Staging (Bronze) Dataflows for Shared Ingestion
Here’s the first pattern—the bronze layer, also called the staging architecture. This is where raw data takes its first civilized form. Think of it like a customs checkpoint between your external systems and the Fabric ecosystem. Every dataset, from CRM exports to finance ledgers, must pass inspection here before entering the city limits of transformation.
Why does this matter? Because external data sources are expensive to touch repeatedly. Each time you pull from them, you’re paying with compute, latency, and occasionally your dignity when an API throttles you halfway through a refresh. The bronze Dataflow fixes that by centralizing ingestion. You pull from the source once, land it cleanly into delta storage, and then everyone else references that materialized copy. The key word—references, not re‑imports.
Here’s how this looks in practice. You set up a dedicated workspace, call it “Data Ingestion” if you insist on dull names, attached to your standard Fabric capacity. Within that workspace, each Dataflow Gen 2 connects to an external system: Salesforce, Workday, SQL Server, whatever system of record you have. The Dataflow retrieves the data, applies lightweight normalization (standardizing column names, enforcing consistent types, weeding out the obvious nulls) and writes it into your Lakehouse as Delta tables.
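To make that concrete, here is a minimal sketch of the bronze landing step. A real Dataflow Gen 2 is authored in Power Query, so treat this as the equivalent logic written as PySpark in a Fabric notebook; the JDBC connection, the table name bronze_crm_accounts, and the column names are placeholders, not a prescribed implementation.

```python
# Bronze landing sketch: pull once, normalize lightly, write once to Delta.
# "spark" is the session a Fabric notebook provides; all names are placeholders.
from pyspark.sql import functions as F

# 1. Pull from the external system of record (here, a SQL source over JDBC).
raw = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>;databaseName=<db>")  # placeholder connection
    .option("dbtable", "dbo.Accounts")
    .load()
)

# 2. Lightweight normalization only: consistent names and types plus an audit column,
#    but no business logic.
landed = (
    raw.select([F.col(c).alias(c.strip().lower().replace(" ", "_")) for c in raw.columns])
       .withColumn("account_id", F.col("account_id").cast("string"))
       .withColumn("_ingested_at", F.current_timestamp())
)

# 3. Land it as a Delta table in the Lakehouse; every downstream consumer references this copy.
landed.write.format("delta").mode("overwrite").saveAsTable("bronze_crm_accounts")
```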
Now stop there. Don’t apply business logic, don’t calculate metrics, don’t rename “Employee” to “Associates.” That’s silver‑layer work. Bronze is about reliable landings. Everything landing here should be traceable back to an external source, historically intact, and refreshable independently. Think “raw but usable,” not “pretty and modeled.”
The payoff is huge. Instead of five departments hitting the same CRM API five separate times, they hit the single landed version in Fabric. That’s one refresh job, one compute spin‑up, one delta write. Every downstream process can then link to those files without paying the ingestion tax again. Compute drops dramatically, while lineage becomes visible in one neat graph.
Now, why does this architecture thrive specifically in Dataflows Gen 2? Because Gen 2 finally understands persistence. The moment you output to a delta table, Fabric tracks that table as part of the lakehouse storage, meaning notebooks, data pipelines, and semantic models can all read it directly. You’ve effectively created a reusable ingestion service without deploying Data Factory or custom Spark jobs. The Dataflow handles connection management, scheduling, and even incremental refresh if you want to pull only changed records.
And yes, incremental refresh belongs here, not in your reports. Every time you configure it at the staging level, you prevent a full reload downstream. The bronze layer remembers what’s been loaded and fetches only deltas. Between runs, the Lakehouse retains history as parquet or delta partitions, so you can roll back or audit any snapshot without re‑ingesting.
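Under the covers, incremental refresh is a watermark plus a merge: remember the newest value already landed, pull only rows past it, and merge them in. A minimal sketch of that pattern, assuming the bronze table above, a modified_at column on the source, and a source_df frame standing in for the freshly queried source (all hypothetical names):

```python
# Watermark-and-merge sketch of incremental loading into the bronze table.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# 1. Find the high-water mark already landed in bronze.
last_loaded = (
    spark.read.table("bronze_crm_accounts")
    .agg(F.max("modified_at").alias("wm"))
    .collect()[0]["wm"]
)

# 2. Keep only rows changed since that point; source_df stands in for the freshly
#    queried source frame (for example, "raw" from the bronze sketch).
source_changes = source_df.filter(F.col("modified_at") > F.lit(last_loaded))

# 3. Merge the changes instead of reloading the whole table.
(
    DeltaTable.forName(spark, "bronze_crm_accounts").alias("t")
    .merge(source_changes.alias("s"), "t.account_id = s.account_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```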
Let’s puncture a common mistake: pointing every notebook directly to the original data source. It feels “live,” but it’s just reckless. That’s like giving every intern a key to the production database. You overload source systems and lose control of refresh timing. A proper bronze Dataflow acts as the isolating membrane—external data stays outside, your Lakehouse holds the clean copy, and everyone else stays decoupled.
From a cost perspective, this is the cheapest layer per unit of data volume. Storage is practically free compared to compute, and Fabric’s delta tables are optimized for compression and versioning. You pay a small fixed compute cost for each ingestion, then reuse that dataset indefinitely. Contrast that with re‑ingesting snippets for every dependent report—death by refresh cycles.
Once your staging Dataflows are stable, test lineage. You should see straight lines: source → Dataflow → delta output. If you see loops or multiple ingestion paths for the same entity, congratulations—you’ve built redundancy masquerading as best practice. Flatten it.
So the bronze pattern buys you three outcomes, a state a physicist might call equilibrium. One, every external source lands once, not five times. Two, you gain immediate reusability through delta storage. Three, governance becomes transparent because you approve lineage at ingestion instead of auditing chaos later.
When this foundation is solid, your data estate stops resembling a spaghetti bowl and starts behaving like an orchestrated relay. Each subsequent layer pulls cleanly from the previous without waking any source system. The bronze tier doesn’t make data valuable—it makes it possible. And once that possibility stabilizes, you’re ready to graduate to the silver layer, where transformation and business logic finally earn their spotlight.
Section 3 – Architecture #2: Transform (Silver) Dataflows for Business Logic & Quality
Now that your bronze layer is calmly landing data like a responsible adult, it’s time to talk about the silver layer — the Transform architecture. This is where data goes from “merely collected” to “business‑ready.” Think of bronze as the raw ingredient warehouse and silver as the commercial kitchen. The ingredients stay the same, but now they’re chopped, cooked, and sanitized according to the recipe your organization actually understands.
Most teams go wrong here by skipping directly from ingestion to Power BI. That’s equivalent to serving your dinner guests raw potatoes and saying, “Technically edible.” Silver Dataflows were built to prevent that embarrassment. They take the already‑landed bronze delta tables and apply logic that must never live inside a single report — transformations, lookups, and data quality enforcement that define the truth for your enterprise.
The why is simple: repeatability and governance. Every time you compute revenue, apply exchange rates, map cost centers, or harmonize customer IDs, you should do it once — here — not 42 times across individual datasets. Fabric’s silver architecture gives you a single controlled transformation surface with proper lineage, so when finance argues with sales about numbers, they’re at least arguing over the same data shape.
So what exactly happens in these silver Dataflows? They read delta tables from bronze, reference them without re‑ingestion, and perform intermediate shaping steps: joining domains, deriving calculated attributes, re‑typing fields, enforcing data quality rules. This is where you introduce computed entities, those pre‑defined expressions that persist logic rather than recomputing it every refresh. Your payroll clean‑up script, your CRM de‑duplication rule, your “if‑customer‑inactive‑then‑flag” transformations — all of these become computed entities inside linked Dataflows.
Fabric Gen 2 finally makes this elegant. Within the same workspace, you can chain Dataflows via referenced entities; each flow recognizes the other’s output as an upstream dependency without duplicating compute. That means your silver Dataflow can read multiple bronze tables — customers, invoices, exchange rates — and unify them into a new entity “SalesSummary,” while Fabric manages lineage automatically. No extra pipelines, no parallel refreshes, just directed acyclic bliss.
Let’s revisit that because it’s the most underrated change from Power BI: linked referencing replaces duplication. In old‑school Power Query or Gen 1 setups, every Dataflow executed in isolation. Referencing meant physically copying intermediate results. In Gen 2, referencing is logical. The transformation reads metadata, not payloads, unless it truly needs to touch data. The result? Fewer refresh cycles and up to an order‑of‑magnitude reduction in total compute time. Or, to translate into management English, “the credit card bill goes down.”
Another important “why”: quality. Silver is where data is validated and tagged. Use this layer to enforce semantics — ensure all dates are in UTC, flags are boolean instead of creative text, and product hierarchies actually align with master data. It’s where you run deduplication on customer tables, parse malformed codes, and fill controlled defaults. Once it passes through silver, downstream consumers can trust that data behaves like adults at a dinner table: minimal screaming, consistent manners.
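As a sketch of what those rules can look like, again written as notebook PySpark rather than the Power Query steps a Dataflow Gen 2 would actually use; every column name, time zone, and default below is illustrative:

```python
# Silver quality-enforcement sketch: normalize, coerce, default, deduplicate.
from pyspark.sql import functions as F

customers = spark.read.table("bronze_crm_accounts")   # read the bronze output, not the source

clean = (
    customers
    # Dates normalized to UTC (assumes the source records local time in Europe/Berlin).
    .withColumn("created_at", F.to_utc_timestamp(F.col("created_at"), "Europe/Berlin"))
    # "Creative text" flags coerced into a real boolean.
    .withColumn("is_active", F.lower(F.col("is_active")).isin("yes", "true", "1"))
    # Controlled default instead of nulls on a mandatory attribute.
    .withColumn("segment", F.coalesce(F.col("segment"), F.lit("UNKNOWN")))
    # One record per business key.
    .dropDuplicates(["account_id"])
)

clean.write.format("delta").mode("overwrite").saveAsTable("silver_customers")
```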
There’s a critical governance side too. Because silver Dataflows run under shared workspace rules, editors can implement business logic but not tamper with raw ingestion. This separation of duties protects bronze from accidental “Oh, I just cleaned that column” heroics. When compliance asks for lineage, Fabric shows the full path — source to bronze to silver to gold — proving not just origin but transformation integrity.
Common mistake number one: hiding your business logic inside each Power BI dataset. It feels faster. You get that instant dopamine when the visual updates. But it’s also a governance nightmare. Every time you rebuild a measure or a derived field inside a report, you replicate transformations that should live centrally. Then someone updates the definition, half the reports lag behind, and before long your “Total Revenue” doesn’t match across dashboards. Centralize logic in silver once, reference it everywhere.
Here’s how: inside your silver workspace, create linked Dataflows pointing directly to bronze delta outputs. In each, define computed entities for transformations that need persistence, and regular entities for on‑the‑fly shaping. When you output these, write again to delta in the same Lakehouse zone under a clearly labeled folder, like “/silver” or “/curated.” Those delta tables become your corporate contract. Notebooks, semantic models, Copilot prompts — all of them read the same truth.
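Here is a minimal sketch of one such contract entity. In a real Dataflow Gen 2 this would be referenced and computed entities built in Power Query; the PySpark below shows the same shape of logic, and the bronze table names, the columns, and the SalesSummary name itself are all placeholders:

```python
# Silver "SalesSummary" sketch: reference bronze Delta tables, never the original sources.
from pyspark.sql import functions as F

customers = spark.read.table("bronze_crm_accounts")
invoices  = spark.read.table("bronze_erp_invoices")
fx_rates  = spark.read.table("bronze_fx_rates")

sales_summary = (
    invoices
    .join(customers, "account_id", "left")
    .join(fx_rates, ["currency", "invoice_date"], "left")
    .withColumn("amount_eur", F.col("amount") * F.col("eur_rate"))   # derived attribute
    .groupBy("account_id", "customer_name", F.month("invoice_date").alias("invoice_month"))
    .agg(F.sum("amount_eur").alias("revenue_eur"))
)

# Persist to the curated (silver) zone so notebooks, semantic models, and Copilot
# all read the same truth instead of re-deriving it.
sales_summary.write.format("delta").mode("overwrite").saveAsTable("silver_sales_summary")
```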
Performance‑wise, you gain two tools: caching and chaining. Cache intermediate results so subsequent refreshes reuse pre‑transformed partitions. Then, schedule chained refreshes — silver only runs when bronze completes successfully. This cascades lineage safely without one layer hammering compute before the previous finishes.
And yes, you still monitor cost. Silver is heavier than bronze because transformations consume compute, but it’s orders of magnitude cheaper than each report reinventing the logic. You’re paying once per true transformation, not per visualization click. Fabrically efficient, you might say.
Once silver stabilizes, your world gets calm. Data quality disputes drop, refresh windows shrink, and notebooks start reading curated tables instead of untamed source blobs. You’ve turned data chaos into a reliable service layer. Which brings us neatly to the top of the hierarchy — the gold architecture — where the goal stops being “prepare data” and becomes “serve it instantly.” But before we dive into that shiny part, remember: the silver layer is where your business decides what truth means. Without it, gold is just glitter.
Section 4 – Architecture #3: Serve (Gold) Dataflows for Consumption
Now we’ve arrived at the gold layer—the part that dazzles executives, terrifies architects, and costs a fortune when misused. This is the Serve architecture, the polished surface that feeds Power BI, notebooks, Copilot prompts, and any other consumer that insists on calling itself “real‑time.” Think of bronze as the warehouse, silver as the production line, and gold as the storefront window where customers stare at results. It’s beautiful, but only if you keep the glass clean.
The purpose of the gold pattern is different from the first two layers. We’re not cleaning, we’re not transforming; we’re exposing. Everything here exists to make curated data instantly consumable at scale without triggering a parade of background refreshes. The silver layer has already created governed, standardized delta tables. Gold takes those outputs and serves them through structures designed for immediate analytical use—Direct Lake semantic models, shared tables, or referenced entities inside a reporting workspace.
Why bother isolating this as a separate architecture? Because consumption patterns are volatile. The finance team might query hourly; operations, once a day; Copilot, every few seconds. Mixing that behavior into transformation pipelines is like inviting the public into your kitchen mid‑service. You separate the front‑of‑house (gold) so that the serving load never interferes with prep work.
Let’s break down the mechanics. In a gold Dataflow Gen 2, you don’t fetch new data; you reference silver delta outputs. Those already live in the Lakehouse, so every consumer, from a semantic model to a notebook, can attach directly without recomputation. Configure each Dataflow table to publish its delta output into a dedicated “/gold” zone, or create Lakehouse shortcuts that point back at the curated silver tables so nothing is physically copied. Then create semantic models in Direct Lake mode. Why Direct Lake? Because Fabric skips the import stage entirely. Reports visualize live data residing in the Lakehouse files; no scheduled dataset refresh, no redundant compute.
That’s the secret sauce: data freshness without refresh penalties. When silver writes new partitions to delta, Direct Lake consumers see those changes almost instantly. No polling, no extra read cost. What you gain is near‑real‑time insights with the compute footprint of a mosquito. This is precisely how Fabric closes the loop from ingestion to visualization.
Of course, humans complicate this. The fashionable mistake is to duplicate gold outputs inside every department’s workspace. “We’ll just copy these tables into our project.” Wonderful—until your storage map looks like a crime scene. Every duplicate table consumes metadata overhead, breaks lineage, and undermines the governance story silver so carefully built. Instead, expose gold outputs centrally. Give each consumer read rights, not copy rights. Think of it as museum policy: admire the exhibit, don’t take it home.
Another error: embedding all measures directly in reports. Direct Lake makes it possible; good governance says you shouldn’t. Keep core metrics, like gross margin or lead conversion rate, defined in a shared semantic model that references those gold tables. That ensures consistency when Copilot, Power BI, and AI notebooks all ask the same question. Write the logic once, propagate everywhere. Dataflows Gen 2 make that possible because the gold layer’s lineage is visible; it lists every consumer by dependency chain.
Now, performance. Gold exists to minimize latency. You’ll get the fastest results when gold Dataflows refresh only to capture new metadata or materialize views, not to move entire payloads. Orchestrate centrally: trigger gold from silver completion events instead of time‑based refreshes. That way, when new curated data lands, gold models are instantly aware, but your capacity isn’t hammered by hourly refresh rituals invented by nervous analysts.
From a cost perspective, gold actually saves you money if built correctly. Compute here is minimal. You’re serving cached, compressed delta files via Direct Lake or shared endpoints, using metadata rather than moving gigabytes. The only expensive thing is duplication. The moment you clone tables or trigger manual refreshes, you revert to bronze‑era economics—lots of compute, no reason.
Real‑world example: a retail group builds a gold layer exposing “SalesSummary,” “StorePerformance,” and “InventoryHealth.” All Power BI workspaces reference those via Direct Lake. One refresh of silver updates the delta files, and within minutes, every dashboard shows new numbers. No dataset refresh, no duplication. Copilot queries hit those same tables through semantic names, answering “What’s yesterday’s top‑selling region?” without any extra compute. That’s the promise of properly served gold.
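To ground the “no extra compute” claim, here is what a consumer-side query could look like: a notebook (or the SQL analytics endpoint) reads the shared gold table in place instead of copying it into its own workspace. The gold_sales_summary table and its columns mirror the hypothetical retail example above, not a real schema.

```python
# Consumer sketch: query the single shared gold table; nothing is duplicated or re-refreshed.
top_regions = spark.sql("""
    SELECT region,
           SUM(revenue_eur) AS revenue_eur
    FROM   gold_sales_summary                        -- the one copy, owned by the gold workspace
    WHERE  sale_date = date_sub(current_date(), 1)   -- "yesterday's top-selling region"
    GROUP  BY region
    ORDER  BY revenue_eur DESC
    LIMIT  5
""")
top_regions.show()
```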
Let’s pause for the dirty secret. Many teams skip gold entirely because they think semantic models inside Power BI are the gold layer. Close, but not quite. Models describe relationships; gold defines lineage. If your semantic model pulls from direct delta references without an intervening gold layer, you lose orchestration control. Gold isn’t optional; it’s the governor that enforces how consumption interacts with data freshness.
So, how do you ensure discipline? Designate a reporting workspace explicitly for gold. Only that workspace publishes entities marked for consumption. Silver teams own upstream Dataflows; gold teams manage access, schema evolution, and performance tuning. When an analyst requests a new metric, they add it to the shared semantic model, not as a freelance measure in someone’s report. That separation keeps refresh logic unified and prevents “rogue marts.”
The result: you build a self‑feeding ecosystem. Bronze lands data once, silver refines it once, gold shares it infinitely. New data flows in, semantic models light up, Copilot answers questions seamlessly—and your compute bill finally stops resembling a ransom note.
At this stage, you’re no longer treating Fabric like Power BI with extra buzzwords. You’re designing for scale. The gold architecture is the payoff: minimal movement, maximal consumption. And when someone proudly exports a CSV from your flawless gold dataset, just smile knowingly. After all, even the most perfect architecture can’t cure nostalgia.
Section 5 – Choosing the Right Architecture for Your Use Case
Now that we’ve mapped bronze, silver, and gold, the inevitable question surfaces: which one should you actually use? Spoiler alert—probably not all at once, and definitely not randomly. Picking the wrong combination is how people turn an elegant lakehouse into a tangled aquarium of redundant refreshes. Let’s run the calculations like adults.
Think of it as a cost‑to‑intelligence curve. Bronze buys you cheap ingestion. You land data once, nothing fancy, but you stop paying per refresh. Silver strikes the balance—moderate compute, strong governance, steady performance. Gold drops the latency hammer: instant access, but best used for curated outputs only. So, bronze equals thrift, silver equals order, gold equals speed. Choose based on which pain hurts more—budget, control, or delay.
Start with a small team scenario. You’ve got five analysts, one Fabric capacity, and governance that’s basically “whoever remembers the password.” Don’t over‑engineer it. Build a single bronze Dataflow for ingestion—maybe finance, maybe sales—then a thin silver layer applying essential transformations. Serve from that silver output directly through Direct Lake; you don’t need a whole separate gold workspace yet. Your goal isn’t elegance; it’s cost sanity. Set incremental refresh, monitor compute, evolve later.
Next, an enterprise lake setup—multiple domains, dozens of workspaces, regulatory eyes watching everything. You need the full trilogy. Bronze centralizes ingestion across domains; silver handles domain transformation with data contracts—each team owns its logic layer; gold creates standardized consumption zones feeding Power BI, AI, and external APIs. Govern lineage and refresh orchestration centrally. And yes, this means three capacities, properly sized, because saving pennies on compute while violating compliance is not “efficient.” It’s negligent.
Third, the mixed‑mode project. This is most of you—half the work still experimental, half production. In that world, start with bronze + silver under one workspace for agility, but expose key outputs through a minimalist gold workspace dedicated to executive reporting. Essentially, two layers for builders, one layer for readers. It’s the starter pack for responsible scaling. Once patterns stabilize, split workloads for cleaner governance.
Here’s the universal rule—never mix ingestion and transformation inside the same Dataflow. That’s like cooking dinner in the same pan you use to fetch water: technically possible, hygienically disastrous. Keep bronze Dataflows purely for extraction and landing; create silver ones that reference those outputs for logic. You’ll thank yourself when lineage diagrams actually make sense and capacity doesn’t melt during refresh peaks.
Governance isn’t an optional layer either. Use Fabric monitoring to measure throughput: the Capacity Metrics app shows capacity unit (CU) seconds per refresh, and the lineage view exposes duplicate jobs. When you see two flows pulling the same source, consolidate them. Spend compute on transformation, not repetition. Define workspace access by role: bronze owners are data engineers, silver curators handle business rules, gold publishers manage models and permissions. Division of duty equals reliability.
Scalability follows the lakehouse model, not the old Power BI quotas. The per‑dataset refresh limits are gone; compute scales elastically with the workload, within whatever capacity you’ve paid for. But elasticity costs money, so measure it. You’ll discover most waste hides in uncoordinated bronze ingestions quietly running every hour. Adjust schedules to business cycles and cache partitions deliberately. Efficiency is less about hardware and more about discipline.
In short, architecture is the invisible contract between cost and comprehension. If you want agility, lean bronze‑silver. If you want consistency, go full tri‑layer. Whatever you choose, document lineage and lock logic centrally. Otherwise, your so‑called modern data estate becomes an expensive déjà vu machine—refreshing the same ignorance daily under different names.
You’ve got the triangle now: bronze for landing, silver for logic, gold for serving. Stop pretending it’s optional—Fabric runs best on this mineral diet. Which brings us, appropriately, to the closing argument: why all this matters when the spreadsheet loyalists still cling to CSVs like comfort blankets.
Conclusion & Call to Action
So here’s the compression algorithm for your brain: three architectures, three outcomes. Bronze stops data chaos at ingestion, silver enforces business truth, gold delivers instant consumption. Together, they form a compute‑efficient, lineage‑transparent foundation that behaves like enterprise infrastructure instead of dashboard folklore.
Ignore this design, and your project becomes a donation program to Microsoft’s cloud division—double compute, perpetual refreshes, imaginary governance. You’ll know you’ve gone rogue when finance complains about costs, and your diagrams start looping like spaghetti. Structure is cheaper than chaos.
If you remember only one sentence, make it this: Stop building Power BI pipelines in Fabric clothes. Because Fabric isn’t a reporting tool—it’s a data operating system. Treat it like one, and you’ll outscale teams ten times larger at half the cost.
Next, we’ll dissect how to optimize Delta tables and referential refresh—the details that make all three architectures hum in perfect latency harmony. Subscribe, enable notifications, and keep the learning sequence alive.
Because in the end, efficiency isn’t luck—it’s architecture done right.










