How to Fine-Tune DirectLake Queries for Large Datasets in Power BI
You can optimize DirectLake queries for big datasets in Power BI with good data modeling, smart file design, and the right capacity configuration. Performance and reliability are crucial when you run analytics for large organizations. DirectLake mode offers faster access and less waiting time, but you can run into issues if it is not configured properly. Some common challenges include:
Queries slowing down when the model size becomes too large, causing a switch to DirectQuery.
Row-level security or strict security rules that can reduce query speed.
Loading columns only when they are needed, which can slow the first queries, especially with large datasets and many concurrent users.
By understanding how to optimize DirectLake queries, you can improve query speed and maintain smooth report performance.
Key Takeaways
Use a star schema and simple relationships in your data model. This helps DirectLake queries run faster and saves memory.
Set up Parquet files with the right row group sizes, partitions, and sorting. This lets Power BI read data quicker and use fewer resources.
Push data transformations to the source system and use pre-aggregated tables. This makes queries faster and reports load quicker.
Avoid features like row-level security and calculated columns. They can make Power BI fall back to slower DirectQuery mode.
Check query speed and resource use often with Power BI tools. This helps you find problems early and keep reports working well.
DirectLake Overview
What Is DirectLake?
DirectLake is a storage mode in Power BI. It helps you work with very large datasets. You do not have to import all your data into Power BI. You can connect straight to data in OneLake using Parquet files. This saves memory and makes your reports faster.
DirectLake uses some smart tricks:
Framing: Only the metadata gets refreshed. This makes refreshes fast and uses less compute.
Paging: Only the columns you need are loaded for your queries. You do not load the whole dataset.
Transcoding: Parquet data is converted on the fly into the in-memory format Power BI can use.
You get speed almost like Import mode. You do not have to wait for all the data to load. DirectLake is good for big datasets that change a lot. Here is how the storage modes compare:
Import: copies data into Power BI memory, so queries are fast, but you need refreshes to keep data current.
DirectQuery: sends every query to the source system, so data is always current, but queries are slower.
DirectLake: reads Parquet files straight from OneLake, so you get near-Import speed without copying the data or running full refreshes.
Tip: DirectLake does not work with composite models or calculated columns. If you use row-level security, Power BI can fall back to DirectQuery. This can make your queries slower.
Why Optimization Matters
When you use big data, you want your reports to be fast. DirectLake gives you a good start, but you still need to tune DirectLake queries. If your data model and files are not set up right, queries slow down. Sometimes Power BI even falls back to slower DirectQuery mode.
You can:
Use less memory by loading only what you need.
Make refreshes faster by updating just the metadata.
Stop slowdowns by planning your schema and relationships.
If you make DirectLake queries better, users get answers faster. Your analytics stay reliable. This matters for businesses that need up-to-date information.
Data Modeling Essentials
Star Schema Design
You can make DirectLake work better with a star schema. In a star schema, the facts go in one big table. Smaller dimension tables hold descriptive details and connect to the fact table. This setup keeps things simple and easy to use. Power BI likes this because it keeps high-cardinality descriptive columns out of your fact table. Fewer unique values help DirectLake use dictionary encoding. This makes queries run faster and use less compute.
Tip: Microsoft and experts say to use a star schema for reports. You can avoid slow reports and hard problems by doing this. Most slow Power BI projects happen when people skip the star schema.
Star schema makes things less confusing.
It helps queries run faster.
It matches how Power BI and DirectLake handle data.
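If you build your tables in a Fabric lakehouse, you can shape the star schema before Power BI ever sees the data. Here is a minimal sketch of that idea in a Spark notebook; the table and column names are made up, and it assumes a Spark session is already available:

```python
# Minimal sketch: split a flat source table into a star schema in a Fabric lakehouse.
# Assumes a Spark session ("spark"); table and column names are hypothetical.
raw = spark.read.format("delta").load("Tables/raw_sales")

# Dimension table: one row per product, descriptive text lives here
dim_product = (raw
    .select("ProductKey", "ProductName", "Category")
    .dropDuplicates(["ProductKey"]))

# Fact table: narrow, with integer keys and numeric measures only
fact_sales = raw.select("DateKey", "ProductKey", "CustomerKey", "Quantity", "SalesAmount")

dim_product.write.format("delta").mode("overwrite").saveAsTable("dim_product")
fact_sales.write.format("delta").mode("overwrite").saveAsTable("fact_sales")
```

The fact table stays narrow, with keys and measures only, while the descriptive text lives in the dimension table.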
Column and Data Type Choices
Pick the best data types for your columns. Use integers instead of text when you can. Integers take up less space and make queries quicker. Prefer columns with fewer distinct values. Columns with lots of unique values, like long text or IDs, slow things down. You can group data, like using months instead of exact dates, to lower the number of unique values.
Columns with fewer unique values let DirectLake scan less data.
Good data types help with compression and speed.
Do not partition on high-cardinality columns, or you end up with many small Parquet files that slow things down.
Note: DirectLake works best with row groups from 1 million to 16 million rows. This makes big chunks that process faster.
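As a rough illustration, here is how you might cast keys to integers and swap an exact timestamp for a month column in a Spark notebook before writing the table. The column names are hypothetical, and whether you can drop the original date column depends on your reports:

```python
# Sketch: use compact types and lower column cardinality before writing (hypothetical names).
from pyspark.sql import functions as F

fact = spark.read.format("delta").load("Tables/fact_sales")

fact_optimized = (fact
    .withColumn("CustomerKey", F.col("CustomerKey").cast("int"))          # integer keys compress better than text
    .withColumn("OrderMonth", F.date_trunc("month", F.col("OrderDate")))  # month instead of an exact timestamp
    .drop("OrderDate"))                                                    # drop the high-cardinality column if reports do not need it

fact_optimized.write.format("delta").mode("overwrite").saveAsTable("fact_sales_optimized")
```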
Relationship Management
You should be careful with relationships in your model. Try not to use many-to-many relationships. Limit bi-directional filters because they slow things down. Use single-direction relationships when you can. Remove extra data and use incremental refresh to keep your dataset small.
Pick the best storage mode for each table.
Use aggregations to sum up big tables.
Make relationships simple to avoid slowdowns.
Star schema helps you make good relationships.
Fewer tricky relationships mean faster queries.
Smaller datasets use less memory and answer faster.
Tip: Filter early and often to keep your model fast. Always plan relationships before loading your data.
Optimize DirectLake Queries
When you work with large datasets in Power BI, you need to optimize DirectLake queries for the best speed and efficiency. You can do this by focusing on how your Parquet files are organized, how you set up row groups, and how you use query folding and aggregation. These steps help you get answers faster and use fewer resources.
Parquet File Layout
The way you lay out your Parquet files has a big impact on how quickly Power BI can scan and retrieve data. You should follow these steps to make your queries faster:
Use row group sizes between 128 MB and 512 MB. This size balances how quickly you can read data and how much memory you use.
Partition your data by columns that you often use in filters, such as date, region, or customer ID. This helps Power BI skip over data you do not need.
Apply compression algorithms like Snappy for a good mix of speed and storage savings. If you need even smaller files, you can use Gzip or Brotli, but these may slow down reading.
Make sure your files are not too small. Try to keep each Parquet file between 128 MB and 1 GB. Larger files reduce the time Power BI spends managing file details.
Sort your data by columns you use most in queries. This makes it easier for Power BI to find what it needs quickly.
Use dictionary encoding for columns with few unique values. For columns with repeated or sorted data, use run-length encoding. For columns with numbers that are close together, use delta encoding.
If you need strong data management, use Delta Lake format on top of Parquet. This gives you features like ACID transactions and versioning.
Tip: V-Order sorting is turned on by default in Microsoft Fabric. This special sorting and file organization can make your queries up to 50% faster in some cases.
Here is a quick checklist for your Parquet file layout:
Row groups between 128 MB and 512 MB.
Partitions on low-cardinality columns that users filter on, like date or region.
Snappy compression unless you need the smallest possible files.
Files between 128 MB and 1 GB.
Data sorted by the columns you query most.
Dictionary, run-length, or delta encoding matched to each column's data.
Delta Lake format on top of Parquet if you need ACID transactions and versioning.
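Here is one way that layout could look in a Fabric Spark notebook, as a sketch rather than a recipe. The table, path, and column names are made up, and the exact settings depend on your data volumes:

```python
# Sketch: write a partitioned, sorted Delta table with Snappy compression (hypothetical names).
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")  # Snappy balances speed and size

fact = spark.read.format("delta").load("Tables/fact_sales")

(fact
    .repartition("SaleYear")                        # fewer, larger files per partition value
    .sortWithinPartitions("SaleDate", "RegionKey")  # sort by the columns used most in filters
    .write
    .format("delta")
    .mode("overwrite")
    .partitionBy("SaleYear")                        # partition by a low-cardinality filter column
    .saveAsTable("fact_sales_partitioned"))
```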
Row Group Size and Sorting
You can make your queries even faster by setting the right row group size and sorting your data. Compression in Parquet files works best at the row group level. If your row groups are too small, you do not get good results. If they are too large, you might use too much memory.
Aim for row groups between 1 million and 8 million rows. Larger groups help with compression and speed, but you need to check what your system can handle.
Sorting your data with v-ordering helps Power BI read only the data it needs. This sorting method comes from Power BI’s Vertipaq technology and is now used in Parquet files. It improves compression and keeps your files compatible with Delta Lake.
When you sort by columns you use often, Power BI can skip over big chunks of data that do not match your filters.
Note: The OPTIMIZE command in Fabric can merge small Parquet files into bigger ones and apply v-order sorting. This step can make your queries much faster.
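In a Fabric Spark notebook, compacting and v-ordering a table can be as simple as the sketch below. The table name is made up, and the V-Order session setting is Fabric-specific, so check the current documentation for the exact property name on your runtime:

```python
# Sketch: compact small files and apply V-Order sorting to a Delta table (hypothetical table name).
# The V-Order session setting is Fabric-specific; verify it against current documentation.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Merge small Parquet files into larger ones and apply V-Order sorting
spark.sql("OPTIMIZE fact_sales_partitioned VORDER")
```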
Query Folding and Aggregation
You can optimize DirectLake queries by using query folding and pre-aggregation. Query folding means that Power BI pushes data transformations back to the source, like Synapse or Delta Lake. This lets the source system do the heavy work, so Power BI only handles the results.
Always try to do as many data transformations as possible in your data source, not in Power BI. This keeps your queries fast and simple.
Avoid adding complex DAX calculations or calculated columns in Power BI. These can break query folding and slow down your reports.
Use pre-aggregation to create summary tables for common queries. When you do this, Power BI can answer questions from smaller tables instead of scanning huge fact tables.
You can set up multiple aggregated tables at different levels, like daily or monthly, to match your reporting needs.
Tip: Pre-aggregation can cut query times from seconds to milliseconds. But you need to refresh these tables to keep your data up to date.
Keep in mind:
Aggregations work best when your main fact table uses DirectQuery mode.
Data types in your aggregated and fact tables must match.
Managing many aggregated tables can add complexity, so plan carefully.
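For example, you could build daily and monthly summary tables in the source with a few lines of Spark, then point common visuals at those instead of the full fact table. The names below are hypothetical:

```python
# Sketch: pre-aggregate a large fact table at daily and monthly grain (hypothetical names).
from pyspark.sql import functions as F

fact = spark.table("fact_sales")

daily = (fact
    .groupBy("SaleDate", "RegionKey")
    .agg(F.sum("SalesAmount").alias("SalesAmount"),
         F.sum("Quantity").alias("Quantity")))

monthly = (daily
    .withColumn("SaleMonth", F.date_trunc("month", F.col("SaleDate")))
    .groupBy("SaleMonth", "RegionKey")
    .agg(F.sum("SalesAmount").alias("SalesAmount"),
         F.sum("Quantity").alias("Quantity")))

daily.write.format("delta").mode("overwrite").saveAsTable("agg_sales_daily")
monthly.write.format("delta").mode("overwrite").saveAsTable("agg_sales_monthly")
```

Remember to refresh these summary tables on the same schedule as the fact table so the numbers stay in sync.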
By following these steps, you can optimize DirectLake queries and make your Power BI reports much faster and more reliable. You will use less memory, get answers quicker, and handle larger datasets with ease.
Fallback and Configuration
Managing Fallback Modes
It is important to know how fallback works in DirectLake. Fallback happens when Power BI cannot use DirectLake mode for a query. When this happens, Power BI switches to DirectQuery mode to get data. This switch can make your reports slower.
Common triggers for fallback include:
Querying views or calculated columns.
Going over table size or capacity limits.
Tip: You can lower fallback by avoiding these triggers. Split your data into parts, use V-Order sorting for Parquet files, and keep tables under size limits. Delta tables in Fabric use V-Order by default. This helps make things faster and lowers the chance of fallback.
If fallback happens a lot, look for features or tables that are not supported. Work with your data team to make file layout and schema better.
DirectLake Settings
You can change some settings to help DirectLake work better with big datasets:
Split large tables by year or another group. This keeps each table under the 3 billion row limit.
Use conditional DAX measures to send queries to smaller tables when you can.
Add SWITCH or conditional logic in your semantic model. This lets you pick which tables Power BI uses based on filters.
Watch how queries run. Use tools like Performance Analyzer to see when fallback happens.
Make your data layout better. Store data in Parquet format and use partitions that match how users filter data.
Note: Incremental data loading only updates changed data. This saves resources and keeps reports quick.
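One common way to load only the changed rows is a Delta MERGE from a staging table, sketched below. The table names and the join key are made up; your source system defines what counts as a changed row:

```python
# Sketch: incremental load that only touches changed rows, using a Delta MERGE (hypothetical names).
from delta.tables import DeltaTable

updates = spark.read.format("delta").load("Tables/staging_sales_changes")
target = DeltaTable.forName(spark, "fact_sales")

(target.alias("t")
    .merge(updates.alias("s"), "t.SalesOrderID = s.SalesOrderID")
    .whenMatchedUpdateAll()     # update rows that changed
    .whenNotMatchedInsertAll()  # insert brand-new rows
    .execute())
```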
Capacity and Guardrails
DirectLake has strict limits to keep reports working well. Each SKU has a row count limit for each table. If you go over this, Power BI may switch to DirectQuery or not load the report.

Alert: Row-level security and big tables can slow down queries or cause fallback. Do not put security keys in fact tables. This makes data bigger and slows things down.
You should check your dataset size and change capacity if needed. Scaling up can fix reports if you hit a limit. Always look at Microsoft’s documentation for the newest rules and tips.
Monitoring and Testing
Performance Tools
You can use different tools to check how DirectLake queries work in Power BI. Microsoft Fabric has tools that show how long queries take and how much memory or CPU they use. These tools help you see if your queries are slow or use too many resources. You can:
See how fast each query runs.
Check how much memory and CPU are used.
Look for slow spots or problems in your queries.
Change settings to make things faster or cheaper.
These tools help you find issues. If you notice high CPU or memory use, you might need to change your query or update when you refresh data. Dynamic Management Views (DMVs) give you live updates about how queries run and how healthy the server is. By checking wait times and resource use, you can spot and fix slow parts of your queries.
Cold, Warm, Hot Queries
DirectLake uses a smart way to keep data in memory. The first time you run a query on a column, it is called a cold query. The system loads and transcodes the data, so it takes longer, maybe a few seconds. If you run more queries on the same data, these are warm queries. The data stays in memory, so these queries are much faster, almost like Import mode. If you keep using the same columns, they become hot queries. Hot queries are the fastest because the data is already in memory.
If you stop using some data, it cools down. The system may remove it from memory to make room for new data. If memory gets full, less-used columns are removed, which can slow down future queries or make Power BI switch to DirectQuery. Data you use often stays hot and keeps your reports quick.
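You can see this warm-up effect yourself from a Fabric notebook. The sketch below assumes the semantic-link (sempy) package is available, and the semantic model, tables, and columns are made up for the example; the first run after a refresh is usually the cold one:

```python
# Sketch: compare cold vs. warm query time against a DirectLake semantic model.
# Assumes the semantic-link (sempy) package in a Fabric notebook; names are hypothetical.
import time
from sempy import fabric

dax = """
EVALUATE
SUMMARIZECOLUMNS('dim_date'[Year], "Total Sales", SUM('fact_sales'[SalesAmount]))
"""

for run in ("cold", "warm"):
    start = time.time()
    fabric.evaluate_dax("Sales Model", dax)   # same query twice; the second run should be faster
    print(f"{run} run took {time.time() - start:.2f} seconds")
```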
Resource Monitoring
You should always watch how many resources DirectLake uses to keep it working well. Set the DirectLakeBehavior property to DirectLakeOnly when you are building your reports. This helps you find problems before users see them. Use Microsoft Fabric’s tools to check query times and resource use. The Fabric Capacity Metrics App shows how many Capacity Units (CUs) you use. If you use a lot of CUs, you may need to get a bigger SKU. If you use fewer resources, you might be able to use a smaller SKU.
Watch how much storage you use in OneLake. Good habits, like removing old data and using partitions, help save space and keep queries fast. Fabric does not add more capacity by itself, so you must change it when needed. This way, your DirectLake queries stay fast and work well as your needs grow.
You can make DirectLake queries work better by importing dimension tables and using DirectQuery for very large fact tables. Pre-aggregate your data in the source to help reports load faster. Many companies have made their reports up to five times quicker with these steps.

Try using advanced tools like OneLake integration and Fabric Dataflows. These can help your reports run even faster. Keep learning by checking out Power BI communities, tutorials, and expert articles. Always test your changes and tell others what you find.
FAQ
What should I do if my DirectLake queries run slowly?
First, look at your data model for complex relationships or high-cardinality columns. Try using a star schema to make things simpler. Use Performance Analyzer to see which queries are slow. Improve your Parquet files by sorting them on the columns you filter by most.
How can I avoid fallback to DirectQuery mode?
You can stop fallback by not using row-level security or calculated columns. Keep your tables smaller than the row limit for your SKU. Only use features that are supported. Check your model to make sure there are no unsupported parts.
Can I use DirectLake with composite models?
No, DirectLake does not work with composite models. You have to pick one storage mode for all your tables. If you want to use composite models, choose Import or DirectQuery mode instead.
What is the best way to partition my Parquet files?
Split your files by columns you filter on a lot, like date or region. This helps Power BI skip over data you do not need. Try to keep each file between 128 MB and 1 GB for the best speed.
How do I monitor DirectLake resource usage?
You can use the Fabric Capacity Metrics App to watch memory and CPU use. Set DirectLakeBehavior to DirectLakeOnly when you test. If you see high resource use, change your capacity or make your queries better.