How to Use Fuzzy Merge in Power Query for Similarity-Based Data Matching
If you ever need to match up records that almost—but don’t quite—look the same, Fuzzy Merge makes it easy. You can join customer or department data, even if the text entries have typos or different formats. For example, a marketing team once merged over 2,000 duplicate CRM records in just 15 minutes, and a financial firm processed a million records in under an hour, all thanks to Fuzzy Merge. You’ll find this tool right inside Power Query, Power BI Desktop, and data flows—ready to handle all those messy, real-world lists.
Key Takeaways
Fuzzy Merge helps match data that is similar but not exactly the same, fixing typos and format differences easily.
Clean your data first by trimming spaces and standardizing text to improve match accuracy.
Adjust the similarity threshold to control how strict or loose your matches are, balancing accuracy and coverage.
Use options like ignoring case, ignoring spaces, and transformation tables to handle tricky or special cases.
Review your results carefully and tweak settings to reduce missed matches and false positives for the best outcome.
Fuzzy Merge Overview
What Is Fuzzy Merge
Fuzzy Merge is a smart way to match data that isn’t exactly the same but is close enough. You might have two lists—maybe customer names or department titles—that look similar but have small differences. Fuzzy Merge helps you connect these records, even if someone made a typo or used a different format. Instead of only matching perfect pairs, it checks how similar the values are and links them if they meet your chosen similarity level.
You get to decide how strict the match should be. Fuzzy Merge lets you set a similarity threshold, so you can control if you want only very close matches or if you’re okay with looser connections. You can also ignore case differences or extra spaces, which makes matching even more flexible. If you want, you can run several fuzzy merges to catch any records that slipped through the first time.
This approach isn’t just for simple lists. Researchers and businesses use fuzzy matching to handle complex and messy data. For example, in the world of bioinformatics, scientists use fuzzy similarity-based clustering to merge huge biological datasets. This method helps them find patterns and reduce errors caused by noisy or incomplete data. Companies also rely on fuzzy matching in their data quality tools, showing how valuable and widely adopted this technique has become.
When to Use Fuzzy Merge
You should use Fuzzy Merge when your data has inconsistencies that make exact matching impossible. Here are some common situations:
You have names or addresses with typos, missing letters, or different spellings.
Data comes from multiple sources, and each source uses its own format.
You want to combine lists where some entries are almost the same but not quite.
Fuzzy Merge shines when you deal with big, messy datasets. Modern data often comes with errors, missing pieces, or different formats. Fuzzy techniques help you manage this uncertainty. They work well with large volumes of data and can handle information that changes quickly. You can even adjust how strict the matching is, so you get the results you need without spending hours cleaning up every detail.
Tip: Before you start, check your data quality. Clean up obvious errors, trim spaces, and standardize cases. This makes Fuzzy Merge work even better.
Fuzzy Merge Steps
Enable Fuzzy Merge
You can start a Fuzzy Merge right inside Power Query, Power BI Desktop, or data flows. The process feels familiar if you’ve ever done a regular merge, but with a powerful twist. Here’s how you get things rolling:
Open Power Query Editor and load your tables.
Go to the Home tab and select Merge Queries or Merge Queries as New.
In the merge dialog, pick your main table (like a list of employees) and your reference table (like a department list).
Select the columns you want to match on—usually text fields like names or departments.
Check the box labeled Use fuzzy matching to perform the merge.
Tip: Fuzzy Merge works best when you start with a normal merge first. If you see lots of unmatched records, that’s your cue to try fuzzy matching.
Performance matters, especially with big datasets. Fuzzy Merge compares every value in your main table to every value in your reference table. This can take longer than a regular merge, so keep an eye on response time and accuracy. If you work with huge data, consider optimizing your queries or limiting the number of matches per row.
Select Tables and Columns
Choosing the right tables and columns is key to a successful Fuzzy Merge. You want to link records that logically belong together, even if the text isn’t identical. For example, you might have two lists of customer names from different sources. One list spells a name as “Jonathon,” while the other uses “Jonathan.” Fuzzy Merge helps you connect these.
Before you merge, take a few minutes to clean your data:
Trim extra spaces.
Standardize capitalization (make everything uppercase or lowercase).
Fix obvious typos or special characters.
Check for nickname variations or abbreviations.
Note: Pre-cleaning your data boosts match accuracy and reduces false positives. Even a quick cleanup can make a big difference.
When you select columns, focus on those that logically link your tables. For example, if you’re merging baseball datasets, use player names as the key. If you’re joining survey results with a dictionary table, use the answer text. Understanding your data helps you pick the best columns for matching.
Apply Fuzzy Matching
Now you’re ready to apply Fuzzy Merge and see the magic happen. Here’s a step-by-step guide:
In the merge dialog, after enabling fuzzy matching, click on Fuzzy matching options.
Adjust the similarity threshold (from 0.00 to 1.00). A higher value means stricter matching. The default is usually 0.80.
Choose extra options:
Ignore case: Match “SALES” with “sales.”
Ignore spaces: Match “Micro soft” with “Microsoft.”
Maximum number of matches: Limit how many matches you get per row.
Transformation table: Use a custom mapping table if you have tricky cases (like mapping “apls” to “Apple”).
Confirm the merge. Power Query adds a new column to your table with the matched results.
Expand the merged column to see which records matched. You can also show similarity scores to check how close each match was.
Review the results. If some values didn’t match as expected, tweak the similarity threshold or update your transformation table.
Here’s a quick example:
You have a table with department names: "Sales", "Managmnt", "HR".
Your reference table has: "Sales", "Management", "Human Resources".
A normal merge only matches "Sales".
Fuzzy Merge, with a threshold of 0.8, matches "Managmnt" to "Management" too.
Fuzzy Merge shines when your data has typos, abbreviations, or inconsistent formats. It finds connections that a normal merge would miss. For instance, it can link “IBM” with “I.B.M.” or “International Business Machines,” and even handle different date formats.
Pro Tip: After merging, check the number of distinct values before and after. You’ll often see a big reduction, showing how Fuzzy Merge helps consolidate messy data.
If you want even more control, you can use advanced Power Query functions like Table.FuzzyJoin
or Table.FuzzyNestedJoin
. These let you fine-tune parameters for complex scenarios.
Similarity Threshold
How the Threshold Works
When you use fuzzy matching, the similarity threshold acts like a gatekeeper. It decides how close two values need to be before they count as a match. You set this threshold as a number between 0.00 and 1.00. A value of 1.00 means only exact matches get through. A value closer to 0.00 lets in even the loosest matches.
Think of it like grading a test. If you want only perfect scores, you set the bar at 100%. If you’re okay with a few mistakes, you might accept 80% or even 70%. The similarity threshold works the same way for your data.
Behind the scenes, Power Query uses the Jaccard index algorithm. This algorithm compares the pieces (like letters or words) in each value and calculates how much they overlap. The more overlap, the higher the similarity score. If the score meets or beats your threshold, the records match.
Note: Even if you set the threshold to 0.00, the algorithm won’t match everything. It still looks for some level of similarity, so you won’t get random or unrelated matches.
Here’s a quick table to show how the threshold affects matching:
Adjusting the Threshold
You can fine-tune the similarity threshold to get the results you want. If you set a high threshold, you’ll only match records that are almost identical. This reduces the chance of false matches, but you might miss some true matches if the data has lots of errors or variations. Lowering the threshold makes the matching more forgiving, so you catch more possible matches, but you might also get some that don’t really belong together.
Let’s look at what happens when you adjust the threshold:
High Threshold (e.g., 0.95):
You only match records that are nearly the same. This is great for clean data or when you want to avoid mistakes. For example, “Jonathan” and “Jonathon” might not match if the threshold is too high.Medium Threshold (e.g., 0.80):
You allow for some differences. Typos, missing letters, or small changes get matched. This setting works well for most real-world data.Low Threshold (e.g., 0.60):
You match lots of variations. This can help when your data is really messy, but you might get unrelated matches, like “Jon” matching with “Joan.”
Tip: Start with a medium threshold and adjust up or down based on your results. If you see too many missed matches, lower the threshold. If you get too many wrong matches, raise it.
The choice of threshold also affects how fast your matching runs. Higher thresholds mean fewer comparisons, so things move faster. Lower thresholds make the system check more possibilities, which can slow things down, especially with large datasets. Organizations often balance accuracy and speed by testing different thresholds and seeing what works best for their data.
Here’s a table showing how accuracy changes with different thresholds, based on real-world datasets:
You don’t have to guess the best threshold. Some systems use data analysis to set the threshold automatically, which often gives better results than picking a number at random. For example, in one case, a fixed threshold gave an F1 score of 0.53, but an automatically chosen threshold boosted it to 0.89. This shows that adjusting the threshold can make a big difference in matching quality.
In content-based search systems, using a dynamic threshold improved search results and matched more relevant items.
Different searches or datasets may need different thresholds for the best results.
Automated methods, like the silhouette coefficient, help set the right threshold without trial and error.
Remember: There’s no one-size-fits-all threshold. Try different values, check your results, and pick the one that gives you the best balance of matches and accuracy.
Fuzzy Merge Options
Ignore Case and Spaces
When you work with real-world data, you often see names or words written in different ways. Sometimes, people type in all caps, use lowercase, or add extra spaces by mistake. Fuzzy Merge gives you options to handle these differences easily. You can tell Power Query to ignore case, so "SALES" and "sales" match up. You can also ignore spaces, which helps when someone types "HumanResources" instead of "Human Resources."
Here’s how these settings help you:
The ignore case option lets you match words no matter how they are capitalized.
Ignoring spaces means you don’t have to worry about extra spaces or tabs.
You can control if you want to match whole lines or just parts, which helps with tricky data.
Tip: Turning on these options makes your matches more consistent and saves you from cleaning up every little typo.
Maximum Number of Matches
Sometimes, a single value in your table could match several values in the other table, especially if your similarity threshold is low. The maximum number of matches setting lets you limit how many matches you get for each row. This keeps your results tidy and helps your merge run faster. If you set this to 1, you only get the best match for each value. If you set it higher, you can see more possible matches, but your table might get bigger and take longer to process.
Use a lower number for cleaner results.
Try a higher number if you want to review all possible matches.
Note: For large datasets, limiting the number of matches can really speed things up.
Transformation Table
A transformation table is like a cheat sheet for tricky matches. Sometimes, you know that "Mgmt" should match "Management" or "Acct" should match "Accounting." You can create a small table with these pairs and use it during your Fuzzy Merge. Power Query will use your table to fix these special cases before it tries to match the rest. This gives you more control and helps you catch matches that even fuzzy logic might miss.
Build a transformation table with common abbreviations or misspellings.
Use it to improve match accuracy for your most important fields.
If you have a lot of custom rules, a transformation table can save you hours of manual work.
Troubleshooting
Missed Matches
Sometimes, you might notice that some records you expected to match just don’t show up in your results. This can happen if your similarity threshold is set too high or if your data has extra spaces, different cases, or small typos. Try lowering the threshold a bit and see if more matches appear. Also, double-check your data for hidden spaces or inconsistent formatting. Cleaning up your data—like trimming spaces and making everything lowercase—can help catch more matches. If you still miss matches, consider using a transformation table to handle tricky abbreviations or common misspellings.
Tip: Always review unmatched records after your first run. You might spot patterns that help you adjust your settings for better results.
False Positives
False positives happen when the tool matches records that really shouldn’t go together. This is common when your threshold is too low or your data has lots of similar entries. In some industries, like compliance or sanctions screening, false positive rates can reach 95% or more. These high rates often come from similar names, spelling mistakes, or poor data quality. To reduce false positives, try these steps:
Clean and standardize your data before merging.
Adjust your similarity threshold upward for stricter matches.
Limit the maximum number of matches per row.
Use a transformation table for known tricky cases.
Review matches manually if possible, especially for important data.
Keeping your data clean and your settings tuned helps you avoid hours of manual review.
Best Practices
You can get the best results by following a few simple habits. Start with a normal merge to catch exact matches. Clean your data by removing extra spaces, fixing typos, and standardizing cases. Experiment with different thresholds and options until you find what works for your data. Studies show that using uncertainty measures and fuzzy logic-assisted processing can reduce errors and improve accuracy. These methods help separate useful data from noise, making your results more reliable and saving you time.
Try different settings, review your results, and keep your data tidy. You’ll see better matches and fewer errors every time.
You now have the tools to use Fuzzy Merge for smarter data matching. Start by cleaning your data, then set up your merge and adjust the similarity threshold until you get the best results. Try out advanced options to handle tricky cases. Fuzzy Merge helps you fix duplicates, standardize values, and link records, which means your data becomes more accurate and useful. Keep experimenting and refining your approach. If you want to learn more, check out Power Query’s official documentation or explore community tutorials.
FAQ
How do you know if Fuzzy Merge matched the right records?
You can check the merged column in your results. Expand it to see which records matched. If something looks off, adjust your similarity threshold or review your data for errors.
Can you use Fuzzy Merge with numbers or dates?
Fuzzy Merge works best with text fields. If you want to match numbers or dates, convert them to text first. This helps Power Query compare them using fuzzy logic.
What should you do if Fuzzy Merge is slow?
Large datasets can slow things down. Try limiting the maximum number of matches. Clean your data before merging. You can also filter your tables to reduce the number of rows.
Tip: Start with a small sample to test your settings before running Fuzzy Merge on the full dataset.
Is it possible to undo a Fuzzy Merge?
Yes! In Power Query, you can remove or edit any step in the Applied Steps pane. Just click the "X" next to the merge step, and your data goes back to how it was before.