MDM Matching – Are You Asking the Right Question?

An odd request came in last week: a prospective customer asked us for a benchmark on the percentage of duplicates we could find for them using MDM.

In this post, I want to touch on a few key reasons why this request is odd. I would also like to take this chance to explain the right questions to ask your vendor when it comes to MDM matching.

I have worked directly with dozens of customers over the last 12 years. In my current role, I talk to companies implementing master data management, practitioners, and thought leaders on a day-to-day basis. I can confidently say that every customer's mastering requirements are unique.

When it comes to identifying duplicate data in your organization, the discussion quickly turns to the customer’s specific requirements. The usage of the data (e.g., analytics for marketing segmentation, real-time access to trusted data across the company), the number of source and target systems, and the quality of data in those sources all differ, even for organizations within the same industry. Often the business requirements dictate what you need to do with the data, and in some cases, such as legal and compliance, the requirements mean certain duplicates must survive.

On top of this, there are project timelines and the trade-offs the customer makes among accuracy, performance, and data quality. Think of adjusting several knobs on your stereo to get the sound that YOU like best.

Result? The percentage of duplication we can find using an MDM tool varies from customer to customer. It depends on several parameters, and you need to find what is right for you. Your tolerance level for false positives and false negatives dictates your configuration.
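To make the tolerance trade-off concrete, here is a minimal Python sketch. It is not how any particular MDM product works: the similarity function (the standard library's `difflib.SequenceMatcher`), the threshold names `merge_at` and `review_at`, and the four labelled pairs are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 string similarity score, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify(score: float, merge_at: float, review_at: float) -> str:
    """Turn a score into a decision using two tolerance thresholds."""
    if score >= merge_at:
        return "merge"      # auto-merge without human review
    if score >= review_at:
        return "review"     # send to a data steward
    return "distinct"       # treat as different entities

# Hypothetical labelled pairs: (record A, record B, truly the same entity?)
PAIRS = [
    ("Jon Smith", "John Smith", True),
    ("Acme Corp", "ACME Corporation", True),
    ("Maria Garcia", "Mario Garcia", False),
    ("Globex Inc", "Initech LLC", False),
]

def evaluate(merge_at: float, review_at: float) -> dict:
    """Count false matches, missed matches, and steward workload."""
    counts = {"false_positives": 0, "false_negatives": 0, "sent_to_review": 0}
    for a, b, is_duplicate in PAIRS:
        decision = classify(similarity(a, b), merge_at, review_at)
        if decision == "merge" and not is_duplicate:
            counts["false_positives"] += 1
        elif decision == "distinct" and is_duplicate:
            counts["false_negatives"] += 1
        elif decision == "review":
            counts["sent_to_review"] += 1
    return counts

if __name__ == "__main__":
    print("looser :", evaluate(merge_at=0.90, review_at=0.60))
    print("tighter:", evaluate(merge_at=0.93, review_at=0.80))
```

With these made-up pairs, the looser setting auto-merges one pair that is not a duplicate, while the tighter setting avoids that false match but misses a true duplicate. The "right" setting depends on which error costs your business more, which is why a single benchmark percentage says little.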

The real questions you should be asking your vendor are:

  • How sophisticated is your matching engine?
  • Can it support probabilistic (fuzzy) and deterministic (exact) matching styles?
  • Is it easy to configure the matching engine? Is it easy to understand for my data stewards, IT and business users, so they are all on the same page?
  • How easy or hard is it for us to change the tolerance level for missed and false matches?
  • Does the matching engine consider phonetic spellings, partial fields, the statistical distribution of records and more?
  • Does the vendor tool allow fine-grained tuning of the ranges to search, tightness of match and other parameters for balancing the degree of matching against the amount of processing (performance)?
  • How is a data set with international names and addresses handled?
  • Can the matching engine learn from stewards' past behavior and self-correct?
  • What about data survivorship? Does the vendor provide easy ways for us to configure survivorship rules?
  • Does the vendor take a configuration over coding approach for both matching and survivorship?
  • Can you provide attribute level survivorship?
  • Does the vendor offer scalability for matching large data sets and performing multiple matches with different criteria?
  • Can you do fast searches that leverage matching in real-time? Is the matching engine designed to handle bulk, near real-time and real-time modes?
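For readers less familiar with the two matching styles named in the list above, the following Python sketch contrasts them. It is an illustration only: the `Customer` fields, the email-based exact rule, and the `WEIGHTS` values are hypothetical, and the weighted string similarity is a simple stand-in for the statistical scoring a real probabilistic engine would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass(frozen=True)
class Customer:
    name: str
    email: str
    city: str

def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())

def deterministic_match(a: Customer, b: Customer) -> bool:
    """Exact rule (hypothetical): same non-empty normalized email."""
    email_a, email_b = normalize(a.email), normalize(b.email)
    return bool(email_a) and email_a == email_b

# Hypothetical per-attribute weights; a real engine would derive or tune these.
WEIGHTS = {"name": 0.6, "city": 0.2, "email": 0.2}

def fuzzy_score(a: Customer, b: Customer) -> float:
    """Weighted string similarity over the attributes present on both records."""
    score = 0.0
    used_weight = 0.0
    for field, weight in WEIGHTS.items():
        left = normalize(getattr(a, field))
        right = normalize(getattr(b, field))
        if not left or not right:
            continue  # missing data counts neither for nor against a match
        score += weight * SequenceMatcher(None, left, right).ratio()
        used_weight += weight
    return score / used_weight if used_weight else 0.0

if __name__ == "__main__":
    crm = Customer("Jon Smith", "jon.smith@example.com", "Boston")
    web = Customer("John Smith", "jsmith@other.example", "Boston")
    print(deterministic_match(crm, web))    # exact rule on email
    print(round(fuzzy_score(crm, web), 2))  # similarity-based score
```

In this sketch the exact rule rejects the pair because the emails differ, while the fuzzy score still rates the records as close. That gap is what the questions about tuning, tolerance levels, and steward review are really probing.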

These are only a few of the questions that come to mind. A thorough analysis of them can help you choose the solution that best fits your needs. The right decision here can save you months of person-hours of manual stewardship.
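The survivorship questions in the list are easier to evaluate with a concrete picture of what "attribute-level" and "configuration over coding" could mean. The Python sketch below is one possible reading, not a vendor's implementation: the source names, the `SOURCE_PRIORITY` ranking, the two rule names, and the sample records are all hypothetical.

```python
from datetime import date

# Hypothetical trust ranking of source systems; a lower number wins.
SOURCE_PRIORITY = {"crm": 0, "billing": 1, "web_form": 2}

# Survivorship expressed as configuration: one rule per attribute.
RULES = {
    "name": "source_priority",
    "email": "most_recent",
    "phone": "most_recent",
}

def survive(records: list, rules: dict = RULES) -> dict:
    """Build a golden record by picking a winner for each attribute separately."""
    golden = {}
    for attribute, rule in rules.items():
        # Only records that actually carry a value can contribute it.
        candidates = [r for r in records if r.get(attribute)]
        if not candidates:
            golden[attribute] = None
            continue
        if rule == "source_priority":
            winner = min(candidates, key=lambda r: SOURCE_PRIORITY[r["source"]])
        elif rule == "most_recent":
            winner = max(candidates, key=lambda r: r["updated"])
        else:
            raise ValueError(f"unknown survivorship rule: {rule}")
        golden[attribute] = winner[attribute]
    return golden

if __name__ == "__main__":
    matched = [
        {"source": "crm", "updated": date(2016, 5, 1),
         "name": "John Smith", "email": "john@old.example", "phone": ""},
        {"source": "web_form", "updated": date(2017, 3, 9),
         "name": "jon smith", "email": "john@new.example", "phone": "555-0100"},
    ]
    print(survive(matched))
```

Here the name survives from the most trusted source while the email and phone survive from the most recently updated record. Changing the outcome means editing the `RULES` mapping rather than the function, which is the kind of flexibility the configuration question is asking about.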

Back to the original question: the answer depends on what you are trying to achieve.

I would love to hear your views. Please leave your response in the comments section or reach out to me at @mdmgeek on Twitter.

Image courtesy of freedigitalphotos.net

COMMENTS

11 Thoughts on MDM Matching – Are You Asking the Right Question?
    Henrik Liliendahl
    17 Apr 2017
     11:21pm

    A very good set of questions Prash. My addition will be:
    • If it is worth having manual inspection of automated matching, how good is the user interface for doing that? If more than one person will do this, can you do check-out and check-in?
    • Apart from merging duplicates, can you also split a record that represents two real world entities? Can you mark two records as not a duplicate, but a part of the same hierarchy (household, company family tree)?
    • Can you match using multiple, perhaps historical, sub entities (addresses, phones, emails…)?

    Denis Toporov
    18 Apr 2017
     3:10pm

    I talk to customers and partners in the MDM space about matching daily, so I thought I could add to this:

    * Please, please, please differentiate between search and match – this is the biggest point of confusion

    Search – if you need to find records that are similar to your input but don’t necessarily represent the same entity, e.g. customer or product

    Match – if you need to determine whether several records actually represent the same entity, e.g. slight variations of the same customer or product

    * Set performance expectations

    It’s all about defining and finding a compromise between acceptable execution time and quality of results

    * Review data quality, biases and patterns – your matching should understand the domain, attributes and qualities of the data set, e.g. entity name matching will differ from industry to industry

    Gajanan
    18 Apr 2017
     8:37pm

    It’s good and enough to understand the real focus while dealing with matches. Thank you!

    Maruthi Kumar Gajavalli
    19 Apr 2017
     4:32pm

    Can the tool or platform give an automated match score for each result, based on the match rules it satisfied and a pre-configured score for each match rule?
    I would also stress performance. Can the match function meet low-latency requirements even with millions of records, so that real-time integration and consolidation can proactively keep duplicates out of the system, if necessary backed by a big data solution or an index?

    Sonal Goyal
    20 Apr 2017
     12:41am

    Great article Prash. I would like to suggest the following points too –
    a. Time to set up
    b. Cost
    c. Ability to handle different domains
    d. Level of data preprocessing and cleansing needed for acceptable match quality

    Pradeep yallapragada
    22 Apr 2017
     7:18am

    This is quite interesting and informative! Thanks for sharing … a couple of comments/questions:

    Nickname / alias name handling?

    Party type differentiation between entity and person for typical / common org keywords? To what extent does the matching engine support this? Can we bring any commonality?

    michael caulfield
    24 Apr 2017
     10:32am

    Touched on lightly, but fundamentally MDM is about the business motions; tools are an enabler. The question on # of duplicates, and the details of matching, will be highly dependent on what the primary purpose/channel is.
    Examples:
    -Matching for marketing will have different implications than for transaction flows.
    -A company with direct distribution has the facility for greater continuity prior to implementation of de-duplication flows, thus lower de-dupe results.
    -Multi layer distribution, and fractured sales and customer life cycles in an OM capture will have greater need for sophisticated de-duplication, and though there will be greater complexity to handle, the benefits will be higher if measuring # of dupes captured.
    A very important question to understand well will also be the tolerance for over-matching, as this will be a factor which forces the equation towards allowing more records with ambiguous match results.
    Overall, I agree with the article, and the fundamental point is to understand the business problem and desired outcomes. Tool selection and tailoring will need to keep these in mind.

    Ramesh Kalava
    27 Apr 2017
     7:30am

    Great article Prasanth, a few more points:

    – Can we expose match APIs to external applications, so that we can stop duplicate accounts from being created?
    – Key performance indicators for match rules, for continuous match tuning. Most business users want to see match rule performance and changes.
