Identifying the Right Sources of Master Data
One of the several challenges we face when we kick-start an MDM implementation is determining which sources to consider for the initial phase of deployment. Among crucial aspects such as data collection, transformation, normalization, standardization and matching, source identification is a critical factor in realizing MDM benefits early on.
The proven way to implement MDM is to start with a small set of data sources and grow incrementally. Once we identify the sources that have the correct entities, dependent domains and attributes, we can lay effective groundwork for:
- Creating a broad set of rules to cleanse the data
- Building standardization engines applicable to all relevant data entities, and
- Constructing rules to identify suspects so as to create a single version of the truth (as discussed in my earlier post)
Getting things right at the beginning is a critical aspect of an MDM project, as it lays the foundation for future source system integration plans. It also makes an enterprise-wide MDM rollout easier, since additional sources of data can be added to the MDM hub later.
So the question is how to choose the sources that will go into MDM during this inaugural phase, considering that organizations have a huge application landscape and often do not know which systems are responsible for which master data. The exercise is also revealing for many customer representatives themselves, who find dozens of databases containing data they did not know existed.
Depending on the master domain you are implementing, you would usually start by listing the most trusted data sources the company currently uses for its customer-facing applications. For example, if you are implementing a customer master, you will ask: which system currently manages customer names, current addresses and contact information? It’s easier said than done, though, as you will find that the organization has multiple siloed applications, each holding this information for a specific line of business. Each division, department and business process has customer information that is complete according to its own business owners.
One of the strong beliefs in the MDM arena is that the larger the data set, the larger the data quality issues and the larger the number of duplicate records. In a nutshell, we would usually choose the data sources that own the largest number of customer records. This lets us set up rules that are as accurate and generic as possible, so that a wider set of data issues can be addressed upfront.
Also, remember that you’ll need as much information as possible to do adequate data matching. So put the emphasis on the completeness of the matching attributes: the source you choose should have these attributes densely filled. To discover more about the source data, you will need a quick initial profiling phase before making decisions. Data profiling tools help in scanning data for missing values, incorrect values and elements violating business rules. This allows you to make a better effort estimate for the cleanup work required. Profiling also helps you weigh each source carefully and judge whether it is a reliable source of master data.
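To make the profiling step concrete, here is a minimal sketch of what a first pass over candidate sources might look like. It only measures record counts and how densely the matching attributes are filled. The attribute names, the source names (`crm`, `billing`) and the sample rows are all made up for illustration; a real profiling tool would also check for incorrect values and business rule violations.

```python
from collections import Counter

# Hypothetical matching attributes; a real customer master defines its own.
MATCH_ATTRIBUTES = ["name", "address", "phone", "email"]


def is_filled(value):
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def profile_source(records, attributes=MATCH_ATTRIBUTES):
    """Return the record count and per-attribute fill rate for one source."""
    total = len(records)
    filled = Counter()
    for record in records:
        for attr in attributes:
            if is_filled(record.get(attr)):
                filled[attr] += 1
    fill_rate = {a: (filled[a] / total if total else 0.0) for a in attributes}
    return {"records": total, "fill_rate": fill_rate}


# Made-up sample data for two hypothetical source systems.
crm = [
    {"name": "Ann Lee", "address": "1 Main St", "phone": "555-0100", "email": "ann@example.com"},
    {"name": "Bob Roy", "address": "", "phone": None, "email": "bob@example.com"},
]
billing = [
    {"name": "Ann Lee", "address": None, "phone": "", "email": ""},
]

for source_name, rows in {"crm": crm, "billing": billing}.items():
    print(source_name, profile_source(rows))
```

Comparing the fill rates side by side gives a rough, early indication of which source has enough populated attributes to support matching.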
How do you analyze the data? And how do you determine the correct sources of master data? Please share your experience and opinions via comments. Thank you.
COMMENTS
Prash, selecting the sources is indeed very essential.
One aspect I have been working with a lot is how to involve external sources as well.
In the customer data arena this will be things such as address directories (as we also discussed earlier here on the blog in relation to geocoding), business directories for B2B data and consumer/citizen directories for B2C, very much dependent on the countries and industry in question.
These sources may be very helpful in standardization and data matching, and including such sources in future data entry will have a great impact on data quality if you are able to build this into your business processes.
Hey Prashant,
Selecting the best source of data to start with is a recurring challenge for many of our customers. Henrik brings up a great point that some external sources, such as Dun and Bradstreet for B2B data, can provide an initial source of truth. The bigger challenge is that regardless of the accuracy of the external source, that data will still need to match up with the potential mess you’ll be dealing with internally. For example, how easily can you link helpIT systems with helpIT, helpIT inc, helpIT systems inc, or HSI?
I mentioned in a blog post I wrote about the Retail Single Customer View (http://www.helpit.com/cleandata/?p=138) that a customer of ours selected their website data as a starting point. Customers care most about receiving their order, so it is in their own best interest to provide accurate information; otherwise they will have to deal with logistics headaches down the road.
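To give a feel for why that linking question is hard, here is a rough sketch using only the Python standard library. It is not how any particular matching product works; the legal-suffix list and the acronym rule are assumptions chosen purely for illustration.

```python
import re
from difflib import SequenceMatcher

# Illustrative list of legal suffixes to ignore; a real list would be longer.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "llc", "corp", "co"}


def tokens(name):
    """Lowercase the name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def core_name(name):
    """Name with legal suffixes removed, e.g. 'helpIT inc' -> 'helpit'."""
    return " ".join(t for t in tokens(name) if t not in LEGAL_SUFFIXES)


def initials(name):
    """First letter of every token, e.g. 'helpIT systems inc' -> 'hsi'."""
    return "".join(t[0] for t in tokens(name))


def similarity(a, b):
    """Score between 0 and 1; 1.0 means the names are treated as the same."""
    core_a, core_b = core_name(a), core_name(b)
    # Assumed rule: an acronym matches the initials of the other full name.
    if len(core_a) > 1 and core_a == initials(b):
        return 1.0
    if len(core_b) > 1 and core_b == initials(a):
        return 1.0
    return SequenceMatcher(None, core_a, core_b).ratio()


variants = ["helpIT", "helpIT inc", "helpIT systems inc", "HSI"]
for variant in variants:
    print(variant, round(similarity("helpIT systems", variant), 2))
```

Even this toy version shows the limits of simple rules: the acronym only links to the full legal name whose initials it spells, and the shorter variants get a partial score that still needs a threshold and a human decision.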
It’s even possible with some software, including helpIT’s applications, to score the quality of the information within each record based on completeness and accuracy.
So maybe the answer is a combination of identifying the source that cares most about the accuracy of its information and applying a record quality scoring methodology.
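As a rough illustration of record quality scoring, here is a small sketch that gives each record points for attributes that are present and look plausible, then averages the points per source. The weights, the attribute names and the validity checks are assumptions made up for this example and do not describe any vendor's scoring method.

```python
import re

# Hypothetical weights, in points out of 100, for each attribute.
WEIGHTS = {"name": 30, "address": 30, "email": 20, "phone": 20}

# Deliberately loose pattern: something@something.something
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_valid(attr, value):
    """Completeness check plus a very basic plausibility check."""
    text = str(value).strip() if value is not None else ""
    if not text:
        return False
    if attr == "email":
        return bool(EMAIL_RE.match(text))
    if attr == "phone":
        # Assumed rule: at least seven digits once punctuation is removed.
        return len(re.sub(r"\D", "", text)) >= 7
    return True


def record_score(record):
    """Sum the weights of the attributes that pass the check (0 to 100)."""
    return sum(w for attr, w in WEIGHTS.items() if looks_valid(attr, record.get(attr)))


def source_score(records):
    """Average record score for a source; 0.0 for an empty source."""
    return sum(record_score(r) for r in records) / len(records) if records else 0.0


# Made-up records from a hypothetical web order system.
web_orders = [
    {"name": "Ann Lee", "address": "1 Main St", "email": "ann@example.com", "phone": "555-010-0199"},
    {"name": "Bob Roy", "address": "2 High St", "email": "bob-at-example", "phone": ""},
]
print([record_score(r) for r in web_orders], source_score(web_orders))
```

Scores like these can be compared across candidate sources, alongside the judgment about which source has the strongest incentive to keep its data accurate.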
[…] Identifying the Right Source of Master Data: Our own @MDMGeek talks about one of the crucial first steps of any MDM implementation: determining which sources of master data to include. […]
Hi Prashant
Thanks for the post. However, I have several very serious concerns about the overall approach you are advocating.
The first of these is that you at no point mention the MOST CRITICAL element for Master Data Definition and Management, which is the LOGICAL DATA MODEL (LDM). If you have not got this you cannot be said to be managing your master data. In fact, it would be impossible. It would be like claiming that you could manage the electrics in a large building without having a wiring diagram.
Secondly, Master Data Elements must be DEFINED by senior management, they cannot be inferred from existing data. What an enterprise is currently categorising and grouping its data as may be right or it may be very wrong. What it OUGHT to be cannot be inferred from the data itself. It must be defined and this definition will be shown in the LDM.
Thirdly, normalising existing data is a laborious, archaic and error-prone activity that should be avoided at all costs. This is a thoroughly outdated exercise called Relational Data Analysis (RDA), which I used to lecture on 20 years ago, and which has been totally superseded by the Relational Data Model.
If those practising Master Data Management within an enterprise are to be taken seriously then they must be seen to be operating at the highest level of quality, using all of the very best techniques. They cannot be seen as a center of excellence if they are leaving out vital elements, such as the LDM, and using flawed techniques such as RDA.
Regards
John
Hi Prashant
Thanks for the feedback and the context.
I agree that once the LDM is in place you can cross-check it against existing data to see if you have missed anything.
However, I would strongly suggest that you always normalise in the LDM and then map all of your existing data onto that.
A properly drawn LDM will be fully normalised to 5NF.
Once again, thanks for the feedback.
Kind regards
John
Interesting – thanks.
A good post about the importance of a single customer view on this site.
Thanks again,
Tom
[…] recent article examines best practices for identifying the right sources for master data. It begins, “Among several challenges faced when we kick start an MDM implementation is the step […]
[…] Many times we have to custom fit the solution to meet specific organization’s requirements. Identifying different master data elements and modeling them in an efficient manner is one such key aspect. I see lot of organizations […]
[…] implementations start with identifying the right source of master data and centralizing it. In this process, we also build rules for standardizing and enhancing the […]
[…] an earlier post, I discussed about how to identify the right sources of master data during an MDM implementation. I argued that this step is critical factor for realizing MDM benefits […]
[…] Prashanta Chandramohan (aka the MDM Geek when his party role is blogger) recently blogged about Identifying the Right Sources of Master Data, which made me think that “eenie, meanie, mindie your MDM sources” would make a great counting […]
[…] an earlier post on this blog, I examined the ways in which we can identify the right sources of Master Data. Once these data sources are identified, next step is to select the right data elements from them, […]
[…] you start your MDM initiative, approach the solution by identify the sources of master data, bring at least 2 sources of data into master data hub and run data matching process. This is an […]