Data Quality Wakes Us Up in the Night
As we wrap up Q1 2012, I am happy, excited, and at the same time exhausted by everything this year has brought so far. I have been working with many customers this year; some of them are just getting started on their MDM journey.
Interestingly, many customers are also taking up upgrade projects as they start to realize the importance of business process management (BPM), big data, and the improved matching capabilities vendors are offering in their new releases.
While I am seeing a lot of excitement and frenzy about these new trends, I also see an immense need for sticking to the basics when it comes to implementing MDM.
Speaking of the maturity of organizations implementing MDM, a lot of things went well this year, but some did not. I want to discuss one instance here that emphasizes a fundamental requirement of any MDM program. Yes, you guessed it: data quality. Here is why.
This is what happened a few weeks ago at a customer site –
I worked with this client to help design and configure an MDM system. The architecture was supported by an ETL solution that consolidated data from two master data sources for the first phase of the implementation. We successfully migrated many of the customer master data elements into MDM, and we also made sure the solution supported ongoing delta changes flowing into MDM on a daily basis. With the long-term vision being the creation of a master data hub, the organization is currently working toward a hybrid architecture to lay the groundwork for duplicate record consolidation.
The solution had a considerable amount of automation built into it, so that as little human intervention as possible would be needed to synchronize changes in the source data with MDM. What happened after about a week was something of a disaster. The automation broke in the middle of a load because of an exception thrown by the data load process. We got up in the middle of the night to find out what (the heck) was going on, and after several hours of analyzing logs, automation scripts, and services, we found the failure was caused by a special character in the load file. Further investigation showed that the customer record in question had a non-UTF-8 character in an address field. In the absence of controls from the point of entry all the way to MDM, nowhere in the entire data flow was this addressed.
While we fixed the problem by intervening manually this time, we kept wondering what we could do so that the same thing would not happen again.
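In hindsight, even a simple guard at the point where the delta file is picked up would have caught the record and kept the automation from failing mid-run. Here is a minimal sketch of the idea in Python, assuming a flat delta file read line by line; the file names and the quarantine approach are illustrative, not what this client actually had in place:

```python
# Hypothetical pre-load guard: scan the incoming delta file for bytes that
# do not decode as UTF-8 and quarantine those records, instead of letting
# the load job throw an exception halfway through.
SOURCE_FILE = "customer_delta.txt"        # made-up file name for illustration
QUARANTINE_FILE = "customer_delta.rejects"

def is_valid_utf8(raw_line: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw_line.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

clean, rejected = [], []
with open(SOURCE_FILE, "rb") as source:   # read raw bytes so bad data cannot raise here
    for line_number, raw_line in enumerate(source, start=1):
        (clean if is_valid_utf8(raw_line) else rejected).append((line_number, raw_line))

with open(QUARANTINE_FILE, "wb") as quarantine:
    for _, raw_line in rejected:
        quarantine.write(raw_line)

print(f"{len(clean)} records passed, {len(rejected)} quarantined for manual review")
```

The point is not this particular check; it is that some control exists between the point of entry and MDM, so a single bad byte becomes a quarantined record in a morning report rather than a midnight outage.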
What left me dumbfounded was the prejudice shown by the project team and management. I was displeased to hear, "our job is to get the data, as is, from the source into MDM." I have to be considerate here, though, as this team was handed a task without being made aware of the importance of, and all the theory behind, MDM.
When you are setting up an operational MDM system, you strive to cleanse the data sources and to clean up the processes that created the poor data in the first place. If this is not done, and/or if you do not do an up-front data cleanup before the data hits the MDM repository, you have done nothing better than recreate the mess you already have. But the hitch I always see is that the team of experts on the project, including management, is often not educated about this. Blame it on consultants like us for not doing it, but many times we get involved at different phases of the project and are hired to do only a specific task. (In this case, my role was to configure only the MDM tool.)
We all know that one of the leading causes of delays in MDM projects is the failure to identify and fix poor-quality data. We do profiling, sampling, and data discovery to measure and estimate the cleanup required on the source data before the MDM journey begins.
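As a concrete example of what that profiling step can look like, here is a rough sketch that counts, per column, how many values contain control characters or bytes that did not survive UTF-8 decoding. The file name and its layout are made up for illustration:

```python
# Rough profiling sketch over a CSV extract of the source system.
# EXTRACT_FILE and its columns are hypothetical.
import csv
from collections import Counter

EXTRACT_FILE = "customer_extract.csv"

suspect_counts = Counter()
total_rows = 0

with open(EXTRACT_FILE, newline="", encoding="utf-8", errors="replace") as extract:
    for row in csv.DictReader(extract):
        total_rows += 1
        for column, value in row.items():
            # Flag control characters and U+FFFD, the replacement character
            # that appears wherever the original bytes were not valid UTF-8.
            if value and any(ord(ch) < 32 or ch == "\ufffd" for ch in value):
                suspect_counts[column] += 1

for column, count in suspect_counts.most_common():
    print(f"{column}: {count} of {total_rows} values look suspect ({count / total_rows:.1%})")
```

Numbers like these make the cleanup effort visible to the business owners before the load is built, instead of surprising everyone after go-live.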
There is an even more important step: making sure the team, the management leading the effort, and the business owners are all educated about MDM. This has to happen at the very beginning, so everyone knows how to keep the "special" data from waking us up in the night.
COMMENTS
External data feeds present tricky problems, as I saw in a prior project. In that case, a vendor made a slight change to a data feed, suddenly rendering the imported records invalid because values in a required column went missing.
One benefit though: exception reporting and import failure analysis suddenly became important.
Thanks for the good read.
Great article, Prash. I’ve run into this exact situation before – the dreaded “special character” problem.
Sounds like you were able to help them figure it out – but I’ve never understood why the MDM platforms are so finicky and end up blowing up when they see unexpected data. I hope that the platforms get more robust over time.
Best regards — Dan
Good article. The funny thing is that these aren’t actually anything to do with MDM; they are Data Integration issues, and we’ve been integrating data since time began! However, uneducated clients coupled with lazy Data Integration flows give you these problems!
I’m not a consultant but have seen this issue, as I’ve been at the coal face many times in the middle of the night!!
[…] wrote a blog post a while ago about how data quality wakes us up in the night. I shared a story when bad data forced us to take a trip to the office on a cold, wintry […]