In the last post, we covered the standardization process, which increases the quality and usability of an organization’s mission-critical data. Standardization is particularly important to automated matching because standardized data can be matched more efficiently and effectively than unstandardized data.
In an MDM process, matching typically involves a reference data set, or authoritative source: a data set considered the standard to which all other data elements should eventually be linked. For example, OpenPaymentsData.CMS.gov has reference data sets for physicians, teaching hospitals, and companies making payments. The more closely submitted data conform to the CMS standards, the more likely the returned or matched data will be accurate.
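To make the idea concrete, here is a minimal sketch of linking incoming records to an authoritative reference set after standardization. The reference entries, field names, and `standardize` helper are hypothetical illustrations, not actual CMS data or APIs:

```python
# Hypothetical reference set of standardized physician keys.
# In a real MDM environment this would come from the authoritative source.
REFERENCE_PHYSICIANS = {
    ("SMITH", "JANE", "PORTLAND", "OR"),
    ("DOE", "JOHN", "AUSTIN", "TX"),
}

def standardize(record: dict) -> tuple:
    """Normalize case and whitespace so records compare consistently."""
    return tuple(record[k].strip().upper() for k in ("last", "first", "city", "state"))

def match_to_reference(record: dict) -> bool:
    """A submitted record matches only if its standardized key is in the reference set."""
    return standardize(record) in REFERENCE_PHYSICIANS
```

The point of the sketch is the order of operations: standardization happens first, so that a trivial exact-key lookup against the reference set can succeed; without it, `" smith "` and `"SMITH"` would never link.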
In the matching process, a critical MDM calibration activity takes place: managing the Razor’s Edge between over- and under-matching profiles in the enterprise customer MDM environment. It is a calibration exercise because you make an informed decision to select specific levels, tolerances, and thresholds based on your insight into the volume, veracity, variety, and inherent value of your data sources, and your knowledge of your data consumers’ requirements. Organizations must leverage their understanding of, and tolerance for, risk to determine how much data they want automation to match and merge.

Match too much, and data stewards must break links between existing customer entities and newly submitted entities. Match too little, and data stewards must toil through large work queues to manually link near matches to existing customer entities. They will also have to decide when to create new customer entities through splits and broken links, and when to remediate customer profiles and resubmit them to automation. This is costly and time-consuming, leading to unnecessary hours of data recycling and the problem of data “not sticking”: valuable updates to your customer data assets that you end up overwriting after reprocessing. It is akin to throwing a great block for your running back, only to then tackle your own running back yourself.
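The calibration described above can be sketched as a pair of thresholds on a match score: above the upper threshold automation merges, below the lower threshold a new entity is created, and the band in between goes to a steward’s work queue. The threshold values and the simple string-similarity scorer here are illustrative assumptions; production MDM tools use far richer scoring:

```python
from difflib import SequenceMatcher

# Illustrative thresholds -- the real values are the calibration decision
# an organization makes based on the volume, veracity, variety, and value
# of its sources and its tolerance for over- vs. under-matching.
AUTO_MERGE = 0.90   # at or above: automation links the records
REVIEW = 0.70       # at or above (but below AUTO_MERGE): steward work queue
                    # below REVIEW: treat as a new customer entity

def similarity(a: str, b: str) -> float:
    """Simple string similarity in [0, 1]; a stand-in for real match scoring."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify(candidate: str, existing: str) -> str:
    """Route a candidate record relative to an existing customer entity."""
    score = similarity(candidate, existing)
    if score >= AUTO_MERGE:
        return "auto-merge"
    if score >= REVIEW:
        return "steward-review"
    return "new-entity"
```

Raising `AUTO_MERGE` shrinks the over-matching risk (fewer bad links for stewards to break) at the cost of a larger review queue; lowering it does the opposite. That trade-off is exactly the Razor’s Edge the calibration exercise manages.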
In the next post, we will review the types of matching and how they affect enterprise data accuracy and availability.