Data professionals face multiple challenges when creating and maintaining master data entities. From the selection of a matching strategy to the setup of a consolidation approach, the decisions made along the way have long-lasting effects on an organization's commercial agility. To achieve clean, low-maintenance data, source-supplied profiles need to be matched and merged in a way that limits manual intervention and builds data quality through properly consolidated master profiles. Two types of matching are typically employed: deterministic and probabilistic. Although linking technology has improved greatly in recent years, many firms still rely on these tried-and-true methods. Yet the two often conflict, because the very rules created to determine who a unique customer is must be broken repeatedly by exception cases to address overmatching.
The Basics
At its most elementary, a match rule specifies a logical statement under which two profiles or records can be automatically linked. Multiple match rules and match rule sets can be created to accommodate options such as null/null and null/not null attribute matching to address specific known scenarios.
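To make the idea of a match rule concrete, here is a minimal sketch in Python. The class names, attribute names, and the two null-handling flags are illustrative assumptions, not the API of any particular MDM product.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Profile = dict  # attribute name -> standardized value, or None when missing


@dataclass(frozen=True)
class Condition:
    attribute: str
    allow_null_null: bool = False      # both values missing counts as agreement
    allow_null_not_null: bool = False  # exactly one value missing counts as agreement

    def agrees(self, a: Profile, b: Profile) -> bool:
        left, right = a.get(self.attribute), b.get(self.attribute)
        if left is None and right is None:
            return self.allow_null_null
        if left is None or right is None:
            return self.allow_null_not_null
        return left == right


@dataclass(frozen=True)
class MatchRule:
    name: str
    conditions: Tuple[Condition, ...]

    def matches(self, a: Profile, b: Profile) -> bool:
        # A rule is a logical AND over its attribute conditions.
        return all(c.agrees(a, b) for c in self.conditions)


def first_matching_rule(rules: Sequence[MatchRule], a: Profile, b: Profile) -> Optional[str]:
    """Return the name of the first rule in the set that links the two profiles."""
    for rule in rules:
        if rule.matches(a, b):
            return rule.name
    return None


# Hypothetical rule set: a strict rule, then one that tolerates a missing middle name.
RULES = (
    MatchRule("strict", (Condition("last_name"), Condition("license_id"),
                         Condition("middle_name"))),
    MatchRule("middle_name_optional", (Condition("last_name"), Condition("license_id"),
                                       Condition("middle_name", allow_null_not_null=True))),
)

p1 = {"last_name": "smith", "license_id": "A123", "middle_name": None}
p2 = {"last_name": "smith", "license_id": "A123", "middle_name": "lee"}
matched_by = first_matching_rule(RULES, p1, p2)
```

Ordering the rule set from strictest to most lenient records which rule produced the link, which helps when a match later has to be audited or undone.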
How each matching method is applied can depend on the level of confidence in it. The most confident (deterministic) matches can be automated: if the rule's conditions are met, the match is accepted automatically. More uncertain (probabilistic) matches may require a review process for a time, until a history of trust or a predictive model is developed that allows them to be matched automatically.
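The routing described above might be sketched as follows. The threshold values and the `probabilistic_auto_enabled` switch are assumptions chosen for illustration; real values would come from steward feedback over time.

```python
from enum import Enum


class Decision(Enum):
    AUTO_MATCH = "auto_match"
    REVIEW = "review"
    NO_MATCH = "no_match"


def route(
    deterministic_hit: bool,
    score: float,
    probabilistic_auto_enabled: bool = False,  # flip once trust has been established
    auto_threshold: float = 0.95,              # hypothetical value
    review_threshold: float = 0.75,            # hypothetical value
) -> Decision:
    """Decide what happens to a candidate pair of profiles."""
    if deterministic_hit:
        # A deterministic rule fired, so the match is accepted without review.
        return Decision.AUTO_MATCH
    if probabilistic_auto_enabled and score >= auto_threshold:
        return Decision.AUTO_MATCH
    if score >= review_threshold:
        # Plausible but uncertain: queue for a data steward.
        return Decision.REVIEW
    return Decision.NO_MATCH
```

With the switch off, even a very high probabilistic score goes to review, which mirrors the period during which a history of trust is still being built.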
Types of Matching
There are a variety of matching types. The most commonly used are exact and fuzzy, each with some variations.
Exact match relies on being able to link two profiles based on a set of match attributes that match exactly to each other (after standardization).
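As a rough illustration, exact matching after standardization can be reduced to comparing normalized keys. The attribute list and the specific cleaning steps below are assumptions; a real implementation would standardize each attribute type (names, addresses, identifiers) with its own rules.

```python
import re
import unicodedata

# Hypothetical match attributes for a person profile.
MATCH_ATTRIBUTES = ("first_name", "last_name", "postal_code")


def standardize(value) -> str:
    """Normalize a raw value so that cosmetic differences do not block a match."""
    if value is None:
        return ""
    text = unicodedata.normalize("NFKC", str(value))
    text = re.sub(r"[^\w\s]", "", text)        # drop punctuation such as ' and -
    text = re.sub(r"\s+", " ", text).strip()   # collapse and trim whitespace
    return text.casefold()                     # case-insensitive comparison


def match_key(profile: dict) -> tuple:
    return tuple(standardize(profile.get(attr)) for attr in MATCH_ATTRIBUTES)


def exact_match(a: dict, b: dict) -> bool:
    key_a, key_b = match_key(a), match_key(b)
    # Require every match attribute to be populated; empty values never match here.
    return all(key_a) and key_a == key_b


source_1 = {"first_name": " Mary-Ann ", "last_name": "O'Brien", "postal_code": "02134"}
source_2 = {"first_name": "maryann", "last_name": "OBRIEN", "postal_code": "02134 "}
```

Here `exact_match(source_1, source_2)` is true because both profiles standardize to the same key, even though the raw strings differ.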
Negative matching is used to identify exclusion cases where two profiles should not be matched. This approach can be employed to break matches that occur in exact or fuzzy matching or to refer them to data steward review.
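One way to picture negative matching is as a veto applied after the positive rules have proposed a link. The attributes and actions in this sketch are hypothetical examples, not a recommended rule set.

```python
# Hypothetical negative rules: attribute -> action taken when both profiles
# carry a value for the attribute and the values disagree.
NEGATIVE_RULES = {
    "birth_date": "break",    # different birth dates: never the same person
    "name_suffix": "review",  # Jr. vs Sr.: let a data steward decide
}


def conflicts(a: dict, b: dict, attribute: str) -> bool:
    left, right = a.get(attribute), b.get(attribute)
    # A missing value is not treated as a conflict.
    return left is not None and right is not None and left != right


def apply_negative_rules(a: dict, b: dict, proposed_match: bool) -> str:
    """Return 'match', 'review', or 'no_match' for a proposed link."""
    if not proposed_match:
        return "no_match"
    outcome = "match"
    for attribute, action in NEGATIVE_RULES.items():
        if conflicts(a, b, attribute):
            if action == "break":
                return "no_match"
            outcome = "review"
    return outcome


father = {"name": "john smith", "name_suffix": "sr", "birth_date": None}
son = {"name": "john smith", "name_suffix": "jr", "birth_date": None}
```

In this sketch a father and son who share a name and address would be sent to review rather than merged, which is the kind of overmatching exception the exclusion rules exist to catch.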
Fuzzy match is typically any type of matching that is not exact. In fuzzy matching, nearness, variations, and proximity are scored to allow matches above set thresholds. These are probabilistic matches whose scores are high enough to indicate a strong likelihood that the two profiles being compared are indeed the same entity.
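A small sketch of threshold-based fuzzy scoring, using the similarity ratio from Python's standard `difflib` module as a stand-in for whatever comparison function a matching engine actually provides. The attribute weights and the 0.85 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights (summing to 1.0) and acceptance threshold.
WEIGHTS = {"name": 0.6, "city": 0.4}
MATCH_THRESHOLD = 0.85


def similarity(left, right) -> float:
    """String similarity in [0, 1]; a missing value contributes no evidence."""
    if not left or not right:
        return 0.0
    return SequenceMatcher(None, left, right).ratio()


def fuzzy_score(a: dict, b: dict) -> float:
    # Weighted average of per-attribute similarities.
    return sum(w * similarity(a.get(attr), b.get(attr)) for attr, w in WEIGHTS.items())


def is_fuzzy_match(a: dict, b: dict) -> bool:
    return fuzzy_score(a, b) >= MATCH_THRESHOLD


p1 = {"name": "jon smith", "city": "boston"}
p2 = {"name": "john smith", "city": "boston"}
p3 = {"name": "maria garcia", "city": "denver"}
```

The pair `p1`/`p2` differs by a single letter and clears the threshold, while `p1`/`p3` scores far below it. Where the threshold sits is a tuning decision: raising it reduces overmatching at the cost of more missed links.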
Phonetic match is a type of fuzzy matching that is employed when the input comes from spoken, recorded, or transcribed data capture. In these cases, the matching algorithm needs to be able to handle similar-sounding names and words. Since these algorithms are usually language-specific, they tend to be unreliable for automated matching, and the candidate matches they produce should be reviewed by data stewards before being accepted.
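The article does not name a specific phonetic algorithm, so the sketch below uses American Soundex purely as a familiar example. It is designed around English pronunciation, which illustrates the language-specific limitation mentioned above; the `phonetic_candidate` helper and its "review" outcome are assumptions.

```python
# Letter-to-digit codes used by American Soundex.
_CODES = {}
for _digit, _letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                         ("4", "L"), ("5", "MN"), ("6", "R")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code, or '' for no letters."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # vowels reset the run, so a repeated code counts again
    return (code + "000")[:4]


def phonetic_candidate(name_a: str, name_b: str) -> str:
    """Flag similar-sounding names for steward review instead of auto-matching."""
    code_a, code_b = soundex(name_a), soundex(name_b)
    return "review" if code_a and code_a == code_b else "no_match"
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so the pair is flagged for review. Two clearly different names can share a code in this way, which is one reason the result is treated as a candidate rather than a confirmed match.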
Propensity matching, another type of fuzzy matching, relies on a prediction model to produce a score that indicates a valid match and can be employed in a machine learning (ML) approach.
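As a sketch of what such a score might look like, the example below applies a logistic model to a few pairwise comparison features. The feature names and coefficients are hand-picked placeholders for illustration; in an ML approach they would be learned from pairs that data stewards have already labeled as match or non-match.

```python
import math

# Placeholder coefficients; a trained model would supply these values.
INTERCEPT = -4.0
COEFFICIENTS = {
    "name_similarity": 3.0,   # continuous, 0.0 to 1.0
    "same_postal_code": 2.0,  # 1.0 if equal, else 0.0
    "same_phone": 2.5,        # 1.0 if equal, else 0.0
}


def propensity_score(features: dict) -> float:
    """Estimated probability that a pair of profiles is a true match."""
    z = INTERCEPT + sum(
        weight * features.get(name, 0.0) for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)


strong_pair = {"name_similarity": 1.0, "same_postal_code": 1.0, "same_phone": 1.0}
weak_pair = {"name_similarity": 0.2, "same_postal_code": 0.0, "same_phone": 0.0}
```

The resulting probability can be compared against thresholds in the same way as any other fuzzy score, with the difference that the weighting is derived from past match decisions rather than set by hand.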
The bottom line is that a matching approach should be designed for the specifics of the entity type (e.g., people, organizations), data volume, and data variety under consideration. For example, what works for healthcare professionals may be unsuitable for healthcare organizations. Moreover, what works for doctors (a subset of healthcare professionals) may not work for nurses (another subset), particularly when it comes to exception rules.
Once matching has occurred, profiles can be merged or consolidated into a golden record or golden profile designed to provide a single version of the truth about a customer.
The next part of managing the razor's edge is merging/consolidation.