In this Razor’s Edge post, I will cover some of the terminology used in the Part 1 overview and relate it to everyday challenges and approaches to mastering data.
In Part 1, the first challenge mentioned was lengthy MDM cycles. If there is one thing that has changed over the last decade in business data processing, it is the demand for more frequent enterprise data processing: the expected time between cycles has shrunk, so more cycles must be performed and data can be refreshed more often.
Take a look at the high-level illustration in Figure 1 of a typical MDM processing cycle.
The process begins with data source systems. A data source system represents a feed that provides profiles, in this case customer profiles, to the mastering system. There are also target systems that receive profiles that emanate from the MDM Processing system. Most sources are also targets, but not necessarily: just because a system provides profiles does not mean it must ingest and integrate mastered profiles back. Likewise, a target system that accepts profiles does not have to extract and provide profiles to the MDM Processing system as a matter of course.

Reference Data Sources represent high-trust data that are usually purchased from 3rd parties or provided as authoritative standards for some component of the mastered data. For example, in the US, individual states license medical professionals to practice in their field. There are also individual boards that recognize these professionals as experts in a given discipline. Each may be considered an authoritative source for licensure and specialty and could be a separate reference source in an MDM ecosystem. In practice, thank goodness, some 3rd parties aggregate these data and provide them as a package of value-added identifiers, attributes, and metadata.
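To make the roles concrete, here is a minimal sketch of how you might register systems and reference sources in an MDM ecosystem. The class and field names are hypothetical and purely illustrative; they do not correspond to any particular MDM product.

```python
from dataclasses import dataclass

# Hypothetical registration of a system's roles in the MDM ecosystem.
@dataclass
class SystemRegistration:
    name: str
    provides_profiles: bool   # acts as a data source (feeds profiles into MDM)
    receives_profiles: bool   # acts as a target (integrates mastered profiles back)

# Hypothetical reference source: high-trust, authoritative data for one domain.
@dataclass
class ReferenceSource:
    name: str
    domain: str               # e.g., "licensure" or "specialty"
    trust_level: str = "high"

# Most systems are both sources and targets, but neither role is required.
ecosystem = [
    SystemRegistration("CRM", provides_profiles=True, receives_profiles=True),
    SystemRegistration("Marketing Platform", provides_profiles=False, receives_profiles=True),
    SystemRegistration("Legacy Billing", provides_profiles=True, receives_profiles=False),
]

reference_sources = [
    ReferenceSource("State Licensure Feed", domain="licensure"),
    ReferenceSource("Specialty Board Feed", domain="specialty"),
]
```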
The MDM cycle is a complete trip (in batch mode or transactional mode) from start to end in Figure 1. Batch mode is “full load” processing, where the typical volume of data from all MDM sources is processed through the MDM Processing system, then published and integrated back into target systems. Transactional mode processes a single record at a time. As long as the steps in transactional mode are defined to be equivalent to those in batch mode, the results should be equivalent. Also, a batch cycle can take hours or days, whereas a transaction should take merely seconds.
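The point about equivalence can be illustrated with a toy pipeline. In the sketch below, the same standardize and match/merge steps back both modes: batch mode pushes a full load through them, while transactional mode pushes one record at a time against the current master. The functions, field names, and match key are assumptions for illustration only; real MDM matching and survivorship logic is far more involved.

```python
from typing import Iterable

# Hypothetical standardization step: trim and uppercase keys and values.
def standardize(profile: dict) -> dict:
    return {k.strip().upper(): str(v).strip().upper() for k, v in profile.items()}

# Hypothetical match/merge step: de-duplicate on a simple composite key.
def match_and_merge(profiles: list[dict]) -> list[dict]:
    seen, mastered = set(), []
    for p in profiles:
        key = (p.get("FIRST_NAME"), p.get("LAST_NAME"), p.get("DOB"))
        if key not in seen:
            seen.add(key)
            mastered.append(p)
    return mastered

def run_batch(all_source_profiles: Iterable[dict]) -> list[dict]:
    """Batch mode: full load from all sources, processed end to end."""
    return match_and_merge([standardize(p) for p in all_source_profiles])

def run_transaction(profile: dict, current_master: list[dict]) -> list[dict]:
    """Transactional mode: a single record through the same steps."""
    return match_and_merge(current_master + [standardize(profile)])
```

Because both entry points funnel through the same steps, processing records one at a time should land on the same mastered result as a full batch over the same data, which is the equivalence the cycle depends on.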
Keep in mind that MDM processing is just one of many data processes that may have to run to refresh enterprise data for day-to-day use. With today’s faster, more powerful storage and compute capabilities on cloud architectures, data processing time has decreased dramatically, giving businesses opportunities to accelerate transactional and analytic data availability.
In my next post, we will explore the process of preparing data from sources for MDM processing.