
Managing the Razor’s Edge: Challenges with Customer Master Data - Part 3: Ingestion

  • Writer: Darryl D Williams
  • Jul 1, 2020
  • 3 min read

In the previous post, we acknowledged that lengthy MDM cycles create a disconnect within the enterprise when it comes to accurately and completely identifying its customers. The reason is the multiple serial steps necessary to integrate customer data from external sources into an authoritative MDM hub. Twenty years ago, it was common for companies to batch process customer MDM as infrequently as monthly or biweekly. A few companies were so bold as to process customer data as often as weekly. Today, some companies still process on a monthly or biweekly basis, while many more process weekly or daily. The main drivers behind this historical cadence, and the shift toward more frequent processing, are:

  • The RDBMS technology available required days or weeks to process customer MDM data.

  • Updates to authoritative data were only available on a biweekly or monthly basis.

  • Activity (transactional fact data) could not be reported on or analyzed with weekly or daily changes – this was a moving target that many functional areas found useless for comparisons. For example, CRM activity records might contain calls to physicians who are not yet in the company's customer galaxy, making it impossible to view both call and prescription data together because the customer golden profile had not yet been created.

As we examine the necessary steps of processing customer master data from sources, it will become clear why this is so challenging.

First, consider that multiple sources mean multiple forms, structures, and levels of quality of incoming customer data. These sources are typically expense management, clinical trials management, customer relationship management, enterprise resource planning, etc. Each has its own requirements for collecting customer data and, as a result, the rules governing how those data are collected may be managed at a system, function, or even a department or group level. Since each data element (e.g. first name, middle name, last name, business name) has its own level of importance to a particular system, function, or process, a central customer master data system may not be able to curate the locally governed data for enterprise purposes. Yet many companies send data in this form into an MDM hub for processing. I will label this the negative automatic dishwasher scenario: adding heavily soiled dishes to a dishwashing cycle yields mixed results – some dishes will come out sparkling clean while others will be only partly clean and in need of rewashing. This is inefficient, ineffective, and renders the dishwashing process unreliable. For companies that manage their customer master data under the negative automatic dishwasher scenario, their MDM processes will have similarly mixed results – some customer profiles will be pristine while others will be only partly “clean” and in need of reprocessing or significant manual remediation.
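To make the multi-source problem concrete, here is a minimal sketch of how two source systems might collect the same customer under locally governed field names and conventions, and how a per-source mapping can translate them into one enterprise structure before they reach the MDM hub. The field names, records, and mapping table are purely illustrative assumptions, not taken from any specific product.

```python
# Hypothetical records: a CRM and an ERP system each collect the same
# customer under their own locally governed field names and casing.
crm_record = {"FirstName": "JOHN", "LastName": "SMITH", "Org": "Acme Clinic"}
erp_record = {"fname": "John", "lname": "Smith", "business_name": "ACME CLINIC, INC."}

# A per-source mapping translates local field names to the enterprise
# schema before the records are sent to the MDM hub.
SOURCE_MAPPINGS = {
    "crm": {"FirstName": "first_name", "LastName": "last_name", "Org": "business_name"},
    "erp": {"fname": "first_name", "lname": "last_name", "business_name": "business_name"},
}

def to_enterprise_schema(source: str, record: dict) -> dict:
    """Map a source record onto the enterprise field names.

    Title-casing here is a crude stand-in for real curation rules.
    """
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[k]: v.strip().title() for k, v in record.items() if k in mapping}

print(to_enterprise_schema("crm", crm_record))
print(to_enterprise_schema("erp", erp_record))
```

Note that even after both records share one structure, value-level differences ("Acme Clinic" versus "Acme Clinic, Inc.") remain – exactly the kind of partly "clean" result the dishwasher analogy describes, which downstream matching must resolve.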

Figure 1: MDM Data Source Preparation for Ingestion.

Second, consider that data quality is a huge challenge in the process. Most systems will follow some form of the land-stage-load approach to preparing data for ingestion as depicted in Figure 1.

  • Land includes receiving the data in whatever form is acceptable to the MDM hub. This could be a flat file, an API, manual entry, a message queue, or another channel. Once the data have been landed, they must be staged to enter the MDM hub.

  • Stage includes structuring the data in a way that aligns with the specifications of the ingestion engine. This can be JSON, XML, or any number of forms recognized by the ingestion engine. For example, if every profile must include all attributes on one row, then the staging step is where this restructuring/denormalization takes place.

  • Load is the step where the properly structured requests are submitted to the MDM hub for processing.
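The three steps above can be sketched as a small pipeline. Everything here is a hedged illustration under assumed conventions: the CSV source format, the one-JSON-document-per-profile staging structure, and the submit step (a print standing in for a real API call or message queue) are all hypothetical, since a real MDM hub defines its own ingestion specification.

```python
import csv
import json

def land(path: str) -> list[dict]:
    """Land: receive the source data as-is (here, a flat CSV file)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def stage(records: list[dict]) -> list[str]:
    """Stage: restructure each record into the form the ingestion
    engine expects -- here, one denormalized JSON document per profile,
    with all attributes on a single record."""
    return [json.dumps(record) for record in records]

def load(staged: list[str]) -> int:
    """Load: submit the structured requests to the MDM hub.
    Printing stands in for a real submission call."""
    for doc in staged:
        print("SUBMIT:", doc)
    return len(staged)
```

In practice each step would also log rejects and counts so that partially "clean" profiles can be traced back to their source.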

The main objective of ingestion is to take data, potentially from various sources, and load them seamlessly into the MDM hub for processing. In the next post, we will look at standardization, which is critical regardless of the processing mode (real-time or batch).
