CDMP Fundamentals • 100 Questions • 90 Minutes

Master and Reference Data Management

Chapter 10 • 10% of exam

Overview

Master Data Management (MDM) is the discipline of creating and maintaining a single, authoritative, reliable source of truth for an organization’s critical shared data entities. Master data represents the core business objects that are shared across multiple systems and business processes, such as customers, products, employees, suppliers, accounts, and locations. Without MDM, different systems maintain their own versions of these entities, leading to inconsistencies, duplicates, and conflicting information.

Reference Data Management is a closely related discipline that controls the standardized sets of allowed values used to classify and categorize other data, such as country codes, currency codes, status codes, industry classifications, and product categories. Reference data changes slowly and is often sourced from external standards bodies (ISO, ANSI, government agencies). While master data describes 'who' and 'what,' reference data describes 'which type' or 'which category.'

The business value of MDM is significant: a unified view of customers enables better service and cross-selling, consistent product data improves supply chain efficiency, and standardized reference data supports regulatory compliance and accurate reporting. The 'golden record' concept (a single, best version of each master data entity, created by merging data from multiple sources) is the central deliverable of an MDM program. Success requires strong governance, clear data ownership, matching and merging technology, and ongoing data stewardship.

Key Concepts

Master Data Defined

Master data represents the core business entities that are shared across multiple systems, processes, and departments. Characteristics of master data: (1) NON-TRANSACTIONAL — describes entities, not events (Customer is master data; Sale is transactional data); (2) RELATIVELY STABLE — changes less frequently than transactional data (a customer's name changes rarely; their orders change constantly); (3) SHARED — used by multiple business processes and systems (Customer is referenced by sales, billing, shipping, marketing); (4) CRITICAL — its accuracy directly impacts business operations and decisions. Common master data domains: CUSTOMER (individuals and organizations), PRODUCT (goods and services), EMPLOYEE (workforce), SUPPLIER/VENDOR (business partners), ACCOUNT (financial accounts), LOCATION (addresses, facilities), ASSET (equipment, property). Master data provides the context for understanding transactional data.

Reference Data Defined

Reference data consists of standard sets of valid values used to classify, categorize, or constrain other data. Key characteristics: (1) MORE STABLE than master data — changes very slowly (country codes rarely change); (2) OFTEN EXTERNALLY SOURCED — from standards bodies (ISO country codes, NAICS industry codes, SWIFT bank codes); (3) USED FOR CLASSIFICATION — categorizes and groups other data; (4) HIERARCHICAL — often organized in parent-child hierarchies (continent → country → state → city). Examples: country codes (ISO 3166), currency codes (ISO 4217), language codes (ISO 639), status codes (active/inactive/suspended), product categories, industry classifications (SIC/NAICS), units of measure. Reference data must be governed centrally to prevent proliferation of inconsistent code lists across systems.

Golden Record (Single Version of Truth)

The authoritative, best representation of a master data entity, created by combining information from multiple source systems. The golden record creation process: (1) IDENTIFY: find all records across systems that may refer to the same entity; (2) MATCH: determine which records are duplicates using matching rules; (3) MERGE: combine matched records into a single golden record; (4) SURVIVE: for each attribute, apply survivorship rules to determine which source provides the best value. SURVIVORSHIP RULES determine which value 'wins' when sources conflict: most recent update, most complete record, most trusted source, or manual override. Example: Customer John Smith exists in CRM (phone: 555-1234, email: john@gmail.com) and billing (phone: 555-5678, address: 123 Main St). The email and address each appear in only one source, so both carry into the golden record; the two phone numbers conflict, so a survivorship rule (for example, most recent update) decides which one survives.
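The survivorship step can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `SOURCE_TRUST` ranking and the `build_golden_record` helper are hypothetical names introduced here, and only the 'most trusted source' rule is shown. The attribute values come from the John Smith example.

```python
from typing import Dict, Optional

# Hypothetical trust ranking (an assumption): lower number = more trusted.
SOURCE_TRUST = {"crm": 1, "billing": 2}


def build_golden_record(
    records: Dict[str, Dict[str, Optional[str]]]
) -> Dict[str, str]:
    """Merge already-matched source records with a 'most trusted source' rule.

    For each attribute, keep the value from the most trusted source
    that has a non-empty value for it.
    """
    golden: Dict[str, str] = {}
    # Visit sources from most to least trusted.
    for source in sorted(records, key=lambda s: SOURCE_TRUST[s]):
        for attribute, value in records[source].items():
            # Only fill attributes that no more-trusted source supplied.
            if value and attribute not in golden:
                golden[attribute] = value
    return golden


matched = {
    "crm": {"name": "John Smith", "phone": "555-1234", "email": "john@gmail.com"},
    "billing": {"name": "John Smith", "phone": "555-5678", "address": "123 Main St"},
}
golden = build_golden_record(matched)
print(golden)
```

With CRM ranked above billing, the CRM phone number survives, while the address is taken from billing because CRM has none. Swapping in a 'most recent update' rule would require a timestamp per attribute, which this sketch does not model.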

MDM Architecture Styles

Four primary MDM implementation approaches: (1) REGISTRY STYLE: creates an index/pointer to master data records in source systems WITHOUT consolidating data. Lightweight, non-invasive, provides cross-reference but no golden record. Best for: organizations starting MDM with minimal disruption. (2) CONSOLIDATION STYLE: copies data from sources into a central hub, creates golden records for READ purposes. Sources remain authoritative for data entry/changes. Best for: analytical use cases, reporting, data quality monitoring. (3) COEXISTENCE STYLE: hub and sources SHARE data bidirectionally. Golden records are created in the hub AND synchronized back to sources. Best for: organizations wanting a single view while maintaining source system autonomy. (4) CENTRALIZED (Transaction Hub) STYLE: the MDM hub IS the authoritative system of record for all create/read/update/delete operations. Sources consume master data FROM the hub. Best for: new implementations, organizations with strong governance. Moving from registry to centralized, each style requires more investment and gives the hub more control.
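To make the registry style concrete, here is a small Python sketch of a cross-reference index that stores only pointers to source records and no attribute data. The `MasterRegistry` class, its method names, and the IDs are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

SourceKey = Tuple[str, str]  # (system name, local record ID)


class MasterRegistry:
    """Cross-reference index only: stores pointers, never attribute data."""

    def __init__(self) -> None:
        self._by_master: Dict[str, Set[SourceKey]] = defaultdict(set)
        self._by_source: Dict[SourceKey, str] = {}

    def link(self, master_id: str, system: str, local_id: str) -> None:
        """Point a source record at a master ID, replacing any earlier link."""
        key = (system, local_id)
        previous = self._by_source.get(key)
        if previous is not None and previous != master_id:
            self._by_master[previous].discard(key)
        self._by_source[key] = master_id
        self._by_master[master_id].add(key)

    def master_id_for(self, system: str, local_id: str) -> Optional[str]:
        """Return the master ID for a source record, or None if unlinked."""
        return self._by_source.get((system, local_id))

    def pointers_for(self, master_id: str) -> List[SourceKey]:
        """Return all source records linked to a master ID, sorted."""
        return sorted(self._by_master.get(master_id, set()))


registry = MasterRegistry()
registry.link("CUST-001", "crm", "A17")
registry.link("CUST-001", "billing", "9042")
print(registry.pointers_for("CUST-001"))
```

A consolidation, coexistence, or centralized hub would additionally hold the attribute values themselves; the registry leaves them in the source systems and answers only "which records belong together".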

Data Matching (Record Linkage)

The process of identifying records that refer to the same real-world entity. Two primary approaches: DETERMINISTIC MATCHING — uses exact rules: 'If SSN matches exactly, it's the same person.' Advantages: precise, predictable, fast. Disadvantages: brittle (fails on typos, format differences). PROBABILISTIC MATCHING — uses statistical algorithms to calculate a match probability based on multiple attributes: 'Name 90% similar + same zip code + birth year ±1 = 95% probability of match.' Advantages: handles imperfect data, finds matches that exact rules miss. Disadvantages: requires tuning, can produce false positives/negatives. Most MDM implementations use a HYBRID approach: deterministic rules for high-confidence matches, probabilistic scoring for ambiguous cases, manual review for edge cases. Key matching challenges: name variations (Bob/Robert), address formats, missing data, data entry errors.
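The hybrid approach described above can be sketched as follows. This is an illustration only: the field names (`tax_id`, `zip`, `birth_year`), the weights, and the thresholds are assumptions, and the weighted similarity score is a simple stand-in for a real probabilistic model, not a calibrated match probability. Name similarity uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher
from typing import Any, Dict

Record = Dict[str, Any]


def deterministic_match(a: Record, b: Record) -> bool:
    """Exact rule: the same non-empty tax ID means the same person."""
    return bool(a.get("tax_id")) and a.get("tax_id") == b.get("tax_id")


def similarity_score(a: Record, b: Record) -> float:
    """Weighted similarity in [0, 1]; the weights are illustrative assumptions."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_zip = 1.0 if a.get("zip") and a.get("zip") == b.get("zip") else 0.0
    year_close = 1.0 if abs(a["birth_year"] - b["birth_year"]) <= 1 else 0.0
    return 0.5 * name_sim + 0.3 * same_zip + 0.2 * year_close


def classify(a: Record, b: Record, match_at: float = 0.85, review_at: float = 0.60) -> str:
    """Deterministic rule first, then score; ambiguous pairs go to a steward."""
    if deterministic_match(a, b):
        return "match"
    score = similarity_score(a, b)
    if score >= match_at:
        return "match"
    if score >= review_at:
        return "review"
    return "no_match"


robert = {"name": "Robert Jones", "zip": "10001", "birth_year": 1980, "tax_id": "123"}
bob = {"name": "Bob Jones", "zip": "10001", "birth_year": 1980, "tax_id": "123"}
john_ny = {"name": "John Smith", "zip": "10001", "birth_year": 1980, "tax_id": None}
john_sf = {"name": "John Smith", "zip": "94105", "birth_year": 1980, "tax_id": None}
alice = {"name": "Alice Wong", "zip": "90210", "birth_year": 1955, "tax_id": None}

for pair in [(robert, bob), (john_ny, john_sf), (john_ny, alice)]:
    print(pair[0]["name"], "/", pair[1]["name"], "->", classify(*pair))
```

The Bob/Robert pair matches on the exact tax ID even though the names differ. The two John Smith records share a name and birth year but not a zip code, so they land in the manual review band rather than being merged automatically.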

Hierarchy Management

Managing parent-child and other hierarchical relationships between master data entities. Types of hierarchies: (1) ORGANIZATIONAL — company → division → department → team; (2) GEOGRAPHIC — region → country → state → city; (3) PRODUCT — category → subcategory → brand → SKU; (4) ACCOUNT — parent company → subsidiary → business unit → cost center; (5) CUSTOM — any business-defined grouping. Hierarchy management challenges: multiple hierarchies for the same entity (a product belongs to different hierarchies for marketing vs. manufacturing), time-varying hierarchies (organizational restructuring), cross-hierarchy relationships. MDM systems must support multiple concurrent hierarchy versions and effective dating.
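Effective dating can be illustrated with a short Python sketch in which each parent-child link carries a validity period, so the same node can have different parents at different dates. The `HierarchyLink` structure, the function names, and the organizational data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class HierarchyLink:
    hierarchy: str                   # e.g. "org"; allows several hierarchies side by side
    child: str
    parent: str
    valid_from: date                 # inclusive
    valid_to: Optional[date] = None  # exclusive; None means still in effect


def parent_of(links: List[HierarchyLink], hierarchy: str, child: str, as_of: date) -> Optional[str]:
    """Return the parent that was in effect on the given date, if any."""
    for link in links:
        if (
            link.hierarchy == hierarchy
            and link.child == child
            and link.valid_from <= as_of
            and (link.valid_to is None or as_of < link.valid_to)
        ):
            return link.parent
    return None


def path_to_root(links: List[HierarchyLink], hierarchy: str, node: str, as_of: date) -> List[str]:
    """Walk upward from a node to the top of the hierarchy as of a date."""
    path = [node]
    while True:
        parent = parent_of(links, hierarchy, path[-1], as_of)
        if parent is None:
            return path
        if parent in path:
            raise ValueError(f"Cycle detected at {parent!r}")
        path.append(parent)


# Hypothetical restructuring: Team A moves from Sales to Marketing on 2023-01-01.
links = [
    HierarchyLink("org", "Team A", "Sales Dept", date(2020, 1, 1), date(2023, 1, 1)),
    HierarchyLink("org", "Team A", "Marketing Dept", date(2023, 1, 1)),
    HierarchyLink("org", "Sales Dept", "Commercial Division", date(2020, 1, 1)),
    HierarchyLink("org", "Marketing Dept", "Commercial Division", date(2020, 1, 1)),
]
print(path_to_root(links, "org", "Team A", date(2022, 6, 1)))
print(path_to_root(links, "org", "Team A", date(2024, 6, 1)))
```

Because the `hierarchy` field is part of every link, the same entity can sit in a marketing hierarchy and a manufacturing hierarchy at once without the two interfering.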

Data Sharing Agreements

Formal agreements between business units, departments, or organizations about how master and reference data is shared, maintained, and governed. Agreements should define: which data is shared, who is the authoritative source (system of record), quality expectations and SLAs, access permissions and security requirements, update frequency and synchronization methods, dispute resolution processes, and data lifecycle management responsibilities. Critical for federated organizations where different business units maintain their own systems but need consistent master data.

MDM Data Quality

Master data has unique quality challenges: (1) DUPLICATE MANAGEMENT — identifying and resolving duplicate records across systems is the central MDM quality concern; (2) IDENTITY RESOLUTION — determining whether two similar records represent the same entity or different entities; (3) DATA DECAY — master data becomes stale over time (people move, companies merge, products are discontinued); (4) INCONSISTENCY — different systems store different values for the same attribute; (5) INCOMPLETE RECORDS — source systems capture different attributes. MDM quality processes: ongoing matching and deduplication, data stewardship workflow for manual review, automated quality monitoring, regular refresh from authoritative sources, and exception handling processes.
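Two of these concerns, duplicates and data decay, lend themselves to simple automated monitoring. The sketch below assumes one possible definition of duplicate rate (the share of source records that are redundant once records are grouped by master ID) and a one-year staleness threshold; both are illustrative assumptions, as are the function names and sample data.

```python
from datetime import date
from typing import Dict, List


def duplicate_rate(record_to_master: Dict[str, str]) -> float:
    """Share of source records that are redundant copies of another record.

    Assumed definition: (records - distinct master entities) / records.
    """
    total = len(record_to_master)
    if total == 0:
        return 0.0
    distinct = len(set(record_to_master.values()))
    return (total - distinct) / total


def stale_ids(last_verified: Dict[str, date], as_of: date, max_age_days: int = 365) -> List[str]:
    """Master IDs not verified within max_age_days (the threshold is an assumption)."""
    return sorted(
        master_id
        for master_id, verified_on in last_verified.items()
        if (as_of - verified_on).days > max_age_days
    )


# Hypothetical match results: three source records resolve to master M1.
record_to_master = {"crm:1": "M1", "billing:9": "M1", "web:5": "M1", "crm:2": "M2"}
last_verified = {"M1": date(2022, 1, 1), "M2": date(2024, 3, 1)}

print(duplicate_rate(record_to_master))
print(stale_ids(last_verified, as_of=date(2024, 6, 1)))
```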

MDM Governance

Governance is critical for MDM success. Key elements: (1) DATA OWNERSHIP — each master data domain needs a clear business owner who makes decisions; (2) STEWARDSHIP — data stewards manage day-to-day data quality, resolve duplicates, and handle exceptions; (3) POLICIES — rules for data creation, update, sharing, and retirement; (4) WORKFLOW — formal processes for creating new master records, approving changes, handling merge/split requests; (5) METRICS — KPIs for match rates, duplicate rates, data quality scores, resolution times; (6) CHANGE MANAGEMENT — processes for evolving master data models, adding new domains, and updating matching rules. Without governance, MDM becomes just another data silo.

MDM and Customer 360

Customer 360 (or Single Customer View) is the most common MDM use case. It creates a complete, unified profile of each customer by combining data from: CRM, billing, e-commerce, marketing automation, customer service, social media, and third-party data. Benefits: personalized customer experiences, reduced duplicate communications, accurate customer lifetime value calculation, better cross-selling and upselling, consistent customer service across channels. Challenges: matching across systems with different identifiers, handling household vs. individual relationships, maintaining privacy compliance (GDPR consent management), and real-time vs. batch synchronization.

Reference Data Governance

Managing reference data requires specific governance practices: (1) CENTRALIZED MANAGEMENT — reference data should be managed in a single, authoritative repository; (2) EXTERNAL STANDARDS — use standard code sets (ISO, ANSI) whenever available rather than creating proprietary ones; (3) VERSION CONTROL — maintain history of changes to code sets with effective dates; (4) DISTRIBUTION — provide a reliable mechanism (API, database, file) for systems to consume current reference data; (5) MAPPING — maintain mappings between internal codes and external standards; (6) LIFECYCLE — processes for adding new values, deprecating old ones, and handling transitions. A common anti-pattern is each system maintaining its own copy of reference data that drifts out of sync over time.
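Version control with effective dates and code mapping can be sketched in Python as follows. The `CodeSet` and `CodeValue` classes, the status codes, their dates, and the legacy country codes are all hypothetical; only the ISO 3166-1 alpha-2 target codes (US, GB, DE) are real standard values.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CodeValue:
    code: str
    label: str
    valid_from: date                 # inclusive
    valid_to: Optional[date] = None  # exclusive; None means still current


class CodeSet:
    """A governed code list that keeps deprecated values for history."""

    def __init__(self, name: str, values: List[CodeValue]) -> None:
        self.name = name
        self._values = list(values)

    def active_codes(self, as_of: date) -> List[str]:
        """Codes in effect on the given date, sorted."""
        return sorted(
            v.code
            for v in self._values
            if v.valid_from <= as_of and (v.valid_to is None or as_of < v.valid_to)
        )

    def is_valid(self, code: str, as_of: date) -> bool:
        return code in self.active_codes(as_of)


# Hypothetical lifecycle: "suspended" is deprecated and replaced by "on_hold".
account_status = CodeSet("account_status", [
    CodeValue("active", "Active", date(2020, 1, 1)),
    CodeValue("inactive", "Inactive", date(2020, 1, 1)),
    CodeValue("suspended", "Suspended", date(2020, 1, 1), valid_to=date(2024, 1, 1)),
    CodeValue("on_hold", "On hold", date(2024, 1, 1)),
])

# Mapping from hypothetical legacy codes to ISO 3166-1 alpha-2 codes.
LEGACY_TO_ISO_COUNTRY: Dict[str, str] = {"USA": "US", "UK": "GB", "GER": "DE"}


def to_iso_country(legacy_code: str) -> str:
    """Translate a legacy country code, failing loudly on unmapped values."""
    try:
        return LEGACY_TO_ISO_COUNTRY[legacy_code.strip().upper()]
    except KeyError:
        raise ValueError(f"No ISO mapping for legacy code {legacy_code!r}") from None


print(account_status.active_codes(date(2024, 6, 1)))
print(to_iso_country(" uk "))
```

Keeping the deprecated value with an end date, rather than deleting it, lets historical records that still carry "suspended" be validated against the code set as it stood at the time.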

MDM Implementation Approaches

MDM can be implemented through various approaches: (1) BIG BANG — implement MDM across all domains and systems simultaneously. High risk, high investment, rare in practice. (2) PHASED/ITERATIVE — start with one domain (usually Customer or Product), prove value, then expand to other domains. Recommended approach. (3) TOP-DOWN — start with enterprise data model and governance framework, then implement technology. (4) BOTTOM-UP — start with a specific business problem (e.g., customer deduplication), build momentum, then formalize governance. Success factors: strong executive sponsorship, clear business case, governance before technology, quality data from the start, and realistic expectations about timeline and effort.

Best Practices

  • Start MDM with a single domain (usually Customer or Product) before expanding — prove value incrementally
  • Define clear data ownership for each master data domain with a named business executive accountable
  • Establish survivorship rules BEFORE merging records — document which source wins for each attribute
  • Use reference data standards from authoritative external sources (ISO, ANSI) rather than creating proprietary codes
  • Implement data matching and deduplication as ONGOING processes, not one-time cleanups
  • Choose MDM architecture style based on organizational maturity and business requirements — start with registry or consolidation
  • Maintain reference data in a centralized, governed repository with distribution mechanisms (APIs)
  • Establish change management processes for master and reference data with formal approval workflows
  • Monitor data quality metrics specifically for master data — duplicate rates, match accuracy, data freshness
  • Version reference data sets and maintain change history with effective dates
  • Implement data stewardship workflows for handling exceptions, merge/split requests, and quality issues
  • Build the business case for MDM using specific examples of how duplicates and inconsistencies impact the business

💡 Exam Tips

  • MDM is 10% of the exam: with 100 questions in total, expect about 10 on this chapter
  • Know the FOUR MDM architecture styles: Registry, Consolidation, Coexistence, Centralized (Transaction Hub) — and when to use each
  • Master data (entities like Customer, Product) vs Reference data (code sets like Country Code) vs Transaction data (events like Orders) — know the differences
  • Golden Record concept is FREQUENTLY tested — understand survivorship rules and the merge process
  • Matching techniques: Deterministic (exact rules) vs Probabilistic (statistical similarity) — know both
  • Reference data is often sourced from external standards (ISO codes) and is MORE STABLE than master data
  • MDM is about creating a single version of truth across the enterprise
  • Hierarchy management is a key MDM capability — organizations, products, geography all have hierarchies
  • Start MDM with one domain and expand — the phased approach is recommended
  • Data stewardship and governance are CRITICAL for MDM success — technology alone is insufficient