Master Data Architecture

Expert-defined terms from the Certificate in Master Data Migration course at London School of Business and Administration. Free to read, free to share, paired with a professional course.

Download PDF Free · printable · SEO-indexed
Master Data Architecture

Attribute – a single data element that describes a property of an entity,… #

Attribute – a single data element that describes a property of an entity, such as Customer Name or Product Code.

Explanation #

In master data architecture, attributes are the building blocks that capture the characteristics of master entities. They are defined in the data model and stored in the underlying repository.

Example #

The Customer entity may have attributes like First Name, Last Name, Email Address, and Customer Status.

Practical application #

Attributes are mapped during migration to ensure that source system fields align with target system attributes.

Challenges #

Inconsistent naming, differing data types, and missing attributes across source systems can cause mapping errors and data quality issues.

Canonical Data Model (CDM) – a standardized, organization‑wide representa… #

Canonical Data Model (CDM) – a standardized, organization‑wide representation of data used to enable seamless integration between heterogeneous systems.

Explanation #

The CDM acts as a neutral reference that all source and target systems translate to and from, reducing the need for point‑to‑point mappings.

Example #

An e‑commerce firm defines a CDM for product information that includes SKU, Title, Category, and Price. All legacy product databases map their native fields to this CDM before loading into the new system.

Practical application #

During master data migration, the CDM simplifies transformation logic and supports future integrations.

Challenges #

Designing a CDM that accommodates all legacy variations without becoming overly complex, and keeping the CDM synchronized with evolving business requirements.

Data Architecture – the overall structural design of data assets, data fl… #

Data Architecture – the overall structural design of data assets, data flows, standards, and governance mechanisms within an organization.

Explanation #

It defines how data is collected, stored, processed, and accessed, providing a roadmap for migration, integration, and analytics.

Example #

A retail chain’s data architecture may include a data lake for raw transaction logs, an MDM hub for product master data, and a data warehouse for sales reporting.

Practical application #

A clear data architecture guides the sequencing of migration activities, identifies dependencies, and ensures alignment with strategic objectives.

Challenges #

Legacy landscape complexity, siloed data ownership, and rapidly changing technology stacks can impede the creation of a coherent architecture.

Data Governance – the set of policies, processes, roles, and metrics that… #

Data Governance – the set of policies, processes, roles, and metrics that ensure data is managed as a valuable enterprise asset.

Explanation #

Governance establishes accountability, defines data standards, and monitors compliance throughout the data lifecycle, including migration.

Example #

A governance council mandates that all master data records must have a valid Global Unique Identifier (GUID) and undergo quarterly quality checks.

Practical application #

Governance frameworks provide the criteria for data validation, approval workflows, and issue escalation during migration projects.

Challenges #

Gaining executive sponsorship, reconciling conflicting stakeholder priorities, and enforcing policies across autonomous business units.

Data Integration – the process of combining data from disparate sources i… #

Data Integration – the process of combining data from disparate sources into a unified view for analysis or operational use.

Explanation #

Integration techniques vary from batch extraction‑transform‑load to real‑time API‑based synchronization, each influencing migration design.

Example #

An organization extracts customer records from a legacy CRM, enriches them with loyalty data from a separate system, and loads the consolidated view into the MDM hub.

Practical application #

Effective integration reduces duplication, improves data consistency, and accelerates time‑to‑value after migration.

Challenges #

Managing schema mismatches, handling data latency, and ensuring transactional integrity across heterogeneous environments.

Data Lineage – the documented history of data’s origin, movements, transf… #

Data Lineage – the documented history of data’s origin, movements, transformations, and destinations.

Explanation #

Lineage maps the flow from source fields through transformation rules to target attributes, enabling impact analysis and auditability.

Example #

A lineage diagram shows that LegacySystemA.CustomerID maps to MDM.Customer.GUID after a cleansing step that removes leading zeros.

Practical application #

During migration, lineage supports root‑cause analysis of data anomalies and satisfies regulatory reporting requirements.

Challenges #

Capturing lineage for legacy systems lacking documentation, and maintaining accurate lineage as transformation logic evolves.

Data Mapping – the specification of how source data elements correspond t… #

Data Mapping – the specification of how source data elements correspond to target data elements, often expressed as mapping tables or scripts.

Explanation #

Mapping defines the rules for data conversion, including field renaming, type conversion, and business logic application.

Example #

Mapping rule: Source.Price (string) → Target.StandardPrice (decimal) with a conversion that strips currency symbols and applies rounding.

Practical application #

Accurate mapping is critical to preserve data semantics and avoid loss of meaning during migration.

Challenges #

Complex many‑to‑many relationships, inconsistent source definitions, and undocumented legacy transformations increase mapping effort.

Data Migration – the systematic process of moving data from one system or… #

Data Migration – the systematic process of moving data from one system or environment to another, while preserving quality, integrity, and relevance.

Explanation #

Migration encompasses extraction, transformation, loading, validation, and post‑migration reconciliation.

Example #

Migrating product master data from an on‑premise ERP to a cloud‑based MDM platform involves extracting CSV files, applying business rules, and loading via bulk APIs.

Practical application #

A well‑planned migration minimizes operational disruption and ensures business continuity.

Challenges #

Data volume, downtime constraints, and hidden data quality defects often cause schedule overruns and budget escalations.

Data Quality – the measure of data’s accuracy, completeness, consistency,… #

Data Quality – the measure of data’s accuracy, completeness, consistency, timeliness, and relevance for its intended purpose.

Explanation #

High‑quality master data underpins reliable analytics, operational efficiency, and regulatory compliance.

Example #

A data quality rule may require that every Customer.Email contains an “@” symbol and passes a domain validation check.

Practical application #

Quality rules are embedded in migration pipelines to flag or correct records before they enter the target system.

Challenges #

Legacy data may contain duplicates, missing values, and outdated formats that are costly to remediate.

Data Stewardship – the responsibility for managing data assets, ensuring… #

Data Stewardship – the responsibility for managing data assets, ensuring they meet quality standards, and supporting business users.

Explanation #

Stewards act as subject‑matter experts who define rules, resolve issues, and approve changes to master data.

Example #

The product data steward validates new SKU entries for correct categorization and pricing hierarchy before they are loaded into the MDM hub.

Practical application #

Engaging stewards early in migration projects accelerates rule definition and facilitates smoother acceptance of the new system.

Challenges #

Limited time allocation for stewards, ambiguous role boundaries, and resistance to change can hinder effective stewardship.

Data Warehouse – a centralized repository optimized for query and analysi… #

Data Warehouse – a centralized repository optimized for query and analysis, storing integrated, subject‑oriented, and historical data.

Explanation #

While not a master data store, the warehouse often consumes master data to enrich transactional facts for reporting.

Example #

A sales data warehouse joins transaction lines with the Customer master to produce region‑level revenue dashboards.

Practical application #

Accurate master data migration ensures that downstream reporting remains reliable and consistent.

Challenges #

Misalignment between warehouse dimensional models and master data structures can cause join errors and inaccurate metrics.

Entity Relationship (ER) Modeling – a technique for visualizing and defin… #

Entity Relationship (ER) Modeling – a technique for visualizing and defining the logical relationships between data entities.

Explanation #

ER diagrams help architects identify primary keys, foreign keys, and association rules, forming the basis for migration schema design.

Example #

An ER diagram shows a one‑to‑many relationship between Customer (parent) and Order (child) entities.

Practical application #

During migration, ER models guide the creation of referential integrity constraints in the target database.

Challenges #

Translating legacy, undocumented relationships into a coherent ER model often requires extensive data profiling.

Enterprise Data Model (EDM) – a high‑level, organization‑wide representat… #

Enterprise Data Model (EDM) – a high‑level, organization‑wide representation of core data domains and their interconnections.

Explanation #

The EDM provides a common vocabulary for business and IT, supporting initiatives such as master data migration, analytics, and compliance.

Example #

The EDM includes domains such as Customer, Product, Supplier, and Location, each with defined attributes and relationships.

Practical application #

Aligning migration scope with the EDM ensures that all critical master entities are addressed and that cross‑domain dependencies are respected.

Challenges #

Keeping the EDM current as business processes evolve and reconciling divergent interpretations across departments.

ETL (Extract, Transform, Load) – a batch‑oriented integration pattern tha… #

ETL (Extract, Transform, Load) – a batch‑oriented integration pattern that extracts data from source systems, applies transformation logic, and loads the result into a target repository.

Explanation #

ETL is commonly used for large‑scale master data migrations where performance and control over transformation steps are paramount.

Example #

An ETL job extracts Vendor records from a legacy ERP, standardizes address formats, and loads them into the MDM hub.

Practical application #

Scheduling ETL jobs during low‑usage windows reduces impact on operational systems.

Challenges #

Complex transformations can lead to long processing times, and error handling must be robust to avoid data loss.

ELT (Extract, Load, Transform) – an integration approach that loads raw d… #

ELT (Extract, Load, Transform) – an integration approach that loads raw data into a staging area first, then performs transformations within the target environment.

Explanation #

ELT leverages the processing power of modern databases or data lakes, simplifying architecture and often reducing data movement.

Example #

Raw customer files are loaded into a cloud data lake; SQL scripts then cleanse and enrich the data before it is moved to the MDM system.

Practical application #

ELT is advantageous when the target platform offers scalable compute resources and when transformations are iterative.

Challenges #

Requires careful security controls on the staging area and may expose raw data to unauthorized access if not properly governed.

Entity Governance – the set of policies and controls that dictate how spe… #

Entity Governance – the set of policies and controls that dictate how specific master entities are created, updated, and retired.

Explanation #

Governance rules for each entity ensure consistency, prevent duplication, and enforce business constraints.

Example #

The Product entity governance mandates that any new product must undergo a review process and receive an approved Category assignment before activation.

Practical application #

Embedding entity governance into migration workflows automates compliance checks and reduces manual rework.

Challenges #

Overly rigid rules can stall migration, while lax controls may allow poor‑quality data to proliferate.

Global Unique Identifier (GUID) – a system‑generated, universally unique… #

Global Unique Identifier (GUID) – a system‑generated, universally unique key used to identify master records across all applications.

Explanation #

GUIDs decouple business keys from technical identifiers, supporting data consolidation and interoperability.

Example #

A customer record receives a GUID like 3F2504E0‑4F89‑11D3‑9A0C‑0305E82C3301, which is referenced by downstream systems instead of the legacy customer number.

Practical application #

GUIDs simplify merging records from multiple source systems during migration, avoiding key collisions.

Challenges #

GUIDs increase storage requirements, can be less human‑readable, and may require additional indexing strategies for performance.

Metadata – data that describes other data, such as definitions, structure… #

Metadata – data that describes other data, such as definitions, structures, lineage, and usage contexts.

Explanation #

Metadata enables discovery, impact analysis, and governance by providing contextual information about master data assets.

Example #

The metadata entry for Product.Price includes its data type (decimal), source system (ERP), and business rule (must be non‑negative).

Practical application #

Maintaining accurate metadata during migration helps downstream applications interpret the data correctly.

Challenges #

Legacy systems often have incomplete or inaccurate metadata, requiring extensive discovery efforts.

Reference Data – relatively static data that categorizes or classifies ot… #

Reference Data – relatively static data that categorizes or classifies other data, such as country codes, currency codes, or industry classifications.

Explanation #

Reference data provides the contextual framework for master data and is frequently shared across multiple domains.

Example #

The ISO 3166 country code list is used to standardize Customer.Country values across all systems.

Practical application #

Consolidating reference data during migration reduces redundancy and ensures uniform reporting.

Challenges #

Inconsistent code usage, outdated values, and lack of a single source of truth can lead to mismatches.

Source System – the original application or database that holds the maste… #

Source System – the original application or database that holds the master data to be migrated.

Explanation #

Understanding source system schemas, data quality, and business rules is essential for accurate migration planning.

Example #

An on‑premise SAP ERP serves as the source system for product master data.

Practical application #

Direct extraction from the source system enables incremental migration and reduces the need for intermediate staging.

Challenges #

Limited access permissions, proprietary data formats, and undocumented customizations can impede extraction.

Target System – the destination application or platform where master data… #

Target System – the destination application or platform where master data will reside after migration.

Explanation #

The target system’s data model, validation rules, and performance constraints shape the migration design.

Example #

A cloud‑based MDM solution with a RESTful API is the target for migrated customer records.

Practical application #

Aligning target system capabilities with business requirements ensures that the migrated data can be leveraged immediately.

Challenges #

Incompatible data types, restrictive APIs, and capacity limits may require data transformation or staging.

Transformation Rules – the business logic applied to source data to conve… #

Transformation Rules – the business logic applied to source data to convert it into the target format, including calculations, look‑ups, and conditional mappings.

Explanation #

Rules are codified in scripts, configuration files, or mapping tools and must be thoroughly tested before execution.

Example #

A rule that calculates StandardPrice as ListPrice × (1 – DiscountRate) and rounds to two decimal places.

Practical application #

Centralizing transformation rules in a repository promotes reuse across multiple migration runs.

Challenges #

Complex conditional logic, dependency ordering, and performance overhead can make rule implementation difficult.

Versioning – the practice of tracking changes to master data definitions,… #

Versioning – the practice of tracking changes to master data definitions, schemas, and migration scripts over time.

Explanation #

Version control ensures that any alteration to data structures or transformation logic is auditable and reversible.

Example #

A Git repository holds the Customer mapping file, with each commit documenting a new attribute addition.

Practical application #

During migration, versioned scripts enable rollback to a known good state if validation failures occur.

Challenges #

Inadequate documentation of versions can lead to confusion, especially when multiple teams work concurrently.

Workflow – an orchestrated sequence of tasks, approvals, and notification… #

Workflow – an orchestrated sequence of tasks, approvals, and notifications that guide data through migration stages.

Explanation #

Workflows enforce governance, track progress, and ensure that required quality checks are performed before data is loaded.

Example #

A workflow that routes newly transformed supplier records to the data steward for validation before final load.

Practical application #

Automated workflows reduce manual effort and provide audit trails for compliance.

Challenges #

Designing flexible yet robust workflows that accommodate exceptions without causing bottlenecks.

Data Profiling – the systematic analysis of source data to assess its str… #

Data Profiling – the systematic analysis of source data to assess its structure, content, and quality characteristics.

Explanation #

Profiling generates metrics such as null rates, distinct counts, and pattern frequencies, informing mapping and cleansing strategies.

Example #

Profiling reveals that 12 % of Customer.PhoneNumber entries contain non‑numeric characters, indicating a need for cleansing.

Practical application #

Results guide the prioritization of data quality remediation tasks before migration.

Challenges #

Large data volumes and encrypted fields can limit the depth of profiling, and profiling tools may misinterpret custom formats.

Data Cleansing – the process of detecting and correcting inaccurate, inco… #

Data Cleansing – the process of detecting and correcting inaccurate, incomplete, or inconsistent data values.

Explanation #

Cleansing may involve format conversion, duplicate removal, address validation, and enrichment with external sources.

Example #

Standardizing all Address.State values to two‑letter abbreviations (e.g., “California” → “CA”).

Practical application #

Clean data reduces downstream errors and improves the reliability of analytics after migration.

Challenges #

Over‑cleaning can inadvertently alter legitimate data, while under‑cleaning leaves quality issues unresolved.

Data Consolidation – the act of merging data from multiple source systems… #

Data Consolidation – the act of merging data from multiple source systems into a single, unified repository.

Explanation #

Consolidation eliminates redundancy, resolves conflicts, and creates a single source of truth for each master entity.

Example #

Combining customer records from three regional CRMs into one global MDM hub, applying deduplication rules.

Practical application #

Consolidated master data supports consistent reporting and streamlined operations.

Challenges #

Conflict resolution (e.g., differing attribute values for the same entity) requires well‑defined business rules and stakeholder agreement.

Data Synchronization – the ongoing process of keeping master data consist… #

Data Synchronization – the ongoing process of keeping master data consistent across multiple systems after migration.

Explanation #

Synchronization can be uni‑directional (source to target) or bi‑directional, depending on integration needs.

Example #

An API‑based sync updates the Product.Price in the e‑commerce platform whenever the MDM hub price changes.

Practical application #

Maintaining synchronization prevents data drift and ensures that downstream processes operate on current master data.

Challenges #

Latency, conflict resolution, and handling offline systems complicate synchronization design.

Data Replication – the duplication of data from a primary system to one o… #

Data Replication – the duplication of data from a primary system to one or more secondary systems, often for performance or availability reasons.

Explanation #

Replication can be synchronous (real‑time) or asynchronous (batch), influencing migration cutover strategies.

Example #

Replicating the master customer table to a reporting database for fast analytics queries.

Practical application #

Replication enables zero‑downtime migration by allowing the new system to be populated while the old system remains active.

Challenges #

Maintaining data consistency, handling schema changes, and managing replication latency.

Data Lifecycle – the sequence of stages that data passes through, from cr… #

Data Lifecycle – the sequence of stages that data passes through, from creation to archival or deletion.

Explanation #

Master data lifecycle management defines policies for onboarding, modification, deprecation, and disposal of master records.

Example #

A product is introduced (creation), later updated with new specifications (maintenance), and eventually discontinued (retirement).

Practical application #

Embedding lifecycle controls in migration workflows ensures that only active records are moved, reducing unnecessary load.

Challenges #

Determining appropriate retention periods and automating lifecycle transitions without disrupting business processes.

Data Security – the set of controls and technologies that protect data fr… #

Data Security – the set of controls and technologies that protect data from unauthorized access, alteration, or destruction.

Explanation #

Security measures must be applied both in transit during migration and at rest in the target environment.

Example #

Encrypting CSV files with AES‑256 before transferring them to the cloud MDM platform.

Practical application #

Implementing role‑based access controls on the migration tool limits exposure of sensitive fields such as SocialSecurityNumber.

Challenges #

Balancing security with performance, managing encryption keys, and complying with jurisdictional data protection laws.

Data Privacy – the principle and regulatory requirement that personal dat… #

Data Privacy – the principle and regulatory requirement that personal data be collected, used, and disclosed in a lawful and transparent manner.

Explanation #

Privacy considerations impact which attributes can be migrated, how they are stored, and who can access them.

Example #

Masking Customer.SSN with a hash function before loading into a test environment.

Practical application #

Privacy impact assessments guide the selection of data elements for migration and dictate necessary consent documentation.

Challenges #

Identifying all personally identifiable information (PII) across legacy systems and ensuring compliance across multiple jurisdictions.

Data Compliance – adherence to internal policies, industry standards, and… #

Data Compliance – adherence to internal policies, industry standards, and external regulations governing data handling.

Explanation #

Compliance checks are embedded in migration pipelines to verify that data transformations meet statutory requirements.

Example #

Verifying that all financial transaction records include a valid TaxIdentificationNumber before they are loaded into the new ERP.

Practical application #

Automated compliance validation reduces the risk of costly penalties post‑migration.

Challenges #

Keeping up‑to‑date with evolving regulations and interpreting ambiguous requirements into concrete technical controls.

Data Standardization – the act of applying uniform formats, codes, and na… #

Data Standardization – the act of applying uniform formats, codes, and naming conventions to data elements across the enterprise.

Explanation #

Standardization simplifies integration, improves searchability, and supports consistent reporting.

Example #

Converting all dates to the ISO 8601 format (YYYY‑MM‑DD) regardless of source system representation.

Practical application #

Standardized data reduces the need for ad‑hoc conversion logic in downstream applications.

Challenges #

Legacy systems may store dates in locale‑specific formats, and enforcing standards may require extensive data transformation.

Data Enrichment – the process of augmenting master data with additional i… #

Data Enrichment – the process of augmenting master data with additional information from external sources to increase its value.

Explanation #

Enrichment can add demographic details, geocodes, or industry classifications that support analytics and decision‑making.

Example #

Adding latitude and longitude coordinates to Store.Location records using a geocoding service.

Practical application #

Enriched master data enables advanced capabilities such as location‑based marketing after migration.

Challenges #

Ensuring data licensing compliance, handling mismatched keys, and maintaining enrichment updates over time.

Data Orchestration – the coordination of multiple data processing activit… #

Data Orchestration – the coordination of multiple data processing activities, often across different tools and environments, to achieve a cohesive workflow.

Explanation #

Orchestration platforms schedule, monitor, and manage dependencies among extraction, transformation, validation, and load tasks.

Example #

Using an orchestration engine to trigger the ETL job, followed by a data quality validation step, and finally a notification to the data steward.

Practical application #

Centralized orchestration provides visibility into migration progress and facilitates rapid issue resolution.

Challenges #

Integrating heterogeneous tools, handling error propagation, and ensuring scalability for large data volumes.

Data Integration Platform – a software suite that provides capabilities f… #

Data Integration Platform – a software suite that provides capabilities for connecting, transforming, and delivering data across systems.

Explanation #

Platforms may offer pre‑built connectors, visual mapping, and governance features that accelerate migration projects.

Example #

An integration platform as a service (iPaaS) offering connectors for SAP, Salesforce, and the target MDM API.

Practical application #

Leveraging a platform reduces custom coding effort and provides built‑in monitoring and logging.

Challenges #

Licensing costs, vendor lock‑in, and the need to customize connectors for unique legacy interfaces.

Batch Processing – the execution of data operations on large groups of re… #

Batch Processing – the execution of data operations on large groups of records at scheduled intervals rather than in real time.

Explanation #

Batch jobs are suitable for migrating massive data sets when downtime windows are available.

Example #

A nightly batch extracts all new and changed product records, applies transformations, and loads them into the MDM hub.

Practical application #

Batch processing allows for thorough validation before committing data to production.

Challenges #

Long processing times can extend migration windows, and error handling must be robust to prevent partial loads.

Real‑time Processing – the handling of data as it is generated or receive… #

Real‑time Processing – the handling of data as it is generated or received, with minimal latency.

Explanation #

Real‑time pipelines are used when master data must be instantly available to downstream systems, such as in e‑commerce.

Example #

A change data capture (CDC) mechanism streams new customer registrations into the MDM hub within seconds.

Practical application #

Real‑time processing enables continuous synchronization, reducing the need for large batch cutovers.

Challenges #

Requires high‑availability infrastructure, sophisticated error handling, and careful management of data consistency.

API (Application Programming Interface) – a set of defined methods and da… #

API (Application Programming Interface) – a set of defined methods and data structures that allow applications to interact programmatically.

Explanation #

APIs are commonly used to load or retrieve master data during migration, especially for cloud‑based targets.

Example #

Posting a JSON payload to the MDM /customers endpoint to create a new master record.

Practical application #

APIs provide granular control, enabling incremental migration and real‑time validation of each record.

Challenges #

Rate limits, authentication complexities, and differing data format expectations can slow migration progress.

Data Governance Council – a cross‑functional body that defines, approves,… #

Data Governance Council – a cross‑functional body that defines, approves, and oversees data governance policies and initiatives.

Explanation #

The council sets the strategic direction for master data management, including migration standards and timelines.

Example #

The council approves the master data migration charter, defines acceptable data quality thresholds, and assigns stewardship responsibilities.

Practical application #

Council decisions are documented and referenced throughout the migration to ensure alignment with organizational objectives.

Challenges #

Coordinating schedules of senior leaders, reconciling divergent department priorities, and maintaining momentum over long‑term projects.

Data Policy – a formal statement that outlines the rules, responsibilitie… #

Data Policy – a formal statement that outlines the rules, responsibilities, and expectations for data handling within the organization.

Explanation #

Policies cover aspects such as data ownership, retention, access, and quality requirements for master data.

Example #

A policy that mandates all master data changes be captured in an audit log with user attribution.

Practical application #

Policies provide the criteria against which migration activities are measured and audited.

Challenges #

Translating high‑level policy language into actionable technical controls and ensuring consistent enforcement.

Data Strategy – the long‑term plan that aligns data initiatives with busi… #

Data Strategy – the long‑term plan that aligns data initiatives with business goals, outlining how data will be used as a strategic asset.

Explanation #

The strategy defines priorities for master data creation, migration, analytics, and governance.

Example #

The data strategy prioritizes the migration of customer master data to support a new omnichannel experience.

Practical application #

A clear strategy guides resource allocation, technology selection, and timeline planning for the migration project.

Challenges #

Shifting business priorities, budget constraints, and rapid technology change can require frequent strategy revisions.

Data Architecture Blueprint – a detailed diagrammatic representation of t… #

Data Architecture Blueprint – a detailed diagrammatic representation of the target data environment, including layers, components, and interfaces.

Explanation #

The blueprint serves as a contract between business and technical teams, illustrating how master data will be stored, accessed, and governed.

Example #

The blueprint shows the MDM hub, downstream data lake, and analytics layer, with data flow arrows indicating integration points.

Practical application #

Using the blueprint during migration helps validate that each component is provisioned correctly and that data flows as intended.

Challenges #

Maintaining the blueprint’s accuracy as design decisions evolve and ensuring that all stakeholders interpret it consistently.

Data Lake – a centralized repository that stores raw, unstructured, and s… #

Data Lake – a centralized repository that stores raw, unstructured, and structured data at scale, often on inexpensive storage.

Explanation #

While not a master data store, a lake can hold source extracts for staging, archival, or exploratory analysis before loading into MDM.

Example #

Raw CSV dumps of legacy customer files are landed in an S3‑based data lake for preprocessing.

Practical application #

Leveraging a data lake enables flexible, schema‑on‑read processing and supports iterative migration testing.

Challenges #

Without proper governance, lakes can become “data swamps” with unmanaged, low‑quality data that hinders downstream consumption.

Data Hub – a central integration point that consolidates, cleanses, and d… #

Data Hub – a central integration point that consolidates, cleanses, and distributes master data to consuming applications.

Explanation #

The hub enforces governance rules, provides a single source of truth, and often includes workflow and stewardship capabilities.

Example #

A hub receives product data from ERP, supplier portals, and third‑party catalogs, harmonizes attributes, and publishes the consolidated view via APIs.

Practical application #

Migrating master data into a hub creates a reusable asset that feeds multiple downstream systems, reducing duplication.

Challenges #

Designing the hub’s data model to accommodate diverse source structures and managing change as business needs evolve.

Data Vault – a modeling technique that captures all data changes over tim… #

Data Vault – a modeling technique that captures all data changes over time, emphasizing auditability and scalability.

Explanation #

In a vault, entities are stored as hubs, relationships as links, and descriptive attributes as satellites, preserving raw source information.

Example #

A Customer hub stores the business key, while a satellite records each change to address or status with timestamps.

Practical application #

Using a vault for migration provides a complete history, supporting regulatory audits and rollback capabilities.

Challenges #

The vault’s highly normalized structure can increase query complexity and may require additional transformation for reporting consumption.

Data Mart – a focused subset of a data warehouse tailored to a specific b… #

Data Mart – a focused subset of a data warehouse tailored to a specific business line or function.

Explanation #

Data marts often contain denormalized, ready‑to‑use data for reporting and may rely on master data for dimension consistency.

Example #

A sales data mart includes fact tables for orders and a dimension table for customers sourced from the MDM hub.

Practical application #

Ensuring that migrated master data aligns with data mart dimensions prevents mismatched reporting.

Challenges #

Maintaining synchronization between the central master data store and multiple data marts, especially after incremental loads.

Data Federation – a virtual integration approach that provides a unified… #

Data Federation – a virtual integration approach that provides a unified view of data without moving it from its source locations.

Explanation #

Federation enables users to query master data across systems as if it were a single repository, useful during phased migration.

Example #

A federation layer combines customer records from ERP, CRM, and a cloud SaaS platform into a single virtual table.

Practical application #

Federation can serve as a temporary bridge while legacy systems are decommissioned, reducing the need for immediate full migration.

Challenges #

Performance overhead, inconsistent security models across sources, and limited support for complex transformations.

Data Virtualization – the creation of abstracted data services that expos… #

Data Virtualization – the creation of abstracted data services that expose data from multiple sources through a common interface, often in real time.

Explanation #

Virtualization decouples consumption from physical storage, allowing applications to access master data without replication.

Example #

A virtual data service presents a unified Customer view that pulls attributes from both an on‑premise database and a cloud CRM.

Practical application #

Virtualization supports gradual migration strategies, enabling new applications to consume data from the target system while legacy sources remain active.

Challenges #

Managing latency, ensuring data consistency, and handling schema evolution across the virtualized sources.

Data Modeling Techniques – the set of methodologies used to design logica… #

Data Modeling Techniques – the set of methodologies used to design logical and physical data structures, such as ER modeling, dimensional modeling, and object‑oriented modeling.

Explanation #

Selecting appropriate techniques influences migration complexity, performance, and future extensibility.

Example #

Using a star schema for analytical reporting while maintaining a normalized relational model for the master data repository.

Practical application #

Aligning modeling techniques with business use cases ensures that migrated data supports both operational and analytical needs.

Challenges #

Balancing the trade‑offs between normalization (reducing redundancy) and denormalization (improving query speed) in the target architecture.

Dimensional Modeling – a design approach that structures data into fact t… #

Dimensional Modeling – a design approach that structures data into fact tables and dimension tables to support analytical queries.

Explanation #

While primarily used for analytical data, dimensional models often reference master data dimensions for consistency.

Example #

A SalesFact table includes a foreign key to the CustomerDim dimension populated from the MDM hub.

Practical application #

Ensuring that dimensional keys are correctly populated during migration prevents orphaned records and inaccurate reporting.

Challenges #

Maintaining synchronization between dimension records and the source master data, especially when dimensions are refreshed incrementally.

Normalized Model – a relational design that organizes data to minimize re… #

).

Explanation #

Normalization promotes data consistency and supports efficient updates, making it a common choice for master data repositories.

Example #

Storing address components in a separate Address table linked to Customer via a foreign key.

Practical application #

Normalized structures simplify enforcement of referential integrity during migration.

Challenges #

Complex joins may impact query performance, and some reporting tools prefer denormalized views,

June 2026 intake · open enrolment
from £90 GBP
Enrol