Module 4: Canonical Models and Data Quality

Key Terms

  1. Entities: Entities are higher-level concepts in data modeling that group together related tables representing a common theme or purpose, such as Customer or Order.

  2. Attributes: Attributes are specific data points within an entity, representing common columns across grouped tables, such as FirstName or OrderDate.
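
The relationship between the two terms can be pictured as a small data structure. The following is a minimal sketch, not the product's internal model: the class names `Entity` and `Attribute`, the source table names, and the source column names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A canonical column shared by the tables grouped under an entity."""
    name: str
    # Maps a source table name to the column that feeds this attribute.
    # (Hypothetical layout; the tool's own mapping format is not shown here.)
    source_columns: dict[str, str] = field(default_factory=dict)


@dataclass
class Entity:
    """A higher-level concept that groups related source tables."""
    name: str
    source_tables: list[str] = field(default_factory=list)
    attributes: list[Attribute] = field(default_factory=list)

    def attribute_names(self) -> list[str]:
        return [attribute.name for attribute in self.attributes]


# Example: a Customer entity fed by two hypothetical source tables.
customer = Entity(
    name="Customer",
    source_tables=["crm.customers", "billing.clients"],
    attributes=[
        Attribute(
            name="FirstName",
            source_columns={
                "crm.customers": "first_name",
                "billing.clients": "fname",
            },
        ),
    ],
)

print(customer.attribute_names())
```

Here one canonical attribute, FirstName, stands in for two differently named source columns, which is the grouping that the definitions above describe.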


  • Before proceeding with canonical data modeling, ensure that data mappings have been generated to create the necessary entities and attributes. Refer to the Generate Mappings section for detailed instructions.

Step 1: Data Modeling with Canonical Loading


Canonical loading groups entities from different data sources into a unified structure. Data Quality Recommendations then help improve data quality and compatibility across those entities.

  1. Canonical Loading:

    • Use the "Table Profile and Canonical Load Type" feature to group entities.
    • This helps in creating a consistent data structure across different data sources.
  2. Generate Business Key:

    • Click on the "Generate Business Key" button to create a unique identifier for a given entity.
    • This key helps in maintaining consistency and uniqueness across records.
  3. Check Data Compatibility:

    Click on "Check Data Compatibility" to evaluate:

    • Data Compatibility: The similarity of the data held in attributes from entities across schemas, measured by overlap percentage and relationship type.
    • Schema Compatibility: Whether an attribute in an entity maps to attributes in other source entities.
    • The output is a true or false value indicating compatibility.


  4. Data Lake Load:

    • Click on "Data Lake Load" to generate data from the canonical loading of different attributes.
    • This feature helps in loading the data into a data lake for further processing and analysis.
  5. Data Warehouse Load:

    • Click on "Data Warehouse Load" to load the generated Parquet files into a data warehouse.
    • This ensures that the data is available for querying and reporting in a structured format.
  6. Add Custom Attributes:

    • Select Entity Name: Choose the entity for the custom attribute.

    • Define Source Attributes: Specify the source attributes.

    • Enter Custom Attribute Details: Provide name, description, business key status, classification, and datatype.

    • Enter Prompt Text: Describe the purpose of the custom attribute.

    • Generate and Review Code: Click "Generate and Preview Code" and review it.

    • Save the Custom Attribute: Click "Save" to add the attribute.
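
To make the business key and data compatibility ideas from this step more concrete, here is a minimal sketch in Python. It does not reproduce what the "Generate Business Key" or "Check Data Compatibility" buttons do internally. The function names, the hash-based key, the way the relationship type is inferred, and the 50% threshold are all assumptions made for illustration.

```python
import hashlib


def business_key(record: dict, key_fields: list[str]) -> str:
    """Derive a deterministic key from the chosen identifying fields."""
    # Normalize so that case and surrounding whitespace do not change the key.
    parts = [str(record[name]).strip().lower() for name in key_fields]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def overlap_percentage(left: list, right: list) -> float:
    """Percentage of distinct non-null values in `left` also found in `right`."""
    left_values = {value for value in left if value is not None}
    right_values = {value for value in right if value is not None}
    if not left_values:
        return 0.0
    return 100.0 * len(left_values & right_values) / len(left_values)


def relationship_type(left: list, right: list) -> str:
    """Classify the relationship by whether each side holds unique values."""
    left_unique = len(set(left)) == len(left)
    right_unique = len(set(right)) == len(right)
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


def is_compatible(left: list, right: list, threshold: float = 50.0) -> bool:
    """True when the overlap reaches the (assumed) threshold percentage."""
    return overlap_percentage(left, right) >= threshold


# Hypothetical customer IDs taken from two source schemas.
crm_ids = ["C1", "C2", "C3", "C4"]
billing_ids = ["C2", "C3", "C4", "C4", "C9"]

print(overlap_percentage(crm_ids, billing_ids))
print(relationship_type(crm_ids, billing_ids))
print(is_compatible(crm_ids, billing_ids))
```

In this example three of the four distinct CRM IDs also appear in the billing data, so the overlap is 75%, and because the billing side repeats an ID the relationship is classified as one-to-many. Hashing normalized field values is only one way to obtain a stable identifier; a concatenated natural key would serve the same purpose.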


Step 2: Data Quality Recommendations

The Data Quality Recommendations step involves defining data structures and relationships for a given entity. During this step, you also specify potential data quality checks to ensure the integrity and accuracy of the data.


  1. Select Entity:

    • From the list of entities, choose the one you want to model (e.g., Blending Operations).
  2. Generate Data Quality Recommendations:

    • Click the "Generate Data Quality Recommendations" button.
    • The large language model (LLM) provides a list of suggested data quality checks.
  3. Review Data Quality Rules:

    • Look over the recommended data quality rules for your selected entity.
    • Each suggestion includes:
      • Category: The type of quality check (e.g., Cleansing, Validation).
      • Subcategory: Specific focus of the rule (e.g., Null Handling, Range Checking).
      • Attributes: The data attributes the rule applies to (e.g., Blending ID).
      • Rule Description: What the rule does (e.g., Replace null values with default values).
      • Rule Explanation: Why the rule is important (e.g., Ensures all records have a valid Blending ID).
  4. Discretionary Implementation:

    • You can accept, modify, or reject each suggested rule.
    • Implement the rules that best fit your data quality requirements.
  5. Add Data Quality Recommendations:

    • Select the entity and attribute you wish to improve.
    • Choose a rule category and sub-category (e.g., cleansing transformations, null handling).
    • Enter prompt text to guide the LLM in generating data quality rules.
    • Review the generated code to ensure it meets your needs.
    • Save the recommendations to implement the data quality checks.
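
The rule fields listed in this step (category, subcategory, attributes, description, explanation) can be represented as a simple record and applied to data. The sketch below is illustrative only and is not the code the LLM generates. The `QualityRule` class, the `run_rules` function, the attribute names `blending_id` and `volume`, the default value "UNKNOWN", and the 0 to 1000 range are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QualityRule:
    category: str
    subcategory: str
    attributes: list[str]
    description: str
    explanation: str
    check: Callable[[dict], bool]                 # True when the record passes
    fix: Optional[Callable[[dict], dict]] = None  # optional cleansing step


def run_rules(records: list[dict], rules: list[QualityRule]):
    """Apply each rule; repair a record when a fix exists, otherwise reject it."""
    cleaned, rejected = [], []
    for record in records:
        current = dict(record)  # work on a copy, leave the input untouched
        passed = True
        for rule in rules:
            if rule.check(current):
                continue
            if rule.fix is not None:
                current = rule.fix(current)
            else:
                passed = False
        (cleaned if passed else rejected).append(current)
    return cleaned, rejected


null_handling = QualityRule(
    category="Cleansing",
    subcategory="Null Handling",
    attributes=["blending_id"],
    description="Replace null values with default values",
    explanation="Ensures all records have a valid Blending ID",
    check=lambda r: r.get("blending_id") is not None,
    fix=lambda r: {**r, "blending_id": "UNKNOWN"},
)

range_checking = QualityRule(
    category="Validation",
    subcategory="Range Checking",
    attributes=["volume"],
    description="Volume must lie between 0 and 1000",
    explanation="Flags records with implausible volumes",
    check=lambda r: 0 <= r["volume"] <= 1000,
)

records = [
    {"blending_id": "B-001", "volume": 120.0},
    {"blending_id": None, "volume": 80.0},
    {"blending_id": "B-003", "volume": -5.0},
]

# Only the rules you accept are passed in, mirroring the discretionary step.
cleaned, rejected = run_rules(records, [null_handling, range_checking])
print(cleaned)
print(rejected)
```

With these inputs the record with a missing Blending ID is repaired and kept, while the record with a negative volume is rejected because the validation rule has no fix attached.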
