Module 4: Canonical Modeling
Key Terms
- Canonical Model: A unified and standardized version of source data, often referred to as the silver layer in the medallion architecture.
- Entities: Logical tables created by grouping similar data across different source systems.
- Attributes: Columns inside an entity, which are generated or mapped using an LLM.
- Business Key: A unique identifier for each record within an entity, used to track records.
- Data Lake Load / Data Warehouse Load: Options for loading cleaned, modeled data into a data lake or a data warehouse.
- Custom Code: Transformations or data quality rules, written in code, that are applied before loading.
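To make the terms above concrete, here is a minimal Python sketch of how rows from two source systems might be grouped into one canonical entity. The source system names ("crm", "billing"), the column names, and the to_canonical helper are hypothetical and chosen only for illustration; the tool itself produces these mappings with an LLM, and its internal representation is not described in this module.

```python
# Hypothetical source-to-canonical column mappings for a "customer" entity.
# System names and column names are illustrative assumptions.
COLUMN_MAPPINGS = {
    "crm": {"cust_id": "customer_id", "full_name": "name", "email_addr": "email"},
    "billing": {"customer_no": "customer_id", "customer_name": "name", "email": "email"},
}


def to_canonical(source: str, row: dict) -> dict:
    """Rename a source row's columns to the canonical attribute names."""
    mapping = COLUMN_MAPPINGS[source]
    canonical = {target: row[src] for src, target in mapping.items()}
    canonical["source_system"] = source  # keep track of where the row came from
    return canonical


crm_row = {"cust_id": "C-001", "full_name": "Ada Lovelace", "email_addr": "ada@example.com"}
billing_row = {"customer_no": "C-002", "customer_name": "Alan Turing", "email": "alan@example.com"}

# The canonical entity: similar data from different systems under one set of attributes.
customer_entity = [to_canonical("crm", crm_row), to_canonical("billing", billing_row)]
```

Both rows end up with the same attributes (customer_id, name, email), which is what allows them to sit in a single logical table.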
Step-by-Step Overview
Step 1: Access Canonical Modeling Section
Navigate to the Data Modeling section and select Canonical Model.
Step 2: Generate Business Keys
Open the ⚙️ menu and choose Generate Business Key to create the key automatically, or assign one manually.
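This module does not describe how the automatic option derives the key. One common approach, shown here purely as an assumption, is to hash a normalized combination of identifying attributes so that the same record always gets the same key. The function name generate_business_key and the attribute names are hypothetical.

```python
import hashlib


def generate_business_key(record: dict, key_attributes: list[str]) -> str:
    """Build a deterministic key from the chosen attributes (illustrative only)."""
    # Normalize each value so casing and stray whitespace do not change the key.
    parts = [str(record[attr]).strip().lower() for attr in key_attributes]
    joined = "|".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


key_attrs = ["source_system", "customer_id"]
key_a = generate_business_key({"source_system": "CRM", "customer_id": " C-001 "}, key_attrs)
key_b = generate_business_key({"source_system": "crm", "customer_id": "c-001"}, key_attrs)
key_c = generate_business_key({"source_system": "crm", "customer_id": "c-002"}, key_attrs)

print(key_a == key_b)  # same record after normalization
print(key_a == key_c)  # different customer_id
```

A manually assigned key would instead name an existing attribute (or set of attributes) that is already unique in the entity.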
Step 3: Data Lake / Warehouse Load
Use the Data Lake Load or Data Warehouse Load option to send entities to the target storage.
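As a rough picture of what a data lake load does, the sketch below writes an entity's records to a JSON Lines file under a folder for that entity. The folder layout, the file name, and the load_entity_to_lake function are assumptions for illustration; a temporary local directory stands in for real lake storage, and the tool's actual load logic may differ.

```python
import json
import tempfile
from pathlib import Path


def load_entity_to_lake(entity_name: str, records: list[dict], lake_root: Path) -> Path:
    """Write one entity's records as JSON Lines under <lake_root>/silver/<entity_name>/."""
    target_dir = lake_root / "silver" / entity_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target_file = target_dir / "part-0000.jsonl"
    with target_file.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, sort_keys=True) + "\n")
    return target_file


# A temporary directory stands in for the data lake in this sketch.
lake_root = Path(tempfile.mkdtemp())
customers = [
    {"customer_id": "C-001", "name": "Ada Lovelace"},
    {"customer_id": "C-002", "name": "Alan Turing"},
]
written_file = load_entity_to_lake("customer", customers, lake_root)
print(written_file)
```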
Step 4: Add Custom Code or Attributes
In the Code Review section, you can:
- Add custom attributes
- Add transformation logic
- Apply data quality checks
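The three kinds of custom code listed above could look roughly like the following. This is a hedged sketch in plain Python: the attribute names, the add_email_domain and check_quality functions, and the sample records are all hypothetical, and the code generated in the Code Review section may be structured differently.

```python
def add_email_domain(record: dict) -> dict:
    """Custom attribute plus transformation: derive email_domain from email."""
    out = dict(record)  # copy so the input record is left unchanged
    email = out.get("email") or ""
    out["email_domain"] = email.split("@", 1)[1].lower() if "@" in email else None
    return out


def check_quality(record: dict, required: list[str]) -> list[str]:
    """Data quality check: report required attributes that are missing or empty."""
    issues = []
    for attr in required:
        if record.get(attr) in (None, ""):
            issues.append(f"missing {attr}")
    return issues


records = [
    {"customer_id": "C-001", "email": "ada@Example.com"},
    {"customer_id": "C-002", "email": ""},
]

transformed = [add_email_domain(r) for r in records]
required_attrs = ["customer_id", "email"]
valid = [r for r in transformed if not check_quality(r, required_attrs)]
rejected = [r for r in transformed if check_quality(r, required_attrs)]
```

Running checks before the load step keeps rejected records out of the data lake or warehouse while still making them available for review.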
Step 5: Commit and Execute
Once your updates are complete, click Commit Changes to push them to Git, then execute the notebook.