Module 2: Data Profiling

Key Terms

Data profiling is the analysis of data from your data sources to understand its structure, quality, and characteristics. It helps you identify the tables and schemas in a source and gather detailed information about the data they contain.

Step 1: Schema Scan

  • Click on the "Schema Scan" button.
  • This action identifies all the tables (files) in your selected data source, along with the associated columns and their data types. From the drop-down, select:
    • File Size (between 0-1)
    • File type
    • Folder structure of the data within the container
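
To make the idea of a schema scan concrete, here is a minimal Python sketch of what such a scan might collect for a file-based source. It is not the product's implementation: the function name `scan_schema`, the dictionary keys, and the choice to read only CSV headers are all assumptions made for illustration.

```python
import csv
import os


def scan_schema(root):
    """Walk `root` and describe each file found.

    Hypothetical helper for illustration only; not the product's API.
    """
    entries = []
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            stem, extension = os.path.splitext(name)
            entry = {
                "table": stem,                               # file name used as the table name
                "file_type": extension.lstrip(".").lower(),  # e.g. "csv"
                "size_bytes": os.path.getsize(path),
                "folder": os.path.relpath(folder, root),     # position in the folder structure
                "columns": [],
            }
            # Assumption: only CSV headers are read; other formats need their own reader.
            if entry["file_type"] == "csv":
                with open(path, newline="") as handle:
                    entry["columns"] = next(csv.reader(handle), [])
            entries.append(entry)
    return entries
```

Calling `scan_schema` on a local folder returns one dictionary per file, covering the same kinds of attributes the drop-down exposes (size, type, and folder location) plus the column names found in each CSV header.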

Step 2: Data Scan

  • Select a table from the list (e.g., blending_operations).
  • Click on the "Scan Data Table" button to identify the table schema.
  • The scan displays information about the table, such as column names, data types, primary keys, and unique keys. It also provides more in-depth table information, such as:
    • Delta Key: A key used for identifying changes.
    • Null Values: The number of null values in each column.
    • Additional column profiles, such as uniqueness and data type details.
  • To view details for a specific column, click the 'eye' icon under Column Profile. A new window opens with that column's profiling information.
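
The column-level figures described above (null values, uniqueness, data type) can be approximated with a few lines of Python. The sketch below is only an illustration of the concepts, not how the tool computes them. The names `infer_type` and `profile_columns`, the three type labels, and the rule that empty strings count as nulls are all assumptions.

```python
def infer_type(values):
    """Return 'integer', 'float', or 'string' for a list of non-null strings."""
    if not values:
        return "unknown"  # nothing to infer from an all-null column
    for label, cast in (("integer", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "string"


def profile_columns(rows):
    """Profile a table given as a list of dicts (column name -> string or None).

    Hypothetical helper for illustration only; not the product's API.
    """
    columns = list(rows[0].keys()) if rows else []
    profile = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        # Assumption: None and empty strings both count as null values.
        present = [v for v in values if v not in (None, "")]
        distinct = len(set(present))
        profile[column] = {
            "data_type": infer_type(present),
            "null_count": len(values) - len(present),
            "distinct_count": distinct,
            # True only when the column has no nulls and no repeated values.
            "is_unique": distinct == len(values),
        }
    return profile
```

Given rows loaded from a table such as blending_operations, `profile_columns` returns one entry per column. A column flagged `is_unique` is a candidate primary or unique key, which is the kind of signal the Column Profile view surfaces.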