Today, I will discuss about the Dimension Lookup in Pentaho. I have seen various blogs and forums where users share their experience about the issues encountered during the implementation of this component. One of the major issue is null values in the Dimension tables.
There are two scenarios.
1. When you use this component with “Use Auto Incremental Key” for Technical Field, then it inserts a null values on every run which is a concern .
2. When you use this component using “Use table Maximum +1” for Technical Fields, then it insert a null value only once with Technical_Field value as 0(zero) irrespective of number of executions.
If we follow Approach 2, then Dimension Lookup works fine.
Here, We used “Type of Dimension Update” value as Insert which will implement Type-2.
Now, concern is why this component inserts a null record in the Dimension Table. We all know that Technical (Surrogate) keys value in the Dimension tables are populated in the fact table based on the data present in the Fact Table.
Consider a Scenario where fact table has value which is not present in the Dimension tables and hence it return null which is not a good practice as per Kimball’s Dimensional Modelling . This concept is called Unknown Members. So, in order to handle this , we need to assign a totally different value like -1 or 0 to those records in the Fact table and that value should also be present in the Dimension table as well. Hence , a record with NULL values required in the Dimension table.
Here, we will demonstrate the usage of Dimension Lookup component.
Here, I have CSV file which has customer Data. See the below SS for the same.
The data which is already present in the DIM CUSTOMER table is shown below.
Based on the CSV file data and database records,
Shilpa Kaur will be updated to Shilpa Singh
Ram Singh will be updated to Ram Sharma
Kishan Mohan will be updated to Sham Arora . Reason : Same Natural Key which is 15.
Sarita Chhabra will be inserted.
A record will NULL values also get inserted as part of “Unknown Members” Handling in the Fact table.
Now, we will run the ETL job . See the SS of the ETL Code.
Post Execution of the ETL code, See the database records.
3 records ended .
1 record with NULL values inserted which is used for unknown Members.
3 records are updated
1 record Inserted.