Data validation across environments using HASH

Today i will discuss about data validation across environments using HASH function. There are times we need to validate the data of same table in different environments which is very monotonous and time consuming task while doing manually. Its better to automate the process. Here , I am considering the three environments Hive,IBM DB2,Snowflake Database.HIVE:md5(concatenated_columns)SNOWFLAKE:MD5_HEX (concatenated_columns)IBM DB2:HEX(HASH_MD5( concatenated_columns) Here, you need to take care of below things.1. Hexadecimal Output may in uppercase or lowercase, so handle with UPPER or LOWER function across environments.2. You need to take care of nulls…

Count the number of nulls using left join

Today, I will discuss about How to calculate the number of nulls that gets populated with left join or %age of matched data or count of not nulls using left join. CASE 1:Calculate the number of nulls that gets populated from right hand table using left join.Assume , I have two employee table EMP_TBL1 and EMP_TBL2. See the data for the same. EMP_TBL1 EMP_TBL2 below is the code used to get number of nulls that comes from right side table using left join. select count(*)-count(y_empid) from ( select x.emp_id ,y.emp_id…

get position of all delimiter values in a string

Today, I will explain about “How to get position and values separated by delimiter in a string”. Delimiter may be comma,pipe,colon etc. STEP1 : Get position of all delimiters in a string. I am explaining with SQL-server as Database. ;with T as ( select 0 as row_id, charindex(‘,’, @string) pos union all select pos + 1, charindex(‘,’, @string, pos + 1) from T where pos > 0select * from T For example , consider a string “Harsimran,Kaur,91-123-456-7890” . So, it has first name and last name and dummy mobile number.…

Check duplicate record in Hive

Today, I will discuss about ” How to automate the process where in you can check entire row duplicate record in hive”. As I have mentioned in all Automation blogs, I will share the pseudo code.STEP1: In hive , use “desc table_name” , this command will give you column names along with datatype and data length. Store the output of this command in a file , say HIVE_TABLE_DDL.txt STEP2 : Read the file HIVE_TABLE_DDL.txt using “cat” command. cat HIVE_TABLE_DDL.txt | awk ‘{print $1}’ ORS=’,’ | sed ‘s/,$//’ * awk'{print $1}’…

Equivalent of rollup in hive

Today, i will discuss about ” equivalent of rollup in hive”. There are scenarios where we need to replace the keyword rollup from the queries as it might not be working as expected. To put it differently, we might need to transform the oracle/sqlserver query to hive where rollup doesn’t give same results which we got in oracle. In an employee table. I have EMP_ID,DEPT_NAME,SAL_MONTH,SALARY columns. Query-1 with rollup :select EMP_ID,DEPT_NAME,SAL_MONTH,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME ,rollup( SAL_MONTH ); Equivalent Query-1 without rollup using group by clause : select EMP_ID,DEPT_NAME,null,sum(SALARY)…

cassandra input component in Pentaho

Today, I will discuss about “How to use cassandra input component in Pentaho”. For this, the first and foremost criteria is that cassandra database should be downloaded and installed in your local Machine.Once installation is completed, you can start the Apache-cassandra services using below command.cassandra.bat -f ( This batch file is present inside bin folder).In order to create a sample table in the Cassandra database, you need to open another command prompt session where you need to run the cqlsh command(Run this command inside bin folder) which helps to run…

Query to get all records not matching in all tables.

Today, I will discuss about the scenario/Problem which i faced in current project. Problem Statement : You have two Employee tables having data mentioned below. Emp-1 EmpId EmpName 1 Shivani 2 Gaurav 3 Radhe 4 Rahul 5 Anil Emp-2 EmpId EmpName 1 Shivani 6 Ashok 3 Radhe 7 Vikram 4 Rahul Output : EmpId EmpName 2 Gaurav 5 Anil 6 Ashok 7 Vikram Solution : Here, you need to use except or minus based on the database you are using and union clause. See the below query for the same.…

Check SQL History and Monitor Sessions in PL/SQL Developer Tool

Today, i will discuss about how to monitor Sessions and check what all queries executed in pl/sql Sessions. First, In order to Monitor Sessions which are active /inactive in particular Schema in pl/sql developer tool, Go to Tools->Monitor Sessions. See the below Screen Shot for the same. Second, In order to see what all queries executed in particular Session , go to View -> SQL History. See the below image for the same.

GET COLUMN NAMES ON WHICH INDEX IS CREATED IN ORACLE

Today, i will share the query which will give the column names along with table name on which index is created.Below is the Query for the same. select a.index_name,a.table_name,b.column_name from all_indexes a, all_ind_columns b where a.table_name=b.table_name and a.index_name=b.index_name and a.uniqueness=’UNIQUE’ and a.table_name in <LLIST OF TABLE NAMES>   Query to give column names which are not null in the table. select table_name,column_name from all_tab_columns where table_name like ‘%ABC%’ and NULLABLE=’N’ and owner=<SCHEMA_NAME> order by table_name