Today, i will discuss about ” equivalent of rollup in hive”. There are scenarios where we need to replace the keyword rollup from the queries as it might not be working as expected. To put it differently, we might need to transform the oracle/sqlserver query to hive where rollup doesn’t give same results which we got in oracle.
In an employee table. I have EMP_ID,DEPT_NAME,SAL_MONTH,SALARY columns.
Query-1 with rollup :
select EMP_ID,DEPT_NAME,SAL_MONTH,sum(SALARY) from EMP group by
EMP_ID,DEPT_NAME ,rollup( SAL_MONTH );
Equivalent Query-1 without rollup using group by clause :
select EMP_ID,DEPT_NAME,null,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME
UNION ALL
select EMP_ID,DEPT_NAME, SAL_MONTH ,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME,SAL_MONTH
Equivalent Query-1 without rollup using group by and grouping sets clause :
select EMP_ID,DEPT_NAME, SAL_MONTH ,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME, SAL_MONTH grouping sets ( (EMP_ID,DEPT_NAME,SAL_MONTH ), (EMP_ID,DEPT_NAME))
Query-2 with rollup :
select EMP_ID,DEPT_NAME,SAL_MONTH,sum(SALARY) from EMP group by EMP_ID,rollup( DEPT_NAME ,SAL_MONTH );
Equivalent Query-2 without rollup using group by clause :
select EMP_ID,DEPT_NAME, SAL_MONTH ,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME,SAL_MONTH
UNION ALL
select EMP_ID,DEPT_NAME,null,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME
UNION ALL
select EMP_ID,null,null,sum(SALARY) from EMP group by EMP_ID
Equivalent Query-2 without rollup using group by and grouping sets clause :
select EMP_ID,DEPT_NAME, SAL_MONTH ,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME, SAL_MONTH grouping sets ((EMP_ID,DEPT_NAME,SAL_MONTH), (EMP_ID,DEPT_NAME), (EMP_ID))
Query-3 with rollup :
select EMP_ID,DEPT_NAME,SAL_MONTH,sum(SALARY) from EMP group by rollup(EMP_ID ,DEPT_NAME ,SAL_MONTH );
Equivalent Query-3 without rollup using group by clause :
select EMP_ID,DEPT_NAME, SAL_MONTH ,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME,SAL_MONTH
UNION ALL
select EMP_ID,DEPT_NAME,null,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME
UNION ALL
select EMP_ID,null,null,sum(SALARY) from EMP group by EMP_ID
UNION ALL
select null,null,null,sum(SALARY) from EMP
Equivalent Query-3 without rollup using group by and grouping sets clause :
select EMP_ID,DEPT_NAME, SAL_MONTH ,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME, SAL_MONTH grouping sets (( EMP_ID,DEPT_NAME, SAL_MONTH ),( EMP_ID,DEPT_NAME), (EMP_ID),())