create table using file having variable length for each field in hive

Today, I will discuss about to how to create Hive table using file in which fields length is different for all fields .To put in differently, it is neither fixed length file nor delimited file.Below is the example of file contentDefinition of a fileFIELD1, length 2, value are 10 ,10FIELD2, length 5 , value are 5,125FIELD3, length 6 , value are 3,12FIELD4, length 7, value are ABC,EFFIELD5, length 4, value are 12,21FIELD6, length 10, value are 15,10 In order to create table for such files, use the below syntax for…

check whether process is running or not

Today, I will discuss about existence of a process in the linux server through shell script. I will share the pseudo code for the same in the form of steps.* Create a shell script which runs in infinite loop using while 1==1 * Inside this while loop, check the existence of a process on every 15th minute which can be achieved by dividing the minutes part of current timestamp by 15 using MOD function. If it returns 0 , go to next statements else come out of If statement. *…

Automate the existence of S3 files through shell scripting

Today, I will discuss about “How to automate the existence of files in S3 bucket through shell script”. Here , i will share pseudo code for the same in the form of steps.* create one config file which we will have below details with pipe delimiter.S3 bucket Name|project_folder_name|Relative path of sub folderTechie-1|ABC|prj1/UC1/ #### Here techie-1 is the bucket Name, the immediate folder to this bucket is ABC and project Use-case folder is prj1/UC1/ ####* This[config] file will have list of Use-case folder names for which you need to test the…

features of Athena in AWS

Today, I will discuss about the Athena Database features in AWS. Below are the characteristics of the Athena .* Query Editor : This is the workspace where you can write your queries. * Saved Queries : This is the location where you can save your queries which are frequently used using “Save as” * History : It will have list of queries which are executed in the Query Editor section. * Athena logs : As mentioned in by previous blog, athena maintain logs in the S3 bucket as per the…

create table in Athena using CSV file

Today, I will discuss about “How to create table using csv file in Athena”.Please follow the below steps for the same.* Upload or transfer the csv file to required S3 location. * Create table using below syntax. create external table emp_details (EMPID int, EMPNAME string ) ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe’ WITH SERDEPROPERTIES ( ‘serialization.format’ = ‘,’, ‘field.delim‘ = ‘,’ ) location ‘s3://techie-1/emp/’ TBLPROPERTIES ( “skip.header.line.count”=”1”) * Important to note here that if you have a file which has header , then you need to skip the header .For this, we…

create database in Athena

Today , I will discuss about “How to create Database in Athena.Follow the below steps to achieve the same.* Create IAM policy which gives access to S3 bucket for Athena Logs to the required user.* While creating that IAM policy, you need to put the below content in the JSON tab.* Here, in below JSON, you need to change the bucket Name. * If you see clearly , in JSON both Glue and Athena has mentioned , so if you don’t want to give access to Glue DB, then you…

Equivalent of rollup in hive

Today, i will discuss about ” equivalent of rollup in hive”. There are scenarios where we need to replace the keyword rollup from the queries as it might not be working as expected. To put it differently, we might need to transform the oracle/sqlserver query to hive where rollup doesn’t give same results which we got in oracle. In an employee table. I have EMP_ID,DEPT_NAME,SAL_MONTH,SALARY columns. Query-1 with rollup :select EMP_ID,DEPT_NAME,SAL_MONTH,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME ,rollup( SAL_MONTH ); Equivalent Query-1 without rollup using group by clause : select EMP_ID,DEPT_NAME,null,sum(SALARY)…

Date functions in Hive

Today, I will share list of queries related to “date” logic in which developers generally face issues during Hive query execution. * Query to get the sysdate select from_unixtime(unix_timestamp(current_timestamp())) * Query to get the yesterday’s date. Here 86400 is the number of seconds in a day(24 Hrs * 60 Minutes(in a hour)* 60 Seconds(in a minute)) select to_date(from_unixtime(unix_timestamp(current_timestamp(),’yyyy-MM-dd’)-86400*1));* Substract the x number of days from the particular date. select to_date(from_unixtime(unix_timestamp(current_timestamp(),’yyyy-MM-dd’)-86400*x)); * Convert the sysdate to YYYY-MM-DD format select to_date(from_unixtime(unix_timestamp(current_timestamp(),’yyyy- MM-dd’)));* Convert the sysdate to YYYYMMDD format select regexp_replace(to_date(from_unixtime(unix_timestamp(current_timestamp (),’yyyy-MM-dd’))),’-‘,”);* Convert…

Run hive queries online

Today, I will discuss about “How to run hive queries online”. Below are the steps for the same.* Go to below link https://demo.gethue.com/hue/accounts/login?next=/ * Enter the username and password as “demo”.* Click on “Sign in” Button. Below Screen will appear. As you can see, many users have created their databases. * In order to create your own database, Click on “+” icon.* Fill Database Name and description . * Click on submit button. The output screen will appear on the top-right section.So, we have database ready and name is “hive_testing”…