create table using file having variable length for each field in hive

Today, I will discuss how to create a Hive table from a file in which the field lengths differ from record to record. To put it differently, it is neither a fixed-length file nor a delimited file. Below is the definition of an example file:

* FIELD1, length 2, values are 10, 10
* FIELD2, length 5, values are 5, 125
* FIELD3, length 6, values are 3, 12
* FIELD4, length 7, values are ABC, EF
* FIELD5, length 4, values are 12, 21
* FIELD6, length 10, values are 15, 10

In order to create a table for such files, use the below syntax for…
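The article's actual syntax is truncated above. As a hedged sketch only, one common way to map a file whose fields are neither fixed-width nor delimited is Hive's `RegexSerDe`, capturing each field with its own pattern. The table name, column names, path, and regex below are illustrative assumptions, not the article's definition:

```sql
-- Sketch: assumes the six fields can be told apart by a regex
-- (numeric runs plus one alphabetic field, per the example above).
CREATE EXTERNAL TABLE variable_length_file (
  field1 STRING,
  field2 STRING,
  field3 STRING,
  field4 STRING,
  field5 STRING,
  field6 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- Illustrative pattern only: one capture group per column.
  -- Adjust each group to the real record layout of your file.
  "input.regex" = "(\\d{1,2})(\\d{1,5})(\\d{1,6})([A-Z]{1,7})(\\d{1,4})(\\d{1,10})"
)
STORED AS TEXTFILE
LOCATION '/path/to/input';
```

Each capture group in `input.regex` feeds one column, so per-record length variation is absorbed by the quantifiers rather than by fixed offsets.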

Equivalent of rollup in hive

Today, I will discuss the "equivalent of rollup in Hive". There are scenarios where we need to replace the ROLLUP keyword in queries because it might not work as expected. To put it differently, we might need to transform an Oracle/SQL Server query to Hive, where ROLLUP does not give the same results we got in Oracle. In an employee table, I have the columns EMP_ID, DEPT_NAME, SAL_MONTH, SALARY.

Query-1 with rollup: select EMP_ID,DEPT_NAME,SAL_MONTH,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME,rollup(SAL_MONTH);

Equivalent Query-1 without rollup, using the group by clause: select EMP_ID,DEPT_NAME,null,sum(SALARY)…
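The equivalent query is truncated in the excerpt. A common rewrite, consistent with the fragment shown, is a UNION ALL of the detail GROUP BY and the subtotal GROUP BY, with NULL standing in for the rolled-up column (table and column names are taken from the excerpt; treat this as a sketch, not the article's exact query):

```sql
-- Detail rows: one row per EMP_ID, DEPT_NAME, SAL_MONTH
SELECT EMP_ID, DEPT_NAME, SAL_MONTH, SUM(SALARY) AS SALARY
FROM EMP
GROUP BY EMP_ID, DEPT_NAME, SAL_MONTH

UNION ALL

-- Subtotal rows: SAL_MONTH rolled up, reported as NULL
SELECT EMP_ID, DEPT_NAME, NULL AS SAL_MONTH, SUM(SALARY) AS SALARY
FROM EMP
GROUP BY EMP_ID, DEPT_NAME;
```

Note that newer Hive releases also accept `GROUP BY ... WITH ROLLUP` and `GROUPING SETS` directly, which may remove the need for this rewrite.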

Date functions in Hive

Today, I will share a list of queries related to "date" logic where developers generally face issues during Hive query execution.

* Query to get the sysdate: select from_unixtime(unix_timestamp(current_timestamp()))
* Query to get yesterday's date. Here 86400 is the number of seconds in a day (24 hours * 60 minutes in an hour * 60 seconds in a minute): select to_date(from_unixtime(unix_timestamp(current_timestamp(),'yyyy-MM-dd')-86400*1));
* Subtract x number of days from a particular date: select to_date(from_unixtime(unix_timestamp(current_timestamp(),'yyyy-MM-dd')-86400*x));
* Convert the sysdate to YYYY-MM-DD format: select to_date(from_unixtime(unix_timestamp(current_timestamp(),'yyyy-MM-dd')));
* Convert the sysdate to YYYYMMDD format: select regexp_replace(to_date(from_unixtime(unix_timestamp(current_timestamp(),'yyyy-MM-dd'))),'-','');
* Convert…
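As a hedged aside (not from the article): on Hive 1.2 and later, the built-in date helpers achieve the same results without the epoch-second arithmetic above, which also sidesteps time-zone edge cases around the 86400-second assumption:

```sql
-- Sysdate as yyyy-MM-dd
SELECT current_date;

-- Yesterday's date
SELECT date_sub(current_date, 1);

-- x days back (x = 7 here, as an example)
SELECT date_sub(current_date, 7);

-- Sysdate in yyyyMMdd format
SELECT date_format(current_date, 'yyyyMMdd');
```

`date_add` works the same way for moving forward in time.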

Run hive queries online

Today, I will discuss how to run Hive queries online. Below are the steps:

* Go to the link https://demo.gethue.com/hue/accounts/login?next=/
* Enter the username and password as "demo".
* Click on the "Sign in" button. The screen below will appear. As you can see, many users have created their databases.
* In order to create your own database, click on the "+" icon.
* Fill in the database name and description.
* Click on the submit button. The output screen will appear in the top-right section.

So, we have the database ready and its name is "hive_testing"…

Avro Output in Pentaho

Today, I will discuss the Avro Output component in Pentaho. In my previous blog, I shared my experience with the Avro Input component, where data deserialization happens. In this component, data serialization happens. So, if you have data in text format, you can convert it to Avro format as well. As soon as you do this conversion, a schema file is also generated along with the Avro file. All of this can be achieved through the Avro Output component in Pentaho. I have designed a very simple transformation wherein we have csv…

Avro File Input In Pentaho

Today, I will discuss the Avro Input component in Pentaho. Avro uses the concepts of serialization and deserialization. Serialization means processing the data into a binary format. Data in binary format is unreadable, which makes it a very effective way to transfer data over a network. Therefore, many organizations are adopting this technique due to data security concerns. Deserialization means converting the binary-formatted data back into a readable form. Now the question is how the binary data gets deserialized. Here comes the concept of the schema file…
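To make the schema-file idea concrete, here is an illustrative example (not from the article): an Avro schema is a plain JSON document describing the record layout, and the deserializer reads the binary data against it. The record and field names below are made up for illustration:

```json
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "emp_id",   "type": "int"},
    {"name": "emp_name", "type": "string"},
    {"name": "salary",   "type": ["null", "double"], "default": null}
  ]
}
```

Because the field order and types live in the schema rather than in the data, the binary payload stays compact and unreadable on its own.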

EDIT THE DATA IN HIVE TABLES

We know that Hive works on a file-reading mechanism: it reads data from files present in the Hadoop file system. The pre-requisite here is that you should have basic knowledge of Hive. STEP-1: Copy the Hadoop files of a particular partition of the Hive object to your local server using the get command. For example: hdfs dfs -get /hadoop-server-details/path-of-the-file/partition-column-name=value /home/user1/ The assumption here is that the file format mapped to the Hive object is not a normal text file (say, an Avro file). So, it is recommended to copy…
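The excerpt cuts off after STEP-1. As a hedged sketch of the full round trip (all paths and file names below are illustrative, and the avro-tools step is my assumption for the Avro case, not necessarily the article's method):

```shell
# 1. Copy the partition's files from HDFS to the local server.
hdfs dfs -get /hadoop-server-details/path-of-the-file/partition-column-name=value /home/user1/work/

# 2. Edit locally. Avro files are binary, so convert to JSON first,
#    e.g. with Apache avro-tools, edit, then re-serialize with the
#    same schema:
#      java -jar avro-tools.jar tojson file.avro > file.json
#      ... edit file.json ...
#      java -jar avro-tools.jar fromjson --schema-file schema.avsc file.json > file.avro

# 3. Overwrite the files in HDFS with the edited copies.
hdfs dfs -put -f /home/user1/work/file.avro /hadoop-server-details/path-of-the-file/partition-column-name=value/
```

Since Hive simply reads whatever files sit under the partition path, the table reflects the edited data on the next query, with no DDL change needed.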

BEELINE COMMAND LINE IN HIVE

Today, I will discuss the beeline command line, which we use to run SQL queries from Linux. But what if the same thing needs to be called through shell scripting? First of all, we need to call the SQL query through the beeline command line inside a shell script using the below command: beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/localhost" -n "username" -p "password" --hivevar var1=$col1_hive --hivevar var2=$schema_name --hivevar var3=$table_name --hivevar var4=$col1_value -f sql_script.sql > text.log Here $col1_hive is the column name of a table, $table_name is the table name, and $schema_name is the schema name where that…
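The beeline call above can be wrapped in a small script so the hivevar values come from script arguments. This is a sketch under the excerpt's assumptions (same JDBC URL, variable names, and sql_script.sql; the argument order is my choice for illustration):

```shell
#!/bin/bash
# Usage (illustrative): ./run_query.sh COL1_NAME SCHEMA TABLE COL1_VALUE
col1_hive="$1"
schema_name="$2"
table_name="$3"
col1_value="$4"

# Pass the shell values into the query as Hive variables.
beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/localhost" \
  -n "username" -p "password" \
  --hivevar var1="$col1_hive" \
  --hivevar var2="$schema_name" \
  --hivevar var3="$table_name" \
  --hivevar var4="$col1_value" \
  -f sql_script.sql > text.log
```

Inside sql_script.sql, the variables are then referenced as ${var1}, ${var2}, ${var3}, and ${var4}, e.g. `select ${var1} from ${var2}.${var3} where ${var1} = '${var4}';`.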