Today, I will discuss the Avro Output component in Pentaho. In my previous post, I shared my experience with the Avro Input component, where data deserialization happens. In this component, data serialization happens. So, if you have data in a text format, you can convert it to Avro format. When you do this conversion, a schema file is also generated along with the Avro file. All of this can be achieved with the Avro Output component in Pentaho. I have designed a very simple transformation wherein we have csv…
Category: HIVE
Avro File Input In Pentaho
Today, I will discuss the Avro Input component in Pentaho. Avro uses the concepts of serialization and deserialization. Serialization means converting the data into a binary format. Data in binary format is compact and not human-readable, which makes it an effective way to transfer data over the network. Many organizations are therefore adopting this technique, citing data security concerns. Deserialization means converting the binary data back into a readable form. Now the question is how binary data is deserialized. This is where the schema file comes in.…
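To make the role of the schema concrete, here is a minimal Python sketch of Avro-style binary encoding for a single record. It is not how Pentaho does it internally and it does not produce a full Avro container file; it only hand-codes the raw encoding of `long` and `string` fields. The schema, the field names `id` and `name`, and the sample values are hypothetical.

```python
# Hypothetical schema: field names and types are illustrative only.
SCHEMA = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}


def write_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag variable-length value."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small unsigned values
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int):
    """Decode a zig-zag variable-length value; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_record(schema: dict, record: dict) -> bytes:
    """Serialize: write field values in schema order, with no field names."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "long":
            out += write_long(value)
        elif field["type"] == "string":
            data = value.encode("utf-8")
            out += write_long(len(data)) + data  # length prefix, then UTF-8 bytes
        else:
            raise ValueError(f"type not covered by this sketch: {field['type']}")
    return bytes(out)


def decode_record(schema: dict, buf: bytes) -> dict:
    """Deserialize: the schema says which type to read next and what to call it."""
    pos = 0
    record = {}
    for field in schema["fields"]:
        if field["type"] == "long":
            record[field["name"]], pos = read_long(buf, pos)
        elif field["type"] == "string":
            length, pos = read_long(buf, pos)
            record[field["name"]] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"type not covered by this sketch: {field['type']}")
    return record


encoded = encode_record(SCHEMA, {"id": 7, "name": "Asha"})
print(encoded)                         # raw bytes, no field names inside
print(decode_record(SCHEMA, encoded))  # readable again, thanks to the schema
```

The encoded bytes carry only values, so a reader without the schema cannot tell where one field ends or what it means. That is why deserialization needs the schema.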
EDIT THE DATA IN HIVE TABLES
As we know, Hive works on a file-reading mechanism: it reads data from files stored in the Hadoop file system. The prerequisite here is a basic knowledge of Hive. STEP-1 Copy the Hadoop files of a particular partition of the Hive object to your local server using the get command. For example: hdfs dfs -get /hadoop-server-details/path-of-the-file/partition-column-name=value /home/user1/ The assumption here is that the files mapped to the Hive object are not plain text files (say, Avro files). So, it is recommended to copy…
BEELINE COMMAND LINE IN HIVE
Today, I will discuss the Beeline command line, which we use to run SQL queries from Linux. But what if the same thing needs to be done from a shell script? First of all, we call the SQL query through the Beeline command line inside the shell script using the command below. beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/localhost" -n "username" -p "password" --hivevar var1=$col1_hive --hivevar var2=$schema_name --hivevar var3=$table_name --hivevar var4=$col1_value -f sql_script.sql > text.log Here, $col1_hive is the column name of a table. $table_name is the table name. $schema_name is the schema name where that…
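The same invocation can also be assembled programmatically. The Python sketch below builds the argument list for the command above, using only the flags that appear in it (`-u`, `-n`, `-p`, `--hivevar`, `-f`). The helper name `build_beeline_command` and the sample variable values are hypothetical, and the actual run is left commented out because it needs a reachable HiveServer2.

```python
import shlex


def build_beeline_command(jdbc_url, username, password, hivevars, sql_file):
    """Return the beeline argument list; one --hivevar pair per variable."""
    cmd = ["beeline", "-u", jdbc_url, "-n", username, "-p", password]
    for name, value in hivevars.items():
        cmd += ["--hivevar", f"{name}={value}"]
    cmd += ["-f", sql_file]
    return cmd


# Hypothetical values standing in for $col1_hive, $schema_name, and so on.
command = build_beeline_command(
    jdbc_url="jdbc:hive2://localhost:10000/default;principal=hive/localhost",
    username="username",
    password="password",
    hivevars={
        "var1": "order_id",
        "var2": "sales_db",
        "var3": "orders",
        "var4": "1001",
    },
    sql_file="sql_script.sql",
)

# Show the shell-quoted command line that would be executed.
print(shlex.join(command))

# To actually run it and capture the output in text.log (assumes beeline is on PATH):
# import subprocess
# with open("text.log", "w") as log:
#     subprocess.run(command, stdout=log, check=True)
```

Passing the arguments as a list avoids quoting problems with the semicolon in the JDBC URL, which a shell would otherwise treat as a command separator if left unquoted.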