Internal and External Tables in Hive

Today, I will present, in a different way, the types of tables that can be created in Hive. I created three tables in Hive:

create table empdtls (emp_id int); { This is called an Internal or Managed table }
create external table empdtls_ext (emp_id int); { This is an External table }
create external table empdtls_ext_v2 (emp_id int) location '/user/demo/hivetesting/'; { This is also called an External table }

Now, we will check the TABLE_TYPE of all the above tables using the command DESCRIBE FORMATTED table_name.

Internal or Managed Table: /user/hive/warehouse/ is the location where all…
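For reference, here is a minimal sketch of that check in the Hive shell. MANAGED_TABLE and EXTERNAL_TABLE are the standard values Hive reports in the Table Type field; the exact layout of the output varies by Hive version.

DESCRIBE FORMATTED empdtls;
-- Table Type: MANAGED_TABLE
-- Location: hdfs://.../user/hive/warehouse/empdtls

DESCRIBE FORMATTED empdtls_ext_v2;
-- Table Type: EXTERNAL_TABLE
-- Location: hdfs://.../user/demo/hivetesting

The practical difference: dropping empdtls deletes both the metadata and the data under the warehouse path, while dropping empdtls_ext_v2 deletes only the metadata and leaves /user/demo/hivetesting/ untouched.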

Avro Output in Pentaho

Today, I will discuss the Avro Output component in Pentaho. In my previous blog, I shared my experience with the Avro Input component, where data deserialization happens. In this component, data serialization happens. So, if you have data in text format, you can convert it to Avro format as well. As soon as you do this conversion, a schema file is also generated along with the Avro file. All of this can be achieved through the Avro Output component in Pentaho. I have designed a very simple transformation wherein we have a CSV…
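To give a sense of what the generated schema file contains, here is a minimal sketch of an Avro schema (.avsc) for a two-column CSV. The record and field names below are hypothetical placeholders, not taken from the transformation in the post:

{
  "type": "record",
  "name": "employee",
  "fields": [
    { "name": "emp_id", "type": ["null", "int"], "default": null },
    { "name": "emp_name", "type": ["null", "string"], "default": null }
  ]
}

The union with "null" marks a field as nullable, which is how optional CSV columns are typically represented; a reader can later use this file to deserialize the binary Avro data back into named, typed fields.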

Avro File Input in Pentaho

Today, I will discuss the Avro Input component in Pentaho. Avro uses the concepts of serialization and deserialization. Serialization means encoding the data into a binary format. Clearly, data in binary format is unreadable, which makes it an effective way to transfer data over the network. Therefore, many organizations are adopting this technique due to data security concerns. Deserialization means converting the binary-formatted data back into a readable form. Now the question is: how is binary data deserialized? Here comes the concept of the schema file…
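As a quick illustration outside Pentaho, the same deserialization can be done with Avro's standard command-line utility jar (avro-tools); data.avro here is a hypothetical input file:

# Print the schema that was written into the file header
java -jar avro-tools.jar getschema data.avro

# Deserialize the binary records back into readable JSON
java -jar avro-tools.jar tojson data.avro

An Avro container file embeds its schema in the file header; readers use that schema (or a separately supplied .avsc file) to turn the binary records back into named, typed fields.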

Edit the Data in Hive Tables

We know that Hive works on a file-reading mechanism: Hive reads data from files stored in the Hadoop file system. The prerequisite here is that you should have a basic knowledge of Hive.

STEP-1 Copy the Hadoop files of a particular partition of the Hive object to your local server using the get command. For example:

hdfs dfs -get /hadoop-server-details/path-of-the-file/partition-column-name=value /home/user1/

The assumption here is that the file format of the files mapped to the Hive object is not a plain text file (say, an Avro file). So, it is recommended to copy…
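The excerpt cuts off here, but to sketch what the rest of such an edit cycle can look like under the stated assumption (Avro files): the paths below reuse the placeholders from the example above, part-00000.avro and table.avsc are hypothetical file names, and avro-tools is used purely to illustrate the Avro-to-text-and-back conversion; the post itself may use a different tool.

# STEP-1: copy the partition's files from HDFS to the local server
hdfs dfs -get /hadoop-server-details/path-of-the-file/partition-column-name=value /home/user1/

# Convert the Avro file to JSON so the records can be edited as text
java -jar avro-tools.jar tojson /home/user1/part-00000.avro > /home/user1/part-00000.json

# ...edit the JSON records, then re-serialize them with the table's schema...
java -jar avro-tools.jar fromjson --schema-file table.avsc /home/user1/part-00000.json > /home/user1/part-00000.avro

# Push the modified file back into the partition, overwriting the old one
hdfs dfs -put -f /home/user1/part-00000.avro /hadoop-server-details/path-of-the-file/partition-column-name=value/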