
EDIT THE DATA IN HIVE TABLES

Hive works on a file-reading mechanism: it reads data from files stored in the Hadoop Distributed File System (HDFS). The prerequisite here is a basic knowledge of Hive.
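For context, the steps below assume a partitioned external Hive table backed by Avro files. The names used throughout (schema_name, hiveobject1, the city column, the date partition column, and the HDFS location) are hypothetical placeholders. A minimal sketch of such a table might look like:

create external table schema_name.hiveobject1 (
  id int,
  city string
)
partitioned by (`date` string)
stored as avro
location '/Hadoop-path/file-name';

With a layout like this, each partition is simply a directory under the table location, which is exactly what the following steps manipulate directly.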

STEP-1 Copy the HDFS files of the particular partition of the Hive object to your local server using the get command. For example:

hdfs dfs -get /hadoop-server-details/path-of-the-file/partition-column-name=value /home/user1/

The assumption here is that the files mapped to the Hive object are not plain text files (say, Avro files), so it is recommended to copy them to your local (Linux) server first.
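Since Avro files are binary, you may want to inspect the records before editing them. One possible way, assuming the Apache Avro tools jar is available on the server (the jar path and version here are hypothetical), is:

java -jar /home/user1/avro-tools-1.11.1.jar tojson filename.avro | head

This prints the first few records as JSON so you can confirm the values you are about to change.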

STEP-2 Once the file is on the Linux server, go to the /home/user1/ path using the cd command. Suppose you want to replace one value with another, for example Chicago with Los Angeles. You can achieve this with the sed command:

sed -i 's/Chicago/Los Angeles/g' filename.avro

Now you have a new file with the different city, and the new data set is ready.
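To sanity-check that the replacement landed, you could count the occurrences of the new value (strings is used here because the file is binary; the count depends on your data):

strings filename.avro | grep -c "Los Angeles"

One caution: because Avro length-prefixes its string fields, an in-place edit that changes a string's length can corrupt the file, so it is worth verifying that the edited file still deserializes (for example with the avro-tools tojson command shown above) before pushing it back to HDFS.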

STEP-3 Now you have two different data sets, and hence there are two scenarios:

1. Keep the existing partition for that particular Hive object and copy the new data set to a different partition.
2. Remove the existing data from HDFS and copy the new data set to the same existing partition.

Scenario 1: First, create a different partition directory for the same object using the mkdir command (in HDFS).

hdfs dfs -mkdir /Hadoop-path/file-name/date=2019-12-31

Once the new partition directory is created, copy the new data set into it using the command below.

hdfs dfs -put /home/user1/filename.avro /Hadoop-path/file-name/date=2019-12-31

Now we need to alter the Hive object as well, so that it knows about the new partition:

alter table schema_name.hiveobject1 add partition (date='2019-12-31');

The next step is to run the msck repair command for that object.

msck repair table schema_name.hiveobject1;
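To confirm the new partition is visible and serving the edited data, a quick check might be the following (table and column names as assumed above; the date column is backtick-quoted because date is a reserved word in recent Hive versions):

show partitions schema_name.hiveobject1;
select * from schema_name.hiveobject1 where `date`='2019-12-31' limit 10;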

Scenario 2: Here, you need to remove the data from the existing partition first, using the command below.

hdfs dfs -rm -r /Hadoop-path/file-name/date=2019-12-30/*

Then copy the new data set to the existing partition using the command below.

hdfs dfs -put /home/user1/filename.avro /Hadoop-path/file-name/date=2019-12-30/

Run the msck repair command to sync the partition metadata with the files on HDFS.

msck repair table schema_name.hiveobject1;
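As a final check that the overwritten partition now serves the edited data (again using the hypothetical names from the sketch above), you could run:

select city from schema_name.hiveobject1 where `date`='2019-12-30' limit 10;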
