EDIT THE DATA IN HIVE TABLES

Hive works on a file-reading mechanism: it reads data from files stored in the Hadoop file system (HDFS). The prerequisite for this walkthrough is basic knowledge of Hive.
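For context, a partitioned external table simply maps a table definition onto a directory of files in HDFS. The sketch below reuses the table and path names that appear later in this post; the id and city columns are hypothetical.

create external table schema_name.hiveobject1 (
  id int,
  city string
)
partitioned by (`date` string)
stored as avro
location '/Hadoop-path/file-name';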

STEP-1 Copy the HDFS files of a particular partition of the Hive table to your local server using the get command. For example:

hdfs dfs -get /hadoop-server-details/path-of-the-file/partition-column-name=value /home/user1/

The assumption here is that the files mapped to the Hive table are not plain text files (say, Avro files), so it is recommended to copy them to your local (Linux) server first.

STEP-2 Once the file is on the Linux server, change to the /home/user1/ path using the cd command. Suppose you want to replace one value with another, for example Chicago with Los Angeles. You can achieve this with the sed command.

sed -i 's/Chicago/Los Angeles/g' filename.avro
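One caveat: Avro is a binary format with length-prefixed strings, so an in-place sed edit that changes the value's length (Chicago is 7 characters, Los Angeles is 11) can corrupt the file. A safer sketch is to round-trip through JSON with Apache Avro's avro-tools jar; the jar location and file names below are assumptions.

# dump the schema, then convert the Avro payload to editable JSON
java -jar avro-tools.jar getschema filename.avro > schema.avsc
java -jar avro-tools.jar tojson filename.avro > filename.json

# the substitution is safe on the JSON copy
sed -i 's/Chicago/Los Angeles/g' filename.json

# rebuild a valid Avro file from the edited JSON
java -jar avro-tools.jar fromjson --schema-file schema.avsc filename.json > filename.avro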

You now have a new file with the different city value. That is how the new data set is prepared.

STEP-3 You now have two different data sets, which leads to two scenarios:

1. Keep the existing partition of the Hive table and copy the new data set to a different partition of that table.
2. Remove the existing data from HDFS and copy the new data set into the same existing partition.

Scenario 1: First, create a new partition directory for the same table using the mkdir command (in HDFS).

hdfs dfs -mkdir /Hadoop-path/file-name/date=2019-12-31

Once the new partition directory is created, copy the new data set to it using the command below.

hdfs dfs -put /home/user1/filename.avro /Hadoop-path/file-name/date=2019-12-31
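Before touching the Hive metadata, you can confirm the file landed with a quick listing:

hdfs dfs -ls /Hadoop-path/file-name/date=2019-12-31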

Now we need to alter the Hive table as well, to register that partition, using the command below.

alter table schema_name.hiveobject1 add partition (date='2019-12-31');

The next step is to run the msck repair command for that table.

msck repair table hiveobject1;
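To verify that Hive now sees the new partition and its data, two quick checks (using the same table name as above):

show partitions schema_name.hiveobject1;
select * from schema_name.hiveobject1 where `date`='2019-12-31' limit 10;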

Scenario 2: Here, you first need to remove the data from the existing partition using the command below.

hdfs dfs -rm -r /Hadoop-path/file-name/date=2019-12-30/*

Then copy the new data set to the existing partition using the command below.

hdfs dfs -put /home/user1/filename.avro /Hadoop-path/file-name/date=2019-12-30/

Run the msck repair command to sync the partition metadata.

msck repair table hiveobject1;
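As a final sanity check, you can confirm the refreshed partition now returns the edited value (the city column is hypothetical, as in the table sketch above):

select count(*) from schema_name.hiveobject1 where `date`='2019-12-30' and city='Los Angeles';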
