Check for duplicate records in Hive

Today, I will discuss how to automate checking a Hive table for duplicate records, where the entire row is repeated. As in all my automation blogs, I will share the pseudo code.

STEP 1: In Hive, run "desc table_name". This command lists the column names along with their data types (including the length or precision where the type has one, e.g. varchar(20)). Store the output of this command in a file, say HIVE_TABLE_DDL.txt.

STEP 2: Read the file HIVE_TABLE_DDL.txt using the "cat" command:
cat HIVE_TABLE_DDL.txt | awk '{print $1}' ORS=',' | sed 's/,$//'
* awk '{print $1}'…
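The excerpt stops after STEP 2, so the sketch below is only one possible way to finish the job, not necessarily the method the full post uses. It turns the saved "desc" output into a comma-separated column list and then builds a query that groups by every column and keeps groups with COUNT(*) > 1, i.e. full-row duplicates. It assumes the Hive CLI's whitespace-separated "desc" layout (Beeline's boxed table output would need different parsing), and the function names column_list and build_dup_query are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): build a HiveQL query that finds full-row
# duplicates from the column list saved by `desc table_name` (STEP 1).
# Assumes Hive CLI output: one "col_name<whitespace>data_type ..." per line.

# Print the column names as a comma-separated list.
# Skips blank lines and "# ..." section headers (e.g. "# Partition Information")
# and drops repeats, since `desc` lists partition columns a second time.
column_list() {
  awk 'NF && $1 !~ /^#/ && !seen[$1]++ { printf "%s%s", sep, $1; sep = "," }
       END { print "" }' "$1"
}

# Print a query returning every distinct row that occurs more than once.
# Usage: build_dup_query <ddl_file> <table_name>
build_dup_query() {
  local cols
  cols=$(column_list "$1")
  printf 'SELECT %s, COUNT(*) AS dup_count FROM %s GROUP BY %s HAVING COUNT(*) > 1;\n' \
    "$cols" "$2" "$cols"
}
```

For example, after saving the DDL with something like hive -S -e "desc mydb.emp" > HIVE_TABLE_DDL.txt (table name hypothetical), build_dup_query HIVE_TABLE_DDL.txt mydb.emp prints the query, which can be passed back to hive -e. Doing the comma-joining inside awk avoids the separate sed step for the trailing comma.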

Automate checking the existence of S3 files through shell scripting

Today, I will discuss how to automate checking for the existence of files in an S3 bucket through a shell script. Here, I will share the pseudo code for the same in the form of steps.

* Create a config file with the following details, separated by a pipe delimiter:
S3 bucket Name|project_folder_name|Relative path of sub folder
techie-1|ABC|prj1/UC1/
#### Here techie-1 is the bucket name, the immediate folder under this bucket is ABC, and the project use-case folder is prj1/UC1/ ####
* This [config] file will have the list of use-case folder names for which you need to test the…
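The remaining steps of the post are cut off here, so the following is a minimal sketch of how the config file could drive the check, not the author's script. It reads each bucket|project|path line, builds s3://bucket/project/path, and reports whether anything exists under that prefix. It assumes the AWS CLI is installed and configured, and relies on aws s3 ls exiting non-zero when nothing matches the prefix (the behaviour of recent CLI versions; worth confirming for yours). The function names s3_prefix_exists and check_s3_paths, and the rule that lines starting with '#' are skipped, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): report whether each S3 location listed in a
# pipe-delimited config file exists. Assumed config format, one entry per line,
# lines starting with '#' ignored:
#   bucket_name|project_folder_name|relative/path/of/sub/folder/
# e.g. techie-1|ABC|prj1/UC1/  ->  s3://techie-1/ABC/prj1/UC1/

# Return 0 if at least one object exists under the given s3:// prefix.
# Assumes `aws s3 ls` exits non-zero when nothing matches the prefix.
s3_prefix_exists() {
  aws s3 ls "$1" > /dev/null 2>&1
}

# Print "FOUND <uri>" or "MISSING <uri>" for every entry in the config file.
# Returns 1 if any entry is missing, so a calling job can fail on it.
check_s3_paths() {
  local bucket project relpath uri status=0
  # The `|| [[ -n ... ]]` keeps a last line that has no trailing newline.
  while IFS='|' read -r bucket project relpath || [[ -n "$bucket" ]]; do
    # Skip blank lines and comment lines.
    [[ -z "$bucket" || "$bucket" == \#* ]] && continue
    relpath="${relpath%$'\r'}"   # tolerate CRLF line endings
    uri="s3://${bucket}/${project}/${relpath}"
    if s3_prefix_exists "$uri"; then
      echo "FOUND   $uri"
    else
      echo "MISSING $uri"
      status=1
    fi
  done < "$1"
  return "$status"
}
```

A typical call would be check_s3_paths s3_paths.conf || echo "some locations are missing" (config file name hypothetical). Keeping the trailing "/" on each relative path matters: without it, a prefix such as prj1/UC1 would also match a sibling folder like prj1/UC10. Isolating the AWS call in its own function also makes it easy to swap in a stub when testing the loop.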