Automate the existence of S3 files through shell scripting

Today, I will discuss about “How to automate the existence of files in S3 bucket through shell script”. Here , i will share pseudo code for the same in the form of steps.* create one config file which we will have below details with pipe delimiter.S3 bucket Name|project_folder_name|Relative path of sub folderTechie-1|ABC|prj1/UC1/ #### Here techie-1 is the bucket Name, the immediate folder to this bucket is ABC and project Use-case folder is prj1/UC1/ ####* This[config] file will have list of Use-case folder names for which you need to test the…

features of Athena in AWS

Today, I will discuss about the Athena Database features in AWS. Below are the characteristics of the Athena .* Query Editor : This is the workspace where you can write your queries. * Saved Queries : This is the location where you can save your queries which are frequently used using “Save as” * History : It will have list of queries which are executed in the Query Editor section. * Athena logs : As mentioned in by previous blog, athena maintain logs in the S3 bucket as per the…

create table in Athena using CSV file

Today, I will discuss about “How to create table using csv file in Athena”.Please follow the below steps for the same.* Upload or transfer the csv file to required S3 location. * Create table using below syntax. create external table emp_details (EMPID int, EMPNAME string ) ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe’ WITH SERDEPROPERTIES ( ‘serialization.format’ = ‘,’, ‘field.delim‘ = ‘,’ ) location ‘s3://techie-1/emp/’ TBLPROPERTIES ( “skip.header.line.count”=”1”) * Important to note here that if you have a file which has header , then you need to skip the header .For this, we…

create database in Athena

Today , I will discuss about “How to create Database in Athena.Follow the below steps to achieve the same.* Create IAM policy which gives access to S3 bucket for Athena Logs to the required user.* While creating that IAM policy, you need to put the below content in the JSON tab.* Here, in below JSON, you need to change the bucket Name. * If you see clearly , in JSON both Glue and Athena has mentioned , so if you don’t want to give access to Glue DB, then you…

Equivalent of rollup in hive

Today, i will discuss about ” equivalent of rollup in hive”. There are scenarios where we need to replace the keyword rollup from the queries as it might not be working as expected. To put it differently, we might need to transform the oracle/sqlserver query to hive where rollup doesn’t give same results which we got in oracle. In an employee table. I have EMP_ID,DEPT_NAME,SAL_MONTH,SALARY columns. Query-1 with rollup :select EMP_ID,DEPT_NAME,SAL_MONTH,sum(SALARY) from EMP group by EMP_ID,DEPT_NAME ,rollup( SAL_MONTH ); Equivalent Query-1 without rollup using group by clause : select EMP_ID,DEPT_NAME,null,sum(SALARY)…

Date functions in Hive

Today, I will share list of queries related to “date” logic in which developers generally face issues during Hive query execution. * Query to get the sysdate select from_unixtime(unix_timestamp(current_timestamp())) * Query to get the yesterday’s date. Here 86400 is the number of seconds in a day(24 Hrs * 60 Minutes(in a hour)* 60 Seconds(in a minute)) select to_date(from_unixtime(unix_timestamp(current_timestamp(),’yyyy-MM-dd’)-86400*1));* Substract the x number of days from the particular date. select to_date(from_unixtime(unix_timestamp(current_timestamp(),’yyyy-MM-dd’)-86400*x)); * Convert the sysdate to YYYY-MM-DD format select to_date(from_unixtime(unix_timestamp(current_timestamp(),’yyyy- MM-dd’)));* Convert the sysdate to YYYYMMDD format select regexp_replace(to_date(from_unixtime(unix_timestamp(current_timestamp (),’yyyy-MM-dd’))),’-‘,”);* Convert…

Run hive queries online

Today, I will discuss about “How to run hive queries online”. Below are the steps for the same.* Go to below link https://demo.gethue.com/hue/accounts/login?next=/ * Enter the username and password as “demo”.* Click on “Sign in” Button. Below Screen will appear. As you can see, many users have created their databases. * In order to create your own database, Click on “+” icon.* Fill Database Name and description . * Click on submit button. The output screen will appear on the top-right section.So, we have database ready and name is “hive_testing”…

regex in Pentaho

Today , I will discuss about the Regular expression in Pentaho.Below are regex which can be used in Pentaho. There are many more as well. I worked on below Regex , so, I mentioned only those which I have used in the code. The values having only digits : ^[0-9]+$ The values having only lowercase characters : ^[a-z]+$ The values having only uppercase characters : ^[A-Z]+$ The values having only uppercase & lowercase characters : ^[a-z|A-Z]+$ The value having list of vowels characters ^[AEIOU]+$ The value having only these characters…

unzip component in Pentaho

Today, I will discuss about the “unzip” component in Pentaho. It will unzip the files present inside the zipped folder and archive the zipped file to another directory. See the Values which I filled in the unzip component.1. Zip file name : Mention the folder where the zipped file is present.2. Source Wildcard(Exp) : Mention the filename pattern of the zipped file. Here , I mentioned all files with zip extension.3. Target Directory : Give the path where you want to copy the unzipped files.4. Create folder : Check the…