Today, I will discuss about Athena APIs which can be used in automation using shell scripting to fetch the Table data to perform validations like count, date, sum (aggregate functions), duplicate .I will explain the APIs using count validation in Athena.
Get the Total records Details of the Athena tables using Athena APIs.
STEP 1 : Use the below Command to get the query execution id for count Query of particular Table.
API Used : start-query-executionaws athena start-query-execution --query-string "SELECT count(*) FROM $S3-Schema.$S3-Table" | grep [a-z] |awk -F ":" '{print $2}'|sed -e 's/"//g'|sed -e 's/ //g'
Store this value into a variable (say var_1). The above command save this result of count query into memory and gives you one query execution id.
STEP 2 : Give a sleep of 10 seconds using “sleep 10” in the code. This sleep is required as above query may take some time for execution.
STEP 3 : To get the count based on execution id which is generated as per above command, use the below code.
API Used : get-query-resultsaws athena get-query-results --query-execution-id "${var_1}" --region
region_id | grep "VarCharValue" | tail -1 | awk -F ":" '{print $2}'|sed -e 's/"//g'
The above commands gives count value and you can store this in a variable.
Here, you need to take care of aws-region-id . Example for region_id is us-east-1. Set the region-id accordingly.
Takeaway from this blog
1. Change the query part to do other validations on Athena tables.
2. Set the Sleep value as per the validation type like sum,count,null,sum,data.
3. Set the region-id as per your project’s AWS region.
4. sed, awk, grep commands will change again based on type of validation you want to perform.