MERGE MULTIPLE CSV FILES INTO ONE FILE IN PENTAHO

Today, I will discuss how to merge multiple CSV files into one CSV file in Pentaho. Below is the image of the ETL job for this.

First, create a transformation that loads the CSV file names into a variable. Below is the design of this transformation.

In the Get file names component, use a wildcard expression to fetch all CSV files that match a particular pattern, then click Show filenames to list the matching file names along with their absolute paths.

Here, I have considered a scenario where employee salary data is generated in a CSV file every month. We need to merge these monthly files into one file so that further analysis can be done.
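To make the file-matching idea concrete outside Pentaho, here is a minimal Python sketch of what the Get file names component does. It assumes the wildcard is a regular expression matched against the file name. The function name `list_matching_files` and the pattern in the usage note are hypothetical, chosen only for illustration.

```python
import re
from pathlib import Path


def list_matching_files(folder, pattern):
    """Return sorted absolute paths of files in `folder` whose names match `pattern`.

    `pattern` is treated as a regular expression that must match the whole
    file name (an assumption made for this sketch).
    """
    regex = re.compile(pattern)
    return sorted(
        str(path.resolve())
        for path in Path(folder).iterdir()
        if path.is_file() and regex.fullmatch(path.name)
    )
```

For example, `list_matching_files("/data/salary", r"emp_salary_.*\.csv")` would return the absolute paths of the monthly salary files in a hypothetical `/data/salary` folder and ignore everything else.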

The second step is to use the “Write to file” component, which creates the merged file with only the header in it. See the image below.

This step is needed because when you merge multiple files into one using the Text file output component, it does not write the header to the file. Hence, “Write to file” is required.
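The same idea, creating the target file first with nothing but the header, can be sketched in Python as follows. This is only an illustration of the step, not Pentaho code. The function name `write_header_file` and the column names in the usage note are hypothetical.

```python
import csv


def write_header_file(path, header):
    """Create (or overwrite) the merged file so that it contains only the header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow(header)
```

A call such as `write_header_file("merged.csv", ["emp_id", "emp_name", "salary"])` leaves a one-line file to which the data rows can be appended later.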

The third step is to write the data of the files, one by one, into a single file. Start with the transformation entry and check the box “Execute for every input row”, so that the transformation runs once per file name. See the image below.

The image below shows the design inside this transformation.

Normally, when we use CSV file input, we give the path of the file and a wildcard to identify the exact file, but here we use “Data from the previous step” instead. See the screenshot.

Another point to remember: you must know the number of fields in the source files, and the same number of fields must be defined in the CSV file input. See the image.
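If you are unsure whether every source file really has the expected number of fields, a quick check outside Pentaho can help before you configure the CSV file input. The sketch below is a hypothetical helper that assumes comma-separated files. It skips blank lines and raises an error on the first row with a different field count.

```python
import csv


def check_field_count(path, expected_fields):
    """Raise ValueError if any non-blank row in `path` does not have `expected_fields` fields."""
    with open(path, newline="", encoding="utf-8") as handle:
        for line_no, row in enumerate(csv.reader(handle), start=1):
            # csv.reader yields an empty list for a blank line; ignore those.
            if row and len(row) != expected_fields:
                raise ValueError(
                    f"{path}: row {line_no} has {len(row)} fields, expected {expected_fields}"
                )
```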

When you run the main job, it will merge the three files into one file.
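For readers who want to see the overall flow in one place, here is a rough Python sketch of what the job does: write the header once, then append the data rows of each source file in turn. It assumes every source file starts with its own header row, which is skipped. The function names and file names are hypothetical, and this is an illustration of the logic rather than anything Pentaho generates.

```python
import csv


def append_rows(source_path, merged_path):
    """Append the data rows of one source file to the merged file; return the row count."""
    with open(source_path, newline="", encoding="utf-8") as src, \
            open(merged_path, "a", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        next(reader, None)  # assumption: the first row of each source file is a header
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count


def merge_files(source_paths, merged_path, header):
    """Write the header once, then append each source file, one file per iteration."""
    with open(merged_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow(header)
    return sum(append_rows(path, merged_path) for path in source_paths)
```

The loop over `source_paths` plays the role of running the transformation once for every input row, and opening the merged file in append mode mirrors the reason the header has to be written in a separate step.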
