Check the Number of Fields in a CSV File in Pentaho

Today I will discuss how to check the number of fields in a CSV file, or any other delimited file, in Pentaho without using shell scripting. Below is the transformation for the same.

Here, we have used components like Text file input, Sample rows, Split field to rows, Sort rows, and Write to log. In Text file input, we have intentionally set the wrong delimiter, so that each line is read as a single field in the Fields section. Here is how the data looks.

See the delimiter (;), which is not the one used in the CSV file.
Only one field comes through in the Fields section.
The entire row comes through as one field.
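
The same effect can be reproduced outside Pentaho. The following is a minimal Python sketch, not part of the transformation itself. The sample data and the helper name read_with_delimiter are made up for illustration, since the actual file is not shown in this post.

```python
import csv
import io

# Hypothetical sample data; the real file and its columns are not shown in the post.
SAMPLE = "id,name,city,amount,date\n1,Asha,Pune,250,2020-01-05\n"


def read_with_delimiter(text, delimiter):
    """Parse text as delimited rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Parsing comma-separated data with ";" leaves every line as one field.
rows = read_with_delimiter(SAMPLE, ";")
print(len(rows[0]))
print(rows[0][0])
```

Because no semicolon appears in the data, the parser never finds a field boundary, which is the behaviour the wrong delimiter relies on.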

Next comes Sample rows. Using this component, we fetch only one row. See the image below.

Selecting only one row.

After this component, only one row flows through the rest of the transformation. Now we split that field into rows using the “Split field to rows” component, splitting on the file's real delimiter (a comma for a CSV file). See the image below. Here, I have checked the option “Include rownum in output”.
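
As a rough illustration of what this step produces, here is a small Python sketch. The function name split_field_to_rows and the sample line are hypothetical, and numbering the rows from 1 is an assumption of this sketch. It also uses a plain string split, so it does not handle quoted fields that contain the delimiter.

```python
def split_field_to_rows(line, delimiter=","):
    """Return (rownum, value) pairs, one per delimited piece.

    Numbering starts at 1, which is an assumption of this sketch.
    """
    return list(enumerate(line.split(delimiter), start=1))


first_line = "id,name,city,amount,date"  # hypothetical sampled row
for rownum, value in split_field_to_rows(first_line):
    print(rownum, value)
```

Each piece of the original line becomes its own row, and the row number records its position in the line.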

Now we sort the data on the row number in descending order, so that the highest row number comes first. See the image below.

Now use the Sample rows component again to fetch the first row, which holds the highest row number because we have sorted the data in descending order. See the execution logs for the same.
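
The sort-then-sample idea can be sketched in Python as follows. The helper name highest_rownum and the sample pairs are hypothetical. In plain Python, max() would give the same answer; the sort is kept here only to mirror the steps of the transformation.

```python
def highest_rownum(pairs):
    """Sort (rownum, value) pairs by rownum, descending, and return the first rownum."""
    ordered = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return ordered[0][0]


# Hypothetical output of the split step: one (rownum, value) pair per field.
pairs = [(1, "id"), (2, "name"), (3, "city"), (4, "amount"), (5, "date")]
print(highest_rownum(pairs))
```

The highest row number equals the number of pieces the line was split into, which is the field count.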

It is now clear from the logs above that the number of fields in the CSV file is 5. We can compare this number with the expected number stated in the use case document. If the two numbers are equal, go ahead with loading the file into the table and then apply the transformation rules based on the use case requirements.
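
For a quick check outside Pentaho, the comparison can be sketched in Python. This is not the transformation described above: it counts the fields of the first line directly with Python's csv module. The names count_fields and should_load and the sample data are hypothetical.

```python
import csv
import io


def count_fields(text, delimiter=","):
    """Count the fields in the first line of delimited text (0 if the text is empty)."""
    first_row = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
    return len(first_row)


def should_load(text, expected_fields, delimiter=","):
    """Return True only when the file has the expected number of fields."""
    return count_fields(text, delimiter) == expected_fields


# Hypothetical sample data with five fields per line.
sample = "id,name,city,amount,date\n1,Asha,Pune,250,2020-01-05\n"
print(should_load(sample, 5))
print(should_load(sample, 4))
```

The expected count would come from the use case document, and the load would proceed only when should_load returns True.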
