Avro File Input In Pentaho

Today, I will be discussing the Avro Input component in Pentaho.

Avro is a data serialization system, so it is built around two operations: serialization and deserialization.

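Serialization means converting data into a compact binary format. Binary data is not human-readable and takes little space, which makes it an efficient way to store data and to transfer it over the network. Many organizations adopt this technique for efficient data exchange. Note, however, that binary encoding is not encryption: anyone who has the schema can decode the data, so it is not a data-security measure by itself.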
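To make the idea concrete, here is a minimal sketch of Avro's binary encoding in plain Python (standard library only), following the encoding rules in the Avro specification: `int` and `long` values are zig-zag encoded as variable-length integers, and strings are written as a length followed by UTF-8 bytes. The `Employee` schema and its fields are hypothetical, and only `string`, `int`, and `long` fields are handled. Notice that the output holds only the values in schema order, without any field names, and that the string value is still visible as raw UTF-8 bytes.

```python
# Hypothetical schema, mirroring what an .avsc file would contain (as a Python dict).
SCHEMA = {
    "type": "record",
    "name": "Employee",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}


def encode_long(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    out = bytearray()
    while z & ~0x7F:  # more than 7 significant bits remain
        out.append((z & 0x7F) | 0x80)  # emit low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)  # last byte, continuation bit clear
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate field values in schema order; field names are NOT written."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] in ("int", "long"):
            out += encode_long(value)
        elif field["type"] == "string":
            out += encode_string(value)
        else:
            raise NotImplementedError(f"type {field['type']!r} not handled in this sketch")
    return bytes(out)


print(encode_record(SCHEMA, {"name": "Alice", "age": 30}))  # b'\nAlice<'
```

Here `\n` is the byte 0x0A (string length 5, zig-zag encoded as 10) and `<` is 0x3C (age 30, zig-zag encoded as 60).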

Deserialization means converting the binary data back into a readable form. How does a reader know how to decode the binary data? This is where the schema file comes in. An Avro schema is written in JSON and stored in a file with the .avsc extension, for example “Schema.avsc”. It defines the structure of the records held in the binary data file, for example “file.avro”.
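The reverse direction can be sketched the same way. Below, the contents of a hypothetical Schema.avsc are held in a string and parsed with Python's `json` module; the decoder then walks the schema's fields in order and reads each value from the bytes. This is why the schema is needed: the bytes themselves carry no field names or types. The input bytes correspond to the record `{"name": "Alice", "age": 30}` under this schema. It is an illustrative sketch covering only `string`, `int`, and `long` fields, not how Pentaho implements the Avro Input step.

```python
import json

# Contents of a hypothetical Schema.avsc (an Avro schema is plain JSON).
SCHEMA_AVSC = """
{
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age",  "type": "int"}
  ]
}
"""


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an Avro zig-zag varint starting at pos; return (value, new_pos)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift  # accumulate 7 bits at a time, low bits first
        if not b & 0x80:  # continuation bit clear: last byte
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo zig-zag


def decode_record(schema: dict, buf: bytes, pos: int = 0) -> tuple[dict, int]:
    """Read one record by following the schema's field list in order."""
    record = {}
    for field in schema["fields"]:
        ftype = field["type"]
        if ftype in ("int", "long"):
            record[field["name"]], pos = read_long(buf, pos)
        elif ftype == "string":
            length, pos = read_long(buf, pos)
            record[field["name"]] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise NotImplementedError(f"type {ftype!r} not handled in this sketch")
    return record, pos


schema = json.loads(SCHEMA_AVSC)
record, _ = decode_record(schema, b"\x0aAlice\x3c")
print(record)  # {'name': 'Alice', 'age': 30}
```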

Pentaho takes care of serialization and deserialization itself: to serialize data, use the Avro Output component, and to deserialize it, use the Avro Input component. In this blog, we will discuss the Avro Input component.

The Avro Input component needs two files: the Avro data file (.avro) and the Avro schema file (.avsc). Specify both files in the component's settings, as shown in the images below.

[Image: Avsc schema fields]
[Image: Avsc and Avro file]
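If you want sample input files to experiment with, the sketch below writes a hypothetical `Employee` schema to Schema.avsc and a few records to file.avro, using only the Python standard library. It follows the object container file layout from the Avro specification: the magic bytes `Obj\x01`, a metadata map holding `avro.schema` and `avro.codec` (`null`, meaning uncompressed), a random 16-byte sync marker, and one data block. Note that a container file also embeds the schema in its header. The schema, field names, and the `write_files` helper are illustrative assumptions; in practice you would produce these files with the Avro Output step or an Avro library.

```python
import json
import os

# Hypothetical schema; written out as Schema.avsc and embedded in file.avro.
SCHEMA = {
    "type": "record",
    "name": "Employee",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag, then base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(b: bytes) -> bytes:
    """Avro bytes/string: length as a long, then the raw bytes."""
    return encode_long(len(b)) + b


def encode_record(record: dict) -> bytes:
    """Encode one record; handles only the two field types used in SCHEMA."""
    out = bytearray()
    for field in SCHEMA["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            out += encode_bytes(value.encode("utf-8"))
        else:  # "int"
            out += encode_long(value)
    return bytes(out)


def write_files(records: list[dict], avsc_path: str, avro_path: str) -> None:
    """Write the schema file and an uncompressed Avro object container file."""
    schema_json = json.dumps(SCHEMA)
    with open(avsc_path, "w", encoding="utf-8") as f:
        f.write(schema_json)

    sync = os.urandom(16)  # 16-byte sync marker, random per file
    # File metadata is an Avro map of bytes: one block of 2 entries, then a 0 terminator.
    meta = encode_long(2)
    meta += encode_bytes(b"avro.schema") + encode_bytes(schema_json.encode("utf-8"))
    meta += encode_bytes(b"avro.codec") + encode_bytes(b"null")
    meta += encode_long(0)

    data = b"".join(encode_record(r) for r in records)
    with open(avro_path, "wb") as f:
        f.write(b"Obj\x01")  # magic bytes
        f.write(meta)
        f.write(sync)
        # One data block: object count, byte size, serialized objects, sync marker.
        f.write(encode_long(len(records)))
        f.write(encode_long(len(data)))
        f.write(data)
        f.write(sync)
```

For example, `write_files([{"name": f"user{i}", "age": 20 + i} for i in range(10)], "Schema.avsc", "file.avro")` produces a ten-record file, matching the ten-row preview described below.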

Click Preview Data and set the number of rows to 10.

[Image: Avro file data with 10 sample records]

The input and schema files are available at the link below.

https://github.com/Karan-Arora-13/technicalstuff.git/
