SPLIT FIELDS COMPONENT IN PENTAHO

Today, I will discuss the “Split Fields” component in Pentaho. The Split Fields component splits the value of one field into multiple fields based on a delimiter. For example, consider a field var1 with the value a;b;c. Suppose we need to split it into three separate values: a, b, and c. To achieve this, use the Split Fields component in Pentaho with “;” as the delimiter. See the image below for the same. Use the Generate Rows step to generate one field with the value “a;b;c”, then use the Split Fields step with “;” as the delimiter. See the…
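The splitting the step performs can be sketched in plain Python; the output field names below are illustrative, not taken from the transformation:

```python
def split_field(value, delimiter, new_field_names):
    """Mimic what the Split Fields step does: break one field's value
    into several new fields based on a delimiter."""
    parts = value.split(delimiter)
    # Each part is paired with one of the new field names configured in the step.
    return dict(zip(new_field_names, parts))

row = split_field("a;b;c", ";", ["field1", "field2", "field3"])
# row == {"field1": "a", "field2": "b", "field3": "c"}
```

In the real step you configure the delimiter and the list of new field names in the step dialog; the sketch only shows the underlying idea.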

S3 FILE OUTPUT IN PENTAHO

Today I will share my experience with the S3 File Output component in Pentaho. This component is used when you want to create a file (with data in it) in an S3 bucket. Below is the ETL code for the same. In the above transformation, the CSV file input and S3 File Output components are used. In this code, I am simply copying the data from a file on my local machine to the S3 bucket (s3-techie, as mentioned in my last blog). In the S3 File Output step, mention the values for S3…
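For readers who want to see the same copy expressed outside Pentaho, here is a rough sketch of what the step does. The bucket name s3-techie comes from the post; the local path, key name, and the boto3 upload call (shown only as a comment, since it needs valid AWS credentials) are assumptions:

```python
def read_local_csv(path):
    """Read the local file's bytes, as the CSV file input step would."""
    with open(path, "rb") as f:
        return f.read()

# The S3 File Output step then writes those bytes to the bucket.
# With boto3 this would be roughly:
# import boto3
# s3 = boto3.client("s3")
# s3.put_object(Bucket="s3-techie", Key="output.csv",
#               Body=read_local_csv("C:/data/input.csv"))
```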

S3 FILE INPUT IN PENTAHO

Today I will share my experience of using the S3 File Input component in Pentaho. Before we start and discuss the ETL code, you should have your S3 access key and S3 secret key handy in order to connect to the Amazon S3 bucket. Secondly, if you want to test the ETL code below and you are not already using an S3 bucket for file storage, you can create your own Amazon account (free for one year, with limited features and limited storage). Below is the ETL code regarding “How to…
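As a rough sketch of the read side, this is approximately what the step does with the access key and secret key. The parsing helper is runnable; the boto3 fetch is shown only as a comment, and the bucket, key, and credential placeholders are illustrative:

```python
import csv
import io

def parse_csv_bytes(data):
    """Parse downloaded CSV bytes into rows, as the step would after fetching."""
    return list(csv.reader(io.StringIO(data.decode("utf-8"))))

# Fetching the object itself would look roughly like this with boto3:
# import boto3
# s3 = boto3.client("s3",
#                   aws_access_key_id="<access-key>",
#                   aws_secret_access_key="<secret-key>")
# body = s3.get_object(Bucket="s3-techie", Key="input.csv")["Body"].read()
# rows = parse_csv_bytes(body)
```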

MERGE THE MULTIPLE CSV FILES INTO ONE FILE IN PENTAHO

Today, I will discuss how to merge multiple CSV files into one CSV file in Pentaho. Below is the image of the ETL code for the same. First, create a transformation that loads the CSV file names into a variable. Below is the code for the same. In the Get File Names step, use a wildcard expression to fetch all CSV files matching a particular pattern, and click on Show Filenames, which will show the file names along with their absolute paths. Here, I have considered a scenario where employee salaries are generated in CSV…
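The merge itself can be sketched in Python: collect all CSV files matching a wildcard, keep the header from the first file only, and append the data rows from the rest. The directory and pattern below are illustrative, not the ones from the transformation:

```python
import csv
import glob

def merge_csv_files(pattern, out_path):
    """Merge every CSV matching `pattern` into one file,
    writing the header once and appending data rows from each file."""
    paths = sorted(glob.glob(pattern))
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header and not header_written:
                    writer.writerow(header)
                    header_written = True
                writer.writerows(reader)

# Example (hypothetical paths; keep the output outside the input pattern):
# merge_csv_files("C:/salary/*.csv", "C:/merged.csv")
```

Writing the output file outside the input pattern avoids the merged file matching its own wildcard on a re-run.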

execute the job using data cleaner in Pentaho

In my last post, I explained how to create different data sources in DataCleaner. Today, I will use the same data source, a CSV input file, and design a job in the DataCleaner tool. Upon completion of the job, we will use the DataCleaner component in Pentaho and execute the same job through Pentaho. First, open the DataCleaner tool from Pentaho, as mentioned in my previous post, and click on New -> Build New Job. See the screenshot below for the same. Then, select…

CHANGE THE USER INTERFACE OF PENTAHO

Today, I will discuss how to change the UI (user interface) of Pentaho. Below is the Welcome page of Pentaho Data Integration; see the part highlighted in green. To change the content and image of the Welcome page, you need to edit index.html. The location of this file is <path>\data-integration\docs\English\welcome. I have changed the below lines of index.html:

<div class="header-navigation">
  <div class="header-navigation-item">WELCOME TO PDI</div>
  <div class="header-navigation-item">MEET PDI FAMILY</div>
  <div class="header-navigation-item">CREDITS</div>
  <div class="header-navigation-item">WHY ENTERPRISE EDITION</div>
  <div class="clear"></div>
</div>

<div class="headerContents">
  <h1 class="large lineheight45">How to get Most<br>From Pentaho</h1>…

add datasource in Data cleaner

In my last post on DataCleaner, I mentioned the steps to install DataCleaner and integrate it with Pentaho. Today I will explain how to add a data source in DataCleaner and what types of data sources it supports. Below are the data sources available in DataCleaner:

- CSV file
- Excel file
- Access database
- SAS
- DBase database
- Fixed-width text file
- XML file
- JSON
- Salesforce
- MongoDB
- CouchDB
- HBase
- Oracle
- SQL Server
- MySQL
- Apache Hive

Below is the image for the same. Now, when…

Run the Pentaho Transformation using REST Service

In my last post, https://www.allabouttechnologies.co.in/pentaho/run-transformation-as-web-service-using-carte-in-pentaho/, I mentioned how to run a transformation as a web service using Carte. Today, we will use the same URL to trigger the transformation, but through the SoapUI tool. I downloaded the tool from https://www.soapui.org/downloads/soapui.html. Once the installation is complete, open the tool; the UI will look like below. Create a REST project by clicking on the REST button. See the image below for the same. Once you click on REST, it will ask for a URL; copy the URL http://127.0.0.1:8080/kettle/executeTrans/?trans=<repository_path>/credit-carte.ktr into that text box. See the below image for…
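Outside SoapUI, the same endpoint can also be hit from code. The sketch below only builds the Carte executeTrans URL; the repository path is a hypothetical placeholder, and the credentials in the commented request (cluster/cluster, Carte's usual defaults) are an assumption here:

```python
from urllib.parse import urlencode

def build_execute_trans_url(host, port, trans_path):
    """Build the Carte executeTrans URL for a repository transformation path."""
    query = urlencode({"trans": trans_path})
    return f"http://{host}:{port}/kettle/executeTrans/?{query}"

url = build_execute_trans_url("127.0.0.1", 8080,
                              "/hypothetical/path/credit-carte.ktr")

# Triggering it would then be (requires a running Carte server):
# import base64, urllib.request
# req = urllib.request.Request(url)
# token = base64.b64encode(b"cluster:cluster").decode()
# req.add_header("Authorization", "Basic " + token)
# print(urllib.request.urlopen(req).read())
```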

run transformation as web service using carte in Pentaho

Today, I will discuss how to run a transformation or job as a web service using Carte on your local machine. First, we have to create a configuration.xml file inside the data-integration folder, where the carte.bat or carte.sh file is present. Below is the content of the file:

<slave_config>
  <slaveserver>
    <name>carte</name>
    <hostname>localhost</hostname>
    <port>8080</port>
  </slaveserver>
  <max_log_lines>10000</max_log_lines>
  <max_log_timeout_minutes>1440</max_log_timeout_minutes>
  <object_timeout_minutes>1440</object_timeout_minutes>
</slave_config>

Then go to the data-integration folder and execute the below command in a command prompt:

Carte.bat configuration.xml

The below lines on the command prompt will give you an indication that Carte is up and running on your local machine. Carte…

interview Questions on Pentaho

Today, I will share a set of interview questions on Pentaho. Below are the questions:

1. How would you implement SCD Type 0, SCD Type 1, and SCD Type 2 in Pentaho?
2. What is the difference between arguments and variables?
3. What components are present in a transformation? Name at least 10 components.
4. Have you ever implemented a plugin in Pentaho using Java?
5. If I want to run 10 jobs in parallel through a shell script, how should I do it?
6. What is a factless fact table? Give practical examples.
7. What are conformed dimensions and degenerate dimensions…