Add a Data Source in Data Cleaner

In my last post on Data Cleaner, I covered the steps to install Data Cleaner and integrate it with Pentaho. Today I will explain how to add a data source in Data Cleaner and which types of data sources it supports. The following data sources are available in Data Cleaner.

  1. CSV file
  2. Excel file
  3. Access database
  4. SAS
  5. dBase database
  6. Fixed-width text file
  7. XML file
  8. JSON
  9. Salesforce
  10. MongoDB
  11. CouchDB
  12. HBase
  13. Oracle
  14. SQL Server
  15. MySQL
  16. Apache Hive

Below is the image for the same.

Now, when you click on any data source, say CSV file, the screen below appears.

Here I have used the EMP_DETAILS.csv file. Once you fill in the details above, the screen looks like the one below.

Click Register Datastore. As soon as you click it, the datastore appears under Datastore Management. See the image below.

 

It stands to reason that whatever we do in the UI should also be reflected in some file. Here is a surprise, though: I am doing this on a Windows machine, and the file that holds all the datastore details is not getting updated. The file is located at:

<PATH>\data-integration\plugins\kettle6-profiling-datacleaner\conf.xml

Here you need to add EMP_DETAILS.csv as a data source to the conf.xml file yourself. Below is the content to add to this file.

<csv-datastore name="EMP_DETAILS.csv"
    description="Example CSV file with employee details">
  <filename>datastores/EMP_DETAILS.csv</filename>
  <encoding>UTF8</encoding>
  <quote-char>"</quote-char>
  <separator-char>,</separator-char>
</csv-datastore>
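
Since this entry is edited by hand, it is easy to break the file, for example by pasting typographic quotes (” instead of ") copied from a web page. Below is a minimal Python sketch to check that the XML still parses and to list the CSV datastores it contains. The sample text, the `<configuration>` and `<datastore-catalog>` wrapper elements, and the helper names `local_name` and `list_csv_datastores` are assumptions for illustration only; your actual conf.xml will contain more elements and may declare an XML namespace, which the sketch tolerates by comparing local tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample for illustration; a real conf.xml contains more
# elements and may declare an XML namespace on the root element.
SAMPLE_CONF = """<configuration>
  <datastore-catalog>
    <csv-datastore name="EMP_DETAILS.csv"
        description="Example CSV file with employee details">
      <filename>datastores/EMP_DETAILS.csv</filename>
      <encoding>UTF8</encoding>
      <quote-char>"</quote-char>
      <separator-char>,</separator-char>
    </csv-datastore>
  </datastore-catalog>
</configuration>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def list_csv_datastores(xml_text):
    """Return {datastore name: filename} for every csv-datastore element.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed,
    for example when typographic quotes replace straight quotes.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for element in root.iter():
        if local_name(element.tag) != "csv-datastore":
            continue
        filename = None
        for child in element:
            if local_name(child.tag) == "filename":
                filename = (child.text or "").strip()
        found[element.get("name")] = filename
    return found


if __name__ == "__main__":
    print(list_csv_datastores(SAMPLE_CONF))
```

To check your own file, read its contents as text and pass them to `list_csv_datastores`. A `ParseError` means the file is not well-formed and should be fixed before restarting the tool.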

So, these are the steps to add a data source in Data Cleaner.

 
