In this post, Hadoop developers walk you through installing Sqoop and importing data from MySQL to HDFS. Make sure you understand every point discussed here.
If you have any doubts, ask the developers; they will respond to your query soon. Over the last few years, most organizations have decided to adopt Big Data components to perform operations on their data.
As data grows exponentially, it has become essential to make traditional applications Big Data compatible.
Traditional applications use an RDBMS to interact with their data. In order to analyse that data with Big Data tools, it must first be made available to components such as HDFS, HBase, Hive, and Pig.
In this article I am going to explain one such Big Data component, Sqoop, which is used to import data from an RDBMS into HDFS.
This article is divided into two parts:
- Installation of Sqoop
- Data import from an RDBMS (MySQL) to HDFS
Installation of Sqoop
This tutorial has been tested using the following environment.
OS : Ubuntu Linux(14.04 LTS) – 64bit
Hadoop : Hadoop-2.2.0
Prerequisites : Oracle Java 6 or newer.
Sqoop : sqoop-1.4.4.bin__hadoop-2.0.4-alpha
Download the Sqoop version compatible with your Hadoop installation; you need to check the Hadoop-Sqoop compatibility first. I downloaded sqoop-1.4.4.bin__hadoop-2.0.4-alpha from the Apache Sqoop download mirrors and extracted it with:
tar -xvf ~/Downloads/sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
In the next step, create a folder for the Sqoop files using the following command.
sudo mkdir -p /usr/local/sqoop/
In the next step, move the extracted files and folders to the new location by running the following command.
sudo mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha/* /usr/local/sqoop/
In the next step, export the SQOOP_HOME and PATH variables by adding the following lines to the end of the user's $HOME/.bashrc file. If you use a shell other than bash, update that shell's startup file instead of .bashrc.
I used the gksu gedit $HOME/.bashrc command to edit the file and appended the following lines.
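The original post does not reproduce the exact lines. Assuming Sqoop was unpacked into /usr/local/sqoop as shown above, the appended lines would look like this:

```shell
# Appended to $HOME/.bashrc (path assumes Sqoop lives in /usr/local/sqoop)
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
```

After saving the file, run source $HOME/.bashrc (or open a new terminal) so the variables take effect.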
Edit the sqoop-env.sh file in the $SQOOP_HOME/conf directory (if only sqoop-env-template.sh exists there, copy it to sqoop-env.sh first) and update the paths below:
#Set path to where bin/hadoop is available
#Set path to where hadoop-*-core.jar is available
My sqoop-env.sh looks like this:
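The original screenshot is not reproduced here; a minimal sketch of the file, assuming Hadoop is installed under /usr/local/hadoop (adjust to your own installation), would be:

```shell
# sqoop-env.sh -- the /usr/local/hadoop paths are assumptions;
# point them at your own Hadoop installation.

# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/hadoop

# Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/hadoop
```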
Now that the configuration is complete, verify it using the below command.
sqoop version
It should print the installed Sqoop version information.
Now that the installation part is complete, it is time to import some data from an RDBMS.
Import Data From RDBMS(MySql) To HDFS
In this portion of the article we will import data from an RDBMS into HDFS; in this case I used a MySQL database as the source and HDFS as the sink.
I have a simple Retail Store table that stores transaction-related data, and we want to get the data from that table into HDFS.
Here is my SQL script for creating and populating the sample data that we will use in this article.
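The original script is not reproduced here. A minimal sketch of what such a script might contain is shown below; the column names and values are illustrative assumptions, not the author's actual schema. The script is written to a file you can then feed to the mysql client:

```shell
# Write a sample SQL script (schema and data are illustrative only)
cat > retail_master.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS retail_db;
USE retail_db;
CREATE TABLE IF NOT EXISTS retail_master (
  txn_id INT PRIMARY KEY,
  txn_date DATE,
  amount DECIMAL(10,2)
);
INSERT INTO retail_master VALUES (1, '2014-01-01', 250.00);
INSERT INTO retail_master VALUES (2, '2014-01-02', 125.50);
EOF

# Load it into MySQL (requires a running MySQL server):
# mysql -u root -p < retail_master.sql
```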
Now that the data is prepared and ready in the database table, we use the command below to import it.
The syntax for the import command looks like:
sqoop import --connect jdbc:mysql://localhost/databasename --username $USER_NAME --password $PASSWORD --table tablename -m 1
For our example:
sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password root --table retail_master -m 1
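The same command can be parameterized in a small shell sketch, which makes it easier to reuse for other tables; the values below are the ones used in the example above:

```shell
# Build the Sqoop import command from variables (values match the example above)
DB_NAME=retail_db
DB_USER=root
DB_PASS=root
TABLE=retail_master
NUM_MAPPERS=1

SQOOP_CMD="sqoop import --connect jdbc:mysql://localhost/${DB_NAME} \
--username ${DB_USER} --password ${DB_PASS} --table ${TABLE} -m ${NUM_MAPPERS}"

# Print the command instead of running it, so it can be reviewed first
echo "$SQOOP_CMD"
```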
Once the command is fired, it starts a MapReduce job for the bulk import, and at the end of the execution you can find the data file in HDFS.
By default Sqoop writes the output under the importing user's HDFS home directory, in a folder named after the table; my final output appeared as part-m-* files there, which you can inspect with:
hdfs dfs -cat /user/$USER/retail_master/part-m-00000
Now your Hadoop-Sqoop setup is running, and you can import as well as export data using Sqoop.
Follow every instruction shared by the proficient Hadoop developers in this post. Once you understand the procedure to install Sqoop and import data from MySQL to HDFS, try it yourself and share your experience with the developers.
This article has been written by Ethan Millar, who has been working with Aegis Softtech as a technical writer for more than 5 years. The main objective of this article is to perform Sqoop installation and import data from MySQL to HDFS.