Hi Abhilash,
I would like to understand the use case in a little more detail in order to give a more meaningful answer.
Nevertheless, I will try to answer based on my current understanding. Let me restate the problem as I see it: data from various sources lands in HDFS, and you want some or all of it moved to MongoDB.
My first question to you is: do you want to copy the entire data set, or only a part of it?
If you are copying the entire data set, then this MongoDB sink should suffice. Follow these steps, as listed on its home page:
- Clone the repository
- Install the latest Maven and build the source with ‘mvn package’
- Generate the classpath with ‘mvn dependency:build-classpath’
- Append the generated classpath in $FLUME_HOME/conf/flume-env.sh
- Add the sink definition according to the Configuration section (a sketch follows this list)
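For the last step, the agent definition will look roughly like the sketch below. This is a minimal hypothetical example: the sink class name (org.riderzen.flume.sink.MongoSink) and the property keys (host, port, db, collection, batch) follow the conventions of the commonly used flume-ng-mongodb-sink, so verify them against the Configuration section of the repository you actually cloned.

```
# Hypothetical Flume agent config; the sink type and property keys are
# assumptions -- verify them against the sink's Configuration docs.
agent.channels = memCh
agent.sinks = mongo

# Source definition omitted; wire up whatever source feeds your channel.
agent.channels.memCh.type = memory

agent.sinks.mongo.type = org.riderzen.flume.sink.MongoSink
agent.sinks.mongo.host = localhost
agent.sinks.mongo.port = 27017
agent.sinks.mongo.db = mydb
agent.sinks.mongo.collection = events
agent.sinks.mongo.batch = 100
agent.sinks.mongo.channel = memCh
```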
If you wish to send only part of the data, you will want to process it first and then transfer it. This can be done in either of the following ways:
- First process the data using Pig/Hive/Spark, write the result back to HDFS, and then use Flume to transfer it.
- Write a MapReduce job or Spark application that processes the data and writes it to MongoDB right away (see the PySpark sketch after this list).
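To make the second approach concrete, here is a minimal PySpark sketch under a few assumptions: the input on HDFS is JSON, the HDFS path, MongoDB URI, and filter condition are placeholders, and the MongoDB Spark Connector (which registers the "mongo" data source) is on the Spark classpath. A MapReduce job with a MongoDB output format would follow the same shape.

```python
from pyspark.sql import SparkSession

# Placeholder URI: database "mydb", collection "events".
spark = (SparkSession.builder
         .appName("hdfs-to-mongo")
         .config("spark.mongodb.output.uri",
                 "mongodb://localhost:27017/mydb.events")
         .getOrCreate())

# Read the raw data landed on HDFS (path and format are placeholders).
df = spark.read.json("hdfs:///data/incoming/")

# Keep only the part that should go to MongoDB (example condition).
subset = df.filter(df["event_type"] == "purchase")

# Write directly via the MongoDB Spark Connector, skipping the
# intermediate write back to HDFS.
subset.write.format("mongo").mode("append").save()

spark.stop()
```

You would submit this with spark-submit, adding the connector to the classpath, e.g. via --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1.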
The second approach requires more coding but would be faster in execution, since it skips the intermediate write back to HDFS.
Hope it helps.