Hi Abhilash,
I would like to understand the use case in a little more detail in order to give a more meaningful answer.
Nevertheless, I will try to answer based on my current understanding. Let me restate the problem as I see it: data from various sources lands in HDFS, and you want some or all of it moved to MongoDB.
My first question to you is: do you want to copy the entire data set, or only a part of it?
If you are copying the entire data set, then this MongoDB sink should suffice. Follow these steps, as listed on its home page:
- Clone the repository
- Install the latest Maven and build the source with ‘mvn package’
- Generate the classpath with ‘mvn dependency:build-classpath’
- Append the generated classpath in $FLUME_HOME/conf/flume-env.sh
- Add the sink definition according to the Configuration section (a sketch follows this list)
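For the last step, the agent definition will look roughly like the sketch below. This is a minimal hypothetical example: the sink class name (org.riderzen.flume.sink.MongoSink) and the property keys (host, port, db, collection, batch) follow the conventions of the commonly used flume-ng-mongodb-sink, so verify them against the Configuration section of the repository you actually cloned.

```
# Hypothetical Flume agent config; the sink type and property keys are
# assumptions -- verify them against the sink's Configuration docs.
agent.channels = memCh
agent.sinks = mongo

# Source definition omitted; wire up whatever source feeds your channel.
agent.channels.memCh.type = memory

agent.sinks.mongo.type = org.riderzen.flume.sink.MongoSink
agent.sinks.mongo.host = localhost
agent.sinks.mongo.port = 27017
agent.sinks.mongo.db = mydb
agent.sinks.mongo.collection = events
agent.sinks.mongo.batch = 100
agent.sinks.mongo.channel = memCh
```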
If you wish to send only part of the data, you will want to process it first and then transfer it. This can be done in either of the following ways:
- First process the data using Pig/Hive/Spark, write the result back to HDFS, and then use Flume to transfer it.
- Write a MapReduce job or Spark application that processes the data and writes it to MongoDB right away (see the PySpark sketch after this list).
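To make the second approach concrete, here is a minimal PySpark sketch under a few assumptions: the input on HDFS is JSON, the HDFS path, MongoDB URI, and filter condition are placeholders, and the MongoDB Spark Connector (which registers the "mongo" data source) is on the Spark classpath. A MapReduce job with a MongoDB output format would follow the same shape.

```python
from pyspark.sql import SparkSession

# Placeholder URI: database "mydb", collection "events".
spark = (SparkSession.builder
         .appName("hdfs-to-mongo")
         .config("spark.mongodb.output.uri",
                 "mongodb://localhost:27017/mydb.events")
         .getOrCreate())

# Read the raw data landed on HDFS (path and format are placeholders).
df = spark.read.json("hdfs:///data/incoming/")

# Keep only the part that should go to MongoDB (example condition).
subset = df.filter(df["event_type"] == "purchase")

# Write directly via the MongoDB Spark Connector, skipping the
# intermediate write back to HDFS.
subset.write.format("mongo").mode("append").save()

spark.stop()
```

You would submit this with spark-submit, adding the connector to the classpath, e.g. via --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1.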
The second approach requires more coding but would be faster in execution, since it skips the intermediate write back to HDFS.
Hope it helps.