Sqoop related question


How do you do parallel processing in Sqoop?


A Sqoop job that migrates data from a SQL database to HDFS runs on top of the MapReduce engine, and by default it splits the job into three partitions (if your SQL table was defined with a primary key).

If you want more than three partitions, just add the parameter below to your Sqoop command.

-m <number of mappers/partitions>
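As a rough sketch of where that parameter sits in a full import command, the snippet below assembles a `sqoop import` argument list and prints it without running anything. The helper name `build_sqoop_import`, the JDBC URL, the table name, the split column, and the target directory are all made-up placeholders; `--num-mappers` is the long form of `-m`.

```python
import shlex


def build_sqoop_import(connect, table, target_dir, num_mappers, split_by=None):
    """Assemble a `sqoop import` argument list (hypothetical helper, does not run Sqoop)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),  # same as -m <n>
    ]
    if split_by is not None:
        # Column used to divide the rows among mappers; typically needed
        # when the table has no primary key.
        args += ["--split-by", split_by]
    return args


# Placeholder connection details, assumed for illustration only.
cmd = build_sqoop_import(
    "jdbc:mysql://dbhost/sales", "orders", "/data/orders", 8, split_by="order_id"
)
print(shlex.join(cmd))
```

Keep in mind that more mappers also means more concurrent connections to the source database, so a larger value is not always faster.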

Hope this helps.

Thank you, Mahesh, for the good answer.

I would like to add that the default number of mappers is 4, as per the Sqoop documentation.

Yes, Sandeep, my mistake: Sqoop creates 4 mappers by default. Thanks for the correction :)
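To make the partitioning concrete, here is a simplified model of how a numeric split column might be divided among the mappers. It is only a sketch and not Sqoop's actual splitter code: the function name `split_ranges` is invented, and it assumes an integer key whose minimum and maximum are already known. Each resulting `(lo, hi)` pair stands for one mapper reading rows where `id >= lo AND id < hi`.

```python
def split_ranges(min_id, max_id, num_mappers=4):
    """Divide the inclusive key range min_id..max_id into half-open (lo, hi) slices.

    Simplified illustration only; the default of 4 mirrors Sqoop's default
    mapper count mentioned in this thread.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")

    total = max_id - min_id + 1  # number of key values to cover
    ranges = []
    for i in range(num_mappers):
        lo = min_id + total * i // num_mappers
        hi = min_id + total * (i + 1) // num_mappers
        if lo < hi:  # skip empty slices when there are fewer keys than mappers
            ranges.append((lo, hi))
    return ranges


# Keys 0..1000 spread over the default 4 mappers.
for lo, hi in split_ranges(0, 1000):
    print(f"WHERE id >= {lo} AND id < {hi}")
```

If the key values are unevenly distributed, evenly sized ranges like these can still give some mappers far more rows than others, which is one reason to choose the split column carefully.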