Migration from one HDFS cluster to another



Please share the steps we could use to migrate from one HDFS cluster to another (say, from Cloudera to another platform), and how long such a migration would take given the data volume and details below.

A few details about the big data production cluster:

50 TB of data
20-node cluster (commodity hardware)
Replication factor of 3


Hi @A_Arora,

You can use distcp to copy the files from one cluster to the other.
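
As a rough illustration, here is a minimal Python sketch that assembles a distcp command line. The NameNode addresses, paths, and map count are hypothetical placeholders, not values from this thread. The flags used (-update, -m, -bandwidth) are standard distcp options, but please check them against the Hadoop version on both clusters before relying on this.

```python
from typing import List, Optional


def build_distcp_command(
    src: str,
    dst: str,
    num_maps: int = 20,
    bandwidth_mb: Optional[int] = None,
    update: bool = True,
) -> List[str]:
    """Build the argument list for a `hadoop distcp` invocation.

    src and dst are full URIs. The defaults are illustrative
    assumptions, not tuned recommendations.
    """
    cmd = ["hadoop", "distcp"]
    if update:
        # -update skips files that already match at the destination,
        # which makes the job safe to re-run after a failure.
        cmd.append("-update")
    # -m caps the number of simultaneous map tasks doing the copy.
    cmd += ["-m", str(num_maps)]
    if bandwidth_mb is not None:
        # -bandwidth limits each map task to this many MB per second.
        cmd += ["-bandwidth", str(bandwidth_mb)]
    cmd += [src, dst]
    return cmd


# Hypothetical NameNode hosts and paths, for illustration only.
example = build_distcp_command(
    "hdfs://source-nn:8020/data",
    "hdfs://target-nn:8020/data",
    num_maps=40,
    bandwidth_mb=50,
)
print(" ".join(example))
```

The resulting list can be passed to subprocess.run on an edge node that can reach both clusters. Copying one top-level directory at a time keeps a 50 TB migration easier to monitor and retry.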

Hope this helps.



Hey @abhinavsingh,

Is it possible to use the ‘distcp’ command for 50 TB of data? I don’t think that kind of migration is feasible in production.


Hi @raviteja,

As far as I understand distcp, it should work. If you know of any other way to transfer the data, please share it :slight_smile:
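
On the question of how long it would take, here is a back-of-the-envelope sketch in Python. It assumes the 50 TB is the logical data size (before 3x replication), that the network link between the clusters is the bottleneck, and that only part of the nominal link speed is usable. The link speed and efficiency figures are assumptions for illustration, not measurements from any real cluster.

```python
def estimate_transfer_hours(
    data_tb: float, link_gbps: float, efficiency: float = 0.7
) -> float:
    """Estimate hours needed to move data_tb terabytes over a link.

    data_tb    : logical data size in decimal terabytes (1 TB = 1e12 bytes)
    link_gbps  : nominal link speed between the clusters, in gigabits per second
    efficiency : assumed usable fraction of the nominal link speed
    """
    bits_to_move = data_tb * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_move / usable_bits_per_second / 3600


# Assumed link speeds; substitute the real inter-cluster bandwidth.
for gbps in (1, 10):
    hours = estimate_transfer_hours(50, gbps)
    print(f"{gbps} Gbit/s link: about {hours:.1f} hours")
```

Under these assumptions, 50 TB takes roughly 16 hours over a 10 Gbit/s link and closer to a week over 1 Gbit/s. Real runs also depend on disk throughput, the number of small files, and how much of the production workload has to keep running during the copy.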

Hope this helps.



Hi @A_Arora & @abhinavsingh,

DistCp uses MapReduce for inter- and intra-cluster data copy, and it performs a file-to-file copy.
Please see the references below:

Note: Please don’t use the HDFS shell commands cp, copyFromLocal, put, or get for this, as they cause I/O bottlenecks for large files.

The MapR and Google Cloud docs recommend DistCp for the copy.

You can also use NiFi.