Migration from one HDFS cluster to another

Hi,

Please share some information about the steps we could follow to migrate from one HDFS cluster to another (say, from Cloudera to another platform), and also how long such a migration would take, given the data volumes and details below.

A few details about our big data production cluster:

50 TB of data
20-node cluster (commodity hardware)
replication factor of 3
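For a first-order answer to the "how long" question, you can divide the data volume by the sustained throughput between the two clusters. The sketch below is a rough estimate only. It assumes the inter-cluster network link is the bottleneck, and the `efficiency` factor (0.7 by default) is a guess rather than a measured value. The replication factor does not multiply cross-cluster traffic: the copy reads one replica of each block, and the destination HDFS re-replicates internally. If the 50 TB figure is raw disk usage that already includes the 3x replication, the logical data to move is closer to 50 / 3 ≈ 16.7 TB.

```python
def estimate_copy_hours(logical_tb: float, throughput_gbps: float,
                        efficiency: float = 0.7) -> float:
    """Rough wall-clock hours to copy `logical_tb` terabytes (decimal TB)
    over an aggregate inter-cluster link of `throughput_gbps` Gbit/s.

    `efficiency` is an ASSUMED fraction of the raw link rate the copy
    actually sustains (protocol overhead, competing traffic, small files).
    """
    bits = logical_tb * 1e12 * 8                        # TB -> bits
    effective_bits_per_s = throughput_gbps * 1e9 * efficiency
    return bits / effective_bits_per_s / 3600           # seconds -> hours


if __name__ == "__main__":
    for gbps in (1, 10):
        print(f"50 TB over {gbps} Gbit/s: ~{estimate_copy_hours(50, gbps):.1f} h")
```

On these assumptions, 50 TB takes about 15.9 hours over a 10 Gbit/s link and about 159 hours over 1 Gbit/s. A trial copy of a representative directory gives a far better throughput figure than any assumed efficiency.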

Hi @A_Arora,

You can use distcp to copy the files from one cluster to the other.
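As an illustration, here is a minimal sketch that assembles a `hadoop distcp` command line in Python. It builds the argument list but does not run it; you would pass the list to `subprocess.run(cmd, check=True)` on a node that has the Hadoop client configured. The NameNode addresses (`src-nn`, `dst-nn`) and paths are hypothetical placeholders, and the chosen values for `-m` and `-bandwidth` are assumptions you would tune for your own cluster.

```python
import shlex
from typing import List, Optional


def build_distcp_command(src: str, dst: str, maps: int = 20,
                         bandwidth_mb: Optional[int] = None,
                         update: bool = True) -> List[str]:
    """Build (but do not execute) an argv list for `hadoop distcp`."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update: only copy files that are missing at the destination or
        # differ from it, so an interrupted run can simply be re-run.
        cmd.append("-update")
    # -m: maximum number of simultaneous map tasks performing the copy.
    cmd += ["-m", str(maps)]
    if bandwidth_mb is not None:
        # -bandwidth: per-map bandwidth cap in MB/s, to limit the impact
        # on production traffic.
        cmd += ["-bandwidth", str(bandwidth_mb)]
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # Hypothetical NameNode hosts and paths, for illustration only.
    cmd = build_distcp_command("hdfs://src-nn:8020/data",
                               "hdfs://dst-nn:8020/data",
                               maps=40, bandwidth_mb=100)
    print(shlex.join(cmd))
```

Copying in several runs (one per top-level directory, say) and re-running with `-update` makes a large migration easier to schedule and resume. If the two clusters run different Hadoop major versions, the DistCp documentation recommends reading the source through a `webhdfs://` URI.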

Hope this helps.

Thanks

Hey @abhinavsingh,

Is it possible to use the ‘distcp’ command for 50 TB of data? I don't think that kind of migration is feasible in production.

Hi @raviteja,

As far as I understand distcp, it should work. If you know another way to transfer the data, please share it :)

Hope this helps.

Thanks


Hi @A_Arora & @abhinavsingh,

DistCp uses MapReduce for inter- and intra-cluster data copies: it performs a file-to-file copy, distributing the files among map tasks.
Please see the references below:
http://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html
https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_admin_distcp_data_cluster_migrate.html
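To show why the map-based, file-to-file design scales, here is a simplified model of spreading a file listing across map tasks so each map copies roughly the same number of bytes. It uses a greedy largest-first assignment. This is an illustration of the general idea only; DistCp's own input-splitting strategies differ in detail. The file names and sizes are made up. One consequence of file-level granularity is that a single very large file is still copied by one map.

```python
import heapq
from typing import Dict, List, Tuple


def assign_files_to_maps(files: Dict[str, int], num_maps: int) -> List[List[str]]:
    """Greedy largest-first assignment: each file goes to the map task
    that has the fewest bytes assigned so far."""
    # Min-heap of (bytes assigned so far, map index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_maps)]
    buckets: List[List[str]] = [[] for _ in range(num_maps)]
    for path, size in sorted(files.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(path)
        heapq.heappush(heap, (load + size, idx))
    return buckets


if __name__ == "__main__":
    # Hypothetical file sizes in GB.
    listing = {"a": 8, "b": 5, "c": 4, "d": 3, "e": 2}
    for i, bucket in enumerate(assign_files_to_maps(listing, 2)):
        print(i, bucket, sum(listing[p] for p in bucket))
```

With two maps, the example splits the 22 GB evenly: `["a", "d"]` and `["b", "c", "e"]`, 11 GB each.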

Note: please don't use the HDFS shell commands cp, copyFromLocal, put, or get for this, because they move all data through a single client process, which becomes an I/O bottleneck for large volumes.

The MapR and Google Cloud docs also recommend copying with DistCp.

Alternative:
You can also use Apache NiFi.