I have a question about writing files to HDFS.
My understanding is:
- The client or driver (e.g. `hdfs dfs`) may itself run on one of the data nodes.
- The NameNode gives the client the required metadata, i.e. which DataNodes should receive each block and its replicas.
- The client splits the file into blocks on the node where the source file resides.
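To make my mental model concrete: as I understand it, the "split" is just cutting the byte stream at block-size boundaries, not any CPU-heavy processing. A rough sketch, assuming the common defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`):

```python
import math

# Assumed defaults; actual values depend on cluster configuration.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB (dfs.blocksize)
REPLICATION = 3                  # dfs.replication

def block_count(file_size_bytes: int) -> int:
    """Number of HDFS blocks for a file (the last block may be partial)."""
    return math.ceil(file_size_bytes / BLOCK_SIZE)

file_size = 100 * 1024 ** 3      # a 100 GB file
blocks = block_count(file_size)
replicas = blocks * REPLICATION

print(blocks)    # 800 blocks
print(replicas)  # 2400 block replicas stored across the cluster
```

So a 100 GB file becomes 800 blocks, and with 3x replication the cluster ends up storing 2400 block replicas in total.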
My questions are:

1. If the data file is very large, does the DataNode need enough resources (CPU and RAM) to split the file and push the data from its local file system to HDFS on the other DataNodes?
2. How do we deal with a very large file that is located on some other server or a SharePoint site?
3. In a typical scenario, how long does it take to split a file (e.g. a 100 GB file) and place its blocks on the DataNodes? And what happens if the DataNode does not have enough resources to process the file? (These are generally commodity machines.)
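For the timing question, my own back-of-envelope estimate, assuming the write is network-bound on a ~1 Gbit/s client NIC and that replication happens in a pipeline (DN1 -> DN2 -> DN3, so the client sends each byte only once), would be:

```python
# Rough ingest-time estimate under assumed conditions:
# network-bound write, 1 Gbit/s client link, pipelined replication.
GBIT_PER_SEC = 1.0                        # assumed client NIC speed
BYTES_PER_SEC = GBIT_PER_SEC * 1e9 / 8    # = 125 MB/s

def transfer_seconds(file_size_bytes: float) -> float:
    """Seconds to stream the file into the cluster at the assumed rate."""
    return file_size_bytes / BYTES_PER_SEC

size = 100 * 10**9                        # 100 GB
minutes = transfer_seconds(size) / 60
print(round(minutes, 1))                  # 13.3
```

That is roughly 800 seconds (about 13 minutes) in the best case; disk speed, replication traffic, and other cluster load would only make it slower. Is that a reasonable way to think about it?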