HDFS Question - Namenode vs Datanode


[Copied from internal discussion]

Can you please clarify the answer to the question below?

I read that an HDFS cluster consists of many machines; one of these machines is designated as the namenode and the other machines act as datanodes.

The namenode stores metadata and the datanodes store the actual data. So I want to know whether they are services (programs) or actual machines.

Question: What best describes a namenode or datanode?

Answer: A service that runs in the background and listens on a network port
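
Since the answer describes these as background services listening on network ports, one rough way to see that for yourself is to probe the ports from any machine that can reach the cluster. The snippet below is a minimal sketch, not an official Hadoop tool. The host name and port numbers are assumptions: 8020 is a commonly used namenode RPC port and 9866 is the default datanode data-transfer port in Hadoop 3.x, but your cluster's configuration may differ. The names `SERVICES`, `is_listening`, and `report` are made up for this illustration.

```python
import socket

# Assumed host and ports; adjust to match your cluster's configuration.
# 8020: commonly used namenode RPC port.
# 9866: default datanode data-transfer port in Hadoop 3.x.
SERVICES = {
    "namenode": ("localhost", 8020),
    "datanode": ("localhost", 9866),
}


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def report(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {
        name: is_listening(host, port)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name}: {'listening' if up else 'not reachable'}")
```

If both services are installed on one machine, both entries can point at the same host. An open port only shows that some process is listening there; it does not prove the process is a namenode or datanode.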



[Further question to above]
Also, just to clarify: when a file is created, even with no content, are both the namenode and the datanodes impacted, whereas when a folder is created there are no files and therefore no data, so only the namenode is impacted?

When we create a file in HDFS, which will be impacted?

Both namenode and datanode

When we create a folder, which will be impacted?

Namenode only


Yes, that’s right.

The namenode and datanodes are services. You can install both of them on the same machine or on different machines.

Yes, that’s right.


Good question. I think that even for an empty file, an empty block file would be created on a datanode, but I need to dig into this further to confirm.
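
As a starting point for that digging, here is a toy model of the metadata/data split discussed above. It is a simplified sketch and not HDFS code: `ToyCluster`, `ToyNameNode`, and `ToyDataNode` are hypothetical names, there is no replication, and the block size is tiny on purpose. The model assumes that a block is allocated only once there are bytes to write, so in it an empty file touches only the namenode. That assumption is worth checking against a real cluster, for example by creating an empty file and inspecting it with `hdfs fsck <path> -files -blocks`.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; tiny for illustration (HDFS defaults to 128 MB)


@dataclass
class ToyDataNode:
    # block id -> raw bytes of that block
    blocks: dict = field(default_factory=dict)


@dataclass
class ToyNameNode:
    # path -> list of block ids for a file, or None for a directory
    namespace: dict = field(default_factory=dict)
    next_block_id: int = 0


class ToyCluster:
    def __init__(self, num_datanodes=2):
        self.namenode = ToyNameNode()
        self.datanodes = [ToyDataNode() for _ in range(num_datanodes)]

    def mkdir(self, path):
        # A directory is pure metadata: only the namenode changes.
        self.namenode.namespace[path] = None

    def create(self, path, data=b""):
        # Split the data into blocks; each block goes to one datanode.
        # With empty data the loop never runs, so no block is stored.
        block_ids = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = self.namenode.next_block_id
            self.namenode.next_block_id += 1
            datanode = self.datanodes[block_id % len(self.datanodes)]
            datanode.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            block_ids.append(block_id)
        # The namenode records which blocks make up the file.
        self.namenode.namespace[path] = block_ids

    def stored_blocks(self):
        """Total number of blocks held across all datanodes."""
        return sum(len(dn.blocks) for dn in self.datanodes)


if __name__ == "__main__":
    cluster = ToyCluster()
    cluster.mkdir("/logs")
    cluster.create("/logs/empty.txt")
    cluster.create("/logs/app.txt", b"hello hdfs")
    print(cluster.namenode.namespace)
    print(cluster.stored_blocks())
```

In this model, creating a folder or an empty file adds an entry to the namenode's namespace and leaves the datanodes untouched, while a file with content changes both.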