Hello,
What is a distributed cluster in Spark? Is it related to any hardware configuration or nodes? I'm new to Big Data technology.
Thanks
Devendra Shukla
Hi Devendra,
Big Data computation is all about distributed computing, which means utilizing multiple computers to solve a problem.
Apache Spark helps you solve big data problems using multiple computers. It lets you define constructs that distribute data across multiple machines, such as an RDD or a DataFrame.
You then operate on an RDD or DataFrame using the libraries provided, or by writing custom functions.
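As a rough local illustration of that pattern (this is plain Python, not actual Spark; a real job would create a `pyspark.SparkContext` and the data would be partitioned across worker nodes), the RDD-style map/filter/reduce flow looks like this:

```python
from functools import reduce

# A local list standing in for an RDD; in Spark this data would be
# partitioned across the cluster's worker nodes.
numbers = list(range(1, 11))

# map: transform every element (Spark: rdd.map(lambda x: x * x))
squares = [x * x for x in numbers]

# filter: keep only some elements (Spark: rdd.filter(lambda x: x % 2 == 0))
even_squares = [s for s in squares if s % 2 == 0]

# reduce: combine all elements into one result
# (Spark: rdd.reduce(lambda a, b: a + b))
total = reduce(lambda a, b: a + b, even_squares)

print(total)  # 4 + 16 + 36 + 64 + 100 = 220
```

In Spark the same three operations run in parallel on whichever machines hold each partition of the data, and only the final reduced result comes back to the driver.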
I hope I have been able to answer your question.
Hi Sandeep,
Thank you very much for your reply; this answer really solves my problem.
One last thing: what hardware configuration, node count, or cluster size do we typically use on the servers that solve big data problems? I know it depends on the project, but what is common in general? I'm interviewing for big data roles, and interviewers often ask me this question.
Thanks
Devendra Shukla
At CloudxLab, we use around 7 or 8 machines.
At a previous job, I used close to 800 machines.
So, it all depends on the project. Just never say you used a single machine in production.