Hello,
What is a distributed cluster in Spark? Is it related to any hardware configuration or nodes? I'm new to Big Data technology.
Thanks
Devendra Shukla
Hi Devendra,
Big Data computation is all about distributed computing, which means utilizing multiple computers to solve a problem.
Apache Spark helps you solve big data problems using multiple computers. It lets you define constructs that distribute data across multiple machines, such as an RDD or a DataFrame.
You then operate on an RDD or DataFrame using the libraries provided, or by writing custom functions.
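As a rough local illustration of that pattern (this is plain Python, not actual Spark; a real job would create a `pyspark.SparkContext` and the data would be partitioned across worker nodes), the RDD-style map/filter/reduce flow looks like this:

```python
from functools import reduce

# A local list standing in for an RDD; in Spark this data would be
# partitioned across the cluster's worker nodes.
numbers = list(range(1, 11))

# map: transform every element (Spark: rdd.map(lambda x: x * x))
squares = [x * x for x in numbers]

# filter: keep only some elements (Spark: rdd.filter(lambda x: x % 2 == 0))
even_squares = [s for s in squares if s % 2 == 0]

# reduce: combine all elements into one result
# (Spark: rdd.reduce(lambda a, b: a + b))
total = reduce(lambda a, b: a + b, even_squares)

print(total)  # 4 + 16 + 36 + 64 + 100 = 220
```

In Spark the same three operations run in parallel on whichever machines hold each partition of the data, and only the final reduced result comes back to the driver.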
I hope I have been able to answer your question.
Hi Sandeep,
Thank you very much for your reply; this answer really solves my problem.
One last thing: what hardware configuration, node count, or cluster size do we typically use on the servers that solve big data problems? I know it depends on the project, but what is common in general? I'm interviewing for big data roles, and interviewers often ask me this question.
Thanks
Devendra Shukla
At CloudxLab, we use around 7 or 8 machines.
At a previous job, I used close to 800 machines.
So, it all depends on the project. Just never say you used a single machine in production.