Architecture Setup

Hey Guys!

I want to set up my own Spark architecture using 4 VMs. On one of the VMs I want to install Spark, on another Kafka, on a third Cassandra, and on the last HDFS. How do I set this up? I don’t want to use the Hortonworks/Cloudera sandboxes. I was also wondering how the big companies set up their big data analytics platforms. Any links/suggestions are highly appreciated!

Hi @mzeesoofally,

Yes, you can set this up. Will it be on your own machine or in a cloud environment?

Spark works better in distributed mode.

You can install Cassandra and Kafka on separate VMs.

Since you are not interested in sandboxes, I would suggest going with a plain vanilla setup, i.e. download the required binaries (with the correct dependencies) from the official repositories onto the VMs and proceed with the installation.
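As a rough sketch of the plain vanilla route, the script below builds the official Apache archive download URLs for each component, one per VM. The version numbers are assumptions; check the official download pages for current releases before using them:

```shell
#!/usr/bin/env bash
# Hedged sketch: version numbers are assumptions -- verify against the
# official Apache download pages before running.
SPARK_VERSION="3.5.1"
KAFKA_VERSION="3.7.0"
SCALA_VERSION="2.13"
CASSANDRA_VERSION="4.1.4"
HADOOP_VERSION="3.3.6"

# archive.apache.org keeps all past releases, so these URL patterns
# stay stable even after a version leaves the main download mirrors.
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz"
KAFKA_URL="https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"
CASSANDRA_URL="https://archive.apache.org/dist/cassandra/${CASSANDRA_VERSION}/apache-cassandra-${CASSANDRA_VERSION}-bin.tar.gz"
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"

# On each VM, fetch and unpack only its own component, e.g. on the Spark VM:
#   wget "$SPARK_URL" && tar -xzf "$(basename "$SPARK_URL")" -C /opt
for url in "$SPARK_URL" "$KAFKA_URL" "$CASSANDRA_URL" "$HADOOP_URL"; do
  echo "$url"
done
```

One component per VM keeps the services isolated, so a misconfigured Kafka broker, for example, cannot take the Spark master down with it.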

From startups to big corporations, companies would rarely do a plain vanilla setup themselves, as it wastes a lot of time and valuable resources. They prefer distributions like Cloudera/Hortonworks/MapR. More recently, all the major cloud platforms have offered built-in big data components as a service, for fast, agile development and to reduce the DevOps effort.

You could find a lot of case studies about big data infrastructure on the respective distributions’ websites.

@raviteja I want to set it up on my own computer. Do you have any documentation for doing this?

@mzeesoofally,

Please refer below video:
https://www.youtube.com/watch?v=ql95CrGob48

P.S.: Since you are asking about an Ubuntu installation, I’m sharing an external video reference.
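For a quick overview alongside the video, the script below prints a checklist of the usual steps for a standalone Spark install on Ubuntu. This is only a hedged sketch, not the video’s exact steps; the Spark version and the /opt install path are assumptions:

```shell
#!/usr/bin/env bash
# Hedged sketch of a standalone Spark setup on Ubuntu. The version
# number and /opt path are assumptions -- adapt them to your download.
set -euo pipefail

SPARK_VERSION="3.5.1"
SPARK_DIST="spark-${SPARK_VERSION}-bin-hadoop3"
SPARK_HOME="/opt/${SPARK_DIST}"

# Printed as a checklist so the script is safe to dry-run without
# sudo or network access; run the steps manually on the VM.
cat <<EOF
1. sudo apt-get install -y openjdk-11-jdk        # Spark needs a JDK
2. wget https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_DIST}.tgz
3. sudo tar -xzf ${SPARK_DIST}.tgz -C /opt
4. export SPARK_HOME=${SPARK_HOME}
5. export PATH=\$SPARK_HOME/bin:\$SPARK_HOME/sbin:\$PATH
6. start-master.sh                               # master web UI on port 8080
7. start-worker.sh spark://\$(hostname):7077     # attach a worker to the master
EOF
```

Once the master and a worker are up, you can confirm the worker registered by opening the master’s web UI on port 8080.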