How should I start learning Big Data?

Hi CloudxLab team,

I have around 10 years of experience developing mobile and web applications in Python, Rails, Java, and Android. I’m looking to switch to the Big Data domain.

How should I start learning Big Data?

Hi,

This is one of my favorite questions :slight_smile:

I wrote an answer to the same question on Quora some time back.

You can learn Hadoop and Big Data by following these steps:

Step 0 - Learn the basics first

Before you start learning Hadoop, it helps to have experience with a programming language. You should also know your way around basic *nix commands and have a basic knowledge of SQL.
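To give you an idea of what I mean by basic *nix commands, here is a small illustrative sample (the file and directory names are just examples):

```shell
# Everyday *nix commands worth knowing before starting Hadoop.
pwd                       # print the current working directory
ls -l /tmp                # list files with permissions and sizes
mkdir -p ~/data/logs      # create nested directories in one go
echo "hello" > note.txt   # write text to a file
cat note.txt              # print a file's contents
grep -c hello note.txt    # count lines matching a pattern
wc -l note.txt            # count lines in a file
rm note.txt               # remove the file
```

If these feel unfamiliar, that is exactly what the Linux course below covers.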

We have a popular Linux basics course on CloudxLab, Linux Basics for Big Data, which you can enroll in for free here. It comprises theory, questions, and hands-on exercises so that you learn by doing.

I recommend learning Python if you are not hands-on with any programming language. You can also get your hands dirty with Java. Though Java is not mandatory for learning Hadoop, knowledge of Java will give you an edge in certain scenarios, like configuring advanced options in MapReduce. MapReduce is a programming paradigm in which you write your logic in the form of mapper and reducer functions.
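To give a flavor of the mapper/reducer model, here is a minimal word-count sketch in plain Python. This is only an illustration of the paradigm, not Hadoop itself; on a real cluster, Hadoop runs many mappers and reducers in parallel and handles the grouping (shuffle) step between them for you:

```python
# Word count expressed as a mapper and a reducer, simulated in one process.
from collections import defaultdict

def mapper(line):
    """Emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    """Sum all the counts emitted for a single word."""
    return (word, sum(counts))

lines = ["big data is big", "hadoop processes big data"]

# Map phase: run the mapper over every input line.
pairs = [pair for line in lines for pair in mapper(line)]

# Shuffle phase: group values by key (Hadoop does this between map and reduce).
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)

# Reduce phase: one reducer call per distinct key.
result = dict(reducer(w, c) for w, c in grouped.items())
print(result)  # {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'processes': 1}
```

Once you are comfortable with this way of thinking, moving the same mapper and reducer onto a cluster (for example via Hadoop Streaming) is mostly a matter of plumbing.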

Step 1 - Go through free online resources

Those who have a lot of self-learning drive can pick up Hadoop on their own by using the excellent online resources listed below. For others who prefer instructor-led courses, there are good courses online as well.

Free resources for self-learners

These online resources will help you get familiar with the Hadoop architecture, its components, and other related Big Data technologies.

Step 2 - Practice and practice more on Hadoop cluster

The biggest challenge faced by any Hadoop learner is practice. The more you practice, the better hands-on you get with Hadoop and related Big Data technologies. Generally, to practice Hadoop, learners download and set up a virtual machine provided by major Hadoop vendors like Hortonworks and Cloudera. But practicing Hadoop on a virtual machine will not give you real-world experience, and downloading, installing, and configuring a virtual machine is a painful process. A few of the downsides of virtual machines are:

  • The VMs are huge in size; for example, Hortonworks’ VM is 9 GB.
  • You might have to upgrade your RAM to 8 GB.
  • Some BIOS don’t allow virtualization. You might have to change BIOS settings.
  • Some machines such as office desktops/laptops may not allow installations.

I faced the same practice problem when I was learning Hadoop. To solve these issues with virtual machines and to give Hadoop learners real-world experience, we set up an online cluster named CloudxLab. CloudxLab is a cloud-based cluster for practicing AI, Machine Learning, Deep Learning, and Big Data technologies.

Practice and write code while going through the free resources above. Practicing will help you learn Hadoop faster and more thoroughly.

Hope this helps. Happy learning!


Thanks, Abhinav, for the detailed reply. It is really helpful for a Big Data newbie like me.

I know Python and Linux. Let me enrol in your free Linux course first and then I’ll move forward as per your suggestions.

A quick question: how much time will it take to learn Hadoop and Spark? I can see that you have a self-paced course on Hadoop and Spark. Can you please let me know the details?


Hi,

That sounds like a plan :+1:

You can find details of the self-paced course on Hadoop and Spark in this similar question.

It should take you 2-3 months to complete the course. The course will give you a head start in working on Big Data projects. The more you work on real-life, live projects, the better you will become in these technologies.
