Data Security in Big Data

How is data security handled in, or related to, Big Data?

There are many ways to look at this question. Let me take a step back and explain in detail.

What is data security?

Only the authorised users should be able to view and modify the data.

This can be ensured in the following ways:

1. Roles & Permissions

Hadoop provides basic roles and permissions on top of the local Linux system's users. By default, HDFS relies on the authentication of the underlying Linux system, and users can change the permissions of files in HDFS as they wish using hadoop fs -chmod.
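As a quick sketch (the paths, file names, and permission bits here are illustrative, and a running cluster with the hadoop client on the PATH is assumed), HDFS permissions are inspected and changed with POSIX-style commands:

```shell
# List the current owner, group, and permissions of a directory
hadoop fs -ls /user/alice

# Restrict a file so only the owner can write and group members can read
hadoop fs -chmod 640 /user/alice/reports/sales.csv

# Hand ownership to another user and group (requires superuser privileges)
hadoop fs -chown bob:analysts /user/alice/reports/sales.csv
```

These commands only run against a live cluster, so they are shown as a command fragment rather than a runnable script.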

Users can also adopt more sophisticated tools like Apache Ranger for centralized, fine-grained authorization and auditing.

2. Encryption

One can encrypt files either before uploading them to HDFS, or encrypt them in place using streaming or MapReduce jobs.
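A minimal sketch of the first approach, encrypting a file on the client with OpenSSL before it is uploaded (the file names and passphrase are illustrative; in practice the key would come from a key management service, not the command line):

```shell
# Create a sample file to protect (illustrative content)
printf 'card=4111-1111' > payroll.csv

# Encrypt it locally with AES-256 before it ever reaches the cluster
openssl enc -aes-256-cbc -pbkdf2 -salt \
    -pass pass:example-passphrase \
    -in payroll.csv -out payroll.csv.enc

# The ciphertext, not the plaintext, is what would be uploaded:
#   hadoop fs -put payroll.csv.enc /secure/payroll.csv.enc

# Decrypt after download to verify the round trip
openssl enc -d -aes-256-cbc -pbkdf2 \
    -pass pass:example-passphrase \
    -in payroll.csv.enc -out roundtrip.csv
```

With this scheme the cluster only ever stores ciphertext, so even a user who bypasses HDFS permissions cannot read the data without the key.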

3. System Security

Since multiple network services run as part of Hadoop (NameNode, DataNodes, ResourceManager, NodeManagers, HBase masters, and of course ZooKeeper), the attack surface is large. We cannot turn off these services because they are the building blocks of Hadoop, and they need to communicate with each other.

So, in order to minimize the attack surface, you would create two networks: an internal LAN and an external WAN. The Hadoop services should run and interact with each other over the LAN, while users log in to these services over the WAN.
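As an illustrative sketch of that split (the interface names, subnet, and ports are assumptions, not a prescription; 8020 is a common NameNode RPC port and 9866 a common DataNode transfer port in Hadoop 3), host firewall rules can restrict cluster traffic to the internal interface:

```shell
# Assume eth0 faces the internal cluster LAN (10.0.0.0/24)
# and eth1 faces the WAN that users connect from.

# Allow NameNode RPC and DataNode transfer traffic from the LAN only
iptables -A INPUT -i eth0 -s 10.0.0.0/24 -p tcp --dport 8020 -j ACCEPT
iptables -A INPUT -i eth0 -s 10.0.0.0/24 -p tcp --dport 9866 -j ACCEPT

# Permit user logins (SSH) from the WAN side
iptables -A INPUT -i eth1 -p tcp --dport 22 -j ACCEPT

# Drop WAN traffic aimed directly at the cluster ports
iptables -A INPUT -i eth1 -p tcp --dport 8020 -j DROP
iptables -A INPUT -i eth1 -p tcp --dport 9866 -j DROP
```

This is a firewall configuration fragment (it needs root and real interfaces), so it is shown for illustration rather than as a runnable script.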

I hope this helps you understand data security in Big Data a bit better.


Thanks for the comprehensive answer.
