Spark Streaming


#1

How can we listen to streaming data from a live platform such as Facebook or Instagram using Spark Streaming, for more interactive hands-on practice?


#2

Hi, Anubhav.

To get and analyze data from a social website, you need to design two basic components:

  1. Scraper: reads data from the social media platform. Its basic functions are:
    1. Opening a socket, then listening for and accepting connection requests from the Analyzer.

    2. Reading the stream (for example tweets, WhatsApp messages, or Facebook posts) from the Twitter streaming API, WhatsApp API, or Facebook API.

    3. Writing the tweets to the socket in JSON format.

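The socket part looks like this:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('localhost', 9999))
s.listen(1)
socket.socket(socket.AF_INET, socket.SOCK_STREAM) creates and returns a TCP socket. bind() attaches it to port 9999 on localhost (the port used in Spark's own streaming examples; any free port works), and listen() makes it wait for the analyzer. Calling s.accept() then returns the connection that the tweets are written to.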
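A slightly fuller sketch of the scraper's socket side, assuming the records have already been read from the platform's API as Python dicts. The helpers serve_records and to_json_line are hypothetical names, not part of any library. Each record is written as one JSON object per line, because Spark's socketTextStream splits its input on newlines.

```python
import json
import socket

HOST, PORT = "localhost", 9999  # port 9999 as in the text; any free port works


def to_json_line(record):
    """Serialize one record (e.g. a tweet dict) as a newline-terminated JSON line.

    json.dumps escapes newlines inside strings, so every record
    occupies exactly one line on the wire.
    """
    return (json.dumps(record) + "\n").encode("utf-8")


def serve_records(records, host=HOST, port=PORT):
    """Accept one connection from the analyzer and write every record to it.

    Blocks in accept() until the analyzer connects.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Allow quick restarts without "address already in use" errors.
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _addr = server.accept()
        with conn:
            for record in records:
                conn.sendall(to_json_line(record))
```

Usage: call serve_records(records), where records is any iterable of dicts; in a real scraper that iterable would be the messages read from the streaming API. Start it before the analyzer, since the analyzer connects to it as a client.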

  Then we establish a connection to the streaming API, authenticating with OAuth:
    import requests_oauthlib
    auth = requests_oauthlib.OAuth1(consumer_key, consumer_secret, access_token, access_token_secret)
    You have to request API access (consumer keys and access tokens) from each website you want data from; some of these APIs are free and some are paid.

  2. Analyzer:
    Spark's streaming API reads the data from the socket, pre-processes the tweets/messages, clusters the tweets as they arrive in the stream, and pushes the results out for visualization.
    From there you can derive variables (features) from the data, analyze them through a SparkContext object, and apply whichever algorithms fit your objective.
    For more details, refer to the article below on Twitter analysis.
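A minimal sketch of the analyzer, assuming PySpark is installed and a scraper is serving one JSON object per line on localhost:9999. It uses the legacy DStream API (StreamingContext.socketTextStream) and, to stay short, counts hashtags per batch instead of clustering. The function names, the app name, and the "text" field are assumptions about the record format, not part of the original setup.

```python
import json
import re

HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(line):
    """Parse one JSON line and return its hashtags in lowercase.

    Malformed lines and records without a string "text" field yield [],
    so one bad message does not fail the whole batch.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return []
    text = record.get("text") if isinstance(record, dict) else None
    if not isinstance(text, str):
        return []
    return [tag.lower() for tag in HASHTAG.findall(text)]


def main(host="localhost", port=9999, batch_seconds=10):
    # Imported here so extract_hashtags stays usable without PySpark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # "local[2]": at least two threads, one for the socket receiver
    # and one for processing the received batches.
    sc = SparkContext("local[2]", "SocialStreamAnalyzer")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(host, port)  # one DStream element per line
    counts = (lines.flatMap(extract_hashtags)
                   .map(lambda tag: (tag, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first elements of each batch's counts

    ssc.start()
    ssc.awaitTermination()
```

Run main() in its own process once the scraper is listening. For the clustering step described above, MLlib's StreamingKMeans could replace the counting step; newer Spark releases also recommend Structured Streaming over DStreams.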

All the best