How to read data from IoT devices

Hello All,
I need input on how to read data from IoT devices into Kafka. Do we need an API from the device manufacturer to be able to poll data? Do we need to connect to the device's data source (back end)?
Basically, I want to understand how this works in real-world scenarios where we have IoT devices to read data from. If Kafka has a utility or method for this, please share your thoughts on that too. Your pointers will help me move forward.

Thanks.
Gaurav

Hi Gaurav,

Let me rephrase the question to check my understanding: how do we generally push data from IoT devices into the cloud (specifically Kafka)?

There are two sides to the problem: the client (IoT device) side and the server side.

Client Side or IoT Device Side
There can be many kinds of IoT devices. For the sake of discussion, let's take the Arduino as an example. It's a very popular, cheap, open source controller used for building prototypes and other IoT devices.

On the device side there are two possible mechanisms:

  1. Arduino device pushes data - In this mechanism, the Arduino calls the server through a REST interface and passes the data from the sensors connected to it via a POST or GET request. We will talk about the server in a minute. You can make a simple REST call using HttpClient.
    This approach is good if you have an internet connection on the Arduino and only a few devices.

  2. Another machine polls the Arduino - In this mechanism, a separate machine periodically polls (requests data from) all the installed IoT devices. The advantage of this approach is that you do not need an internet connection on every IoT device. To get data from the Arduino, you will have to run an aREST server on it: https://github.com/marcoschwartz/aREST

Check this example: https://github.com/marcoschwartz/aREST#quick-test-wifi
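
The polling approach above could be sketched in Node.js roughly as follows. This is a minimal sketch, not a tested deployment: the device addresses and the `temperature` variable name are made up for illustration, and it assumes each Arduino runs aREST and answers a GET on `/<variable>` with a small JSON object that contains that variable.

```javascript
const http = require('http');

// Hypothetical device addresses; replace with the IPs of your own boards.
const DEVICES = ['192.168.1.50', '192.168.1.51'];

// Build the URL for one variable on one device (assumed aREST-style path).
function variableUrl(host, variable) {
  return 'http://' + host + '/' + encodeURIComponent(variable);
}

// Turn a raw response body into a reading, or null if it is not valid JSON
// or does not contain the variable we asked for.
function parseReading(host, variable, body) {
  try {
    const json = JSON.parse(body);
    if (!(variable in json)) return null;
    return { device: host, variable: variable, value: json[variable], readAt: Date.now() };
  } catch (err) {
    return null;
  }
}

// Fetch one variable from one device; resolves to a reading or null.
function pollDevice(host, variable) {
  return new Promise(function (resolve) {
    const req = http.get(variableUrl(host, variable), function (res) {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () { resolve(parseReading(host, variable, body)); });
    });
    // An unreachable or slow device should not stop the whole polling round.
    req.on('error', function () { resolve(null); });
    req.setTimeout(5000, function () { req.destroy(); resolve(null); });
  });
}

// Poll every device once and keep only the successful readings.
function pollAll(variable) {
  return Promise.all(DEVICES.map(function (host) {
    return pollDevice(host, variable);
  })).then(function (readings) {
    return readings.filter(function (r) { return r !== null; });
  });
}

// Example: poll every 10 seconds and print the readings.
// setInterval(function () { pollAll('temperature').then(console.log); }, 10000);
```

The polling machine is then the only component that needs internet access; it can pass each reading on to the server side described next.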

Server Side
Now, coming to the server side. We can either expose Kafka directly to the client (the IoT device) or build a very thin service that talks to Kafka internally. Alternatively, we can use Amazon’s Lambda service.

Directly Expose Kafka:
You can expose a REST interface to Kafka directly using this project: https://github.com/confluentinc/kafka-rest
Although it is very tempting to expose Kafka because it is well load balanced, I would not recommend this approach. Updating software on IoT devices is hard, so if we later move to some other NoSQL store or queue, it will be difficult to migrate.
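
To see what a device or gateway would have to send if Kafka were exposed through the REST proxy, here is a minimal sketch of one produce request. It assumes the proxy's v2 JSON format (a POST to `/topics/<topic>` with the `application/vnd.kafka.json.v2+json` content type). The host `rest-proxy.example.com`, port 8082, and the topic name `sensor-readings` are placeholders, not values from this thread.

```javascript
const http = require('http');

// Build the HTTP options and body for one produce request (no network I/O).
function buildProduceRequest(topic, reading) {
  const body = JSON.stringify({ records: [{ value: reading }] });
  return {
    options: {
      host: 'rest-proxy.example.com', // placeholder host
      port: 8082,                     // assumed proxy port
      method: 'POST',
      path: '/topics/' + encodeURIComponent(topic),
      headers: {
        'Content-Type': 'application/vnd.kafka.json.v2+json',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body: body
  };
}

// Send the request; the callback receives (error, statusCode).
function produce(topic, reading, callback) {
  const request = buildProduceRequest(topic, reading);
  const req = http.request(request.options, function (res) {
    res.resume(); // discard the response body
    res.on('end', function () { callback(null, res.statusCode); });
  });
  req.on('error', callback);
  req.end(request.body);
}

// Example:
// produce('sensor-readings', { device: 'arduino-1', temperature: 24 }, console.log);
```

Note that the topic name and the proxy's message format end up baked into every device, which is the coupling that makes a later migration hard.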

Have a middle layer
A better approach is to build a lightweight service using Node.js that talks internally to the storage (MongoDB, Cassandra, HDFS) or queue (Kafka). You can create a service in Node.js with very simple code like this:

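var http = require('http');

// Minimal HTTP server: answers every request with a fixed response.
// A real service would read the request body here and pass it on to Kafka or the storage.
http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.end('Hello World!');
}).listen(8080); // listen on port 8080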
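
Extending the server above, a minimal sketch of an ingest endpoint might look like this. It accepts a JSON reading over POST, validates it, and hands it to a `sink` function. The `sink`, the field names `device` and `value`, and the port are assumptions for illustration. In a real service the sink would publish to Kafka through a client library or write to another store, and the devices would not need to change when that choice does.

```javascript
const http = require('http');

// Parse and validate a request body; returns the reading or null if invalid.
function validateReading(body) {
  try {
    const reading = JSON.parse(body);
    if (reading === null || typeof reading !== 'object') return null;
    if (typeof reading.device !== 'string' || typeof reading.value !== 'number') return null;
    return { device: reading.device, value: reading.value, receivedAt: Date.now() };
  } catch (err) {
    return null;
  }
}

// Create a server that forwards valid readings to sink(reading, callback).
function createIngestServer(sink) {
  return http.createServer(function (req, res) {
    if (req.method !== 'POST') {
      res.writeHead(405, { 'Content-Type': 'text/plain' });
      return res.end('Use POST');
    }
    let body = '';
    req.setEncoding('utf8');
    req.on('data', function (chunk) { body += chunk; });
    req.on('end', function () {
      const reading = validateReading(body);
      if (reading === null) {
        res.writeHead(400, { 'Content-Type': 'text/plain' });
        return res.end('Invalid reading');
      }
      sink(reading, function (err) {
        res.writeHead(err ? 502 : 202, { 'Content-Type': 'text/plain' });
        res.end(err ? 'Storage unavailable' : 'Accepted');
      });
    });
  });
}

// Placeholder sink that only logs; swap in a Kafka producer or database write.
function logSink(reading, callback) {
  console.log('reading', JSON.stringify(reading));
  callback(null);
}

// Example: createIngestServer(logSink).listen(8080);
```

A production version would also cap the request body size and authenticate devices; both are left out here to keep the sketch short.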

The third mechanism is to use a cloud platform such as Google Compute Engine, AWS, or Azure. If you are using Amazon, you could use the AWS Lambda service, which is very easy to scale up.

I hope this helps.

Hey Sandeep, this is awesome information and really helps. It will take me some time to understand the different approaches you have mentioned, as I have not done much of this kind of work. I will go through them all in detail and strengthen my understanding of this topic. Once again, thanks a lot. I will get back to you with any questions.

Great! Keep us informed.