Hi Gaurav,
Let me rephrase the question to check my understanding: how do we generally push data into the cloud (specifically Kafka) from IoT devices?
There are two sides to the problem: the client (IoT device) side and the server side.
Client Side or IoT Device Side
There are many kinds of IoT devices. For the sake of discussion, let's take the Arduino as an example. It is a very popular, cheap, open-source controller used for building prototypes and other IoT devices.
On the device side there are two possible mechanisms:
- Arduino device pushes data: In this mechanism, the Arduino calls the server through a REST interface and sends the readings of the sensors connected to it in an HTTP POST or GET request. We will talk about the server in a minute. You can make a simple REST call using HttpClient.
This approach is good if you have an Internet connection on the Arduino and only a few devices.
- Another machine polling the Arduino: In this mechanism, a separate machine periodically polls (requests data from) all the installed IoT devices. The advantage of this approach is that you do not need an Internet connection on every IoT device. To get data from the Arduino you will have to run an aREST server on it: https://github.com/marcoschwartz/aREST
See this example: https://github.com/marcoschwartz/aREST#quick-test-wifi
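To make the polling mechanism more concrete, here is a rough Node.js sketch of the polling machine. It assumes each Arduino runs aREST and has registered a variable (called temperature here) that can be read with a plain GET on /temperature and comes back as JSON. The IP addresses, the variable name, the 3 second timeout and all function names are my own placeholders, so adjust them to your setup.

```javascript
const http = require('http');

// Build the URL for reading one exposed variable from an aREST device.
// Assumption: the Arduino sketch registered the variable under this name.
function buildVariableUrl(host, variable) {
  return 'http://' + host + '/' + encodeURIComponent(variable);
}

// Pull the value out of the JSON body, e.g. {"temperature": 24, "id": "1", ...}.
function parseReading(body, variable) {
  const data = JSON.parse(body);
  if (!(variable in data)) {
    throw new Error('variable "' + variable + '" missing from response');
  }
  return { device: data.id, variable: variable, value: data[variable], at: Date.now() };
}

// Fetch one reading from one device and pass it to callback(err, reading).
function pollDevice(host, variable, callback) {
  const req = http.get(buildVariableUrl(host, variable), function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      try {
        callback(null, parseReading(body, variable));
      } catch (err) {
        callback(err);
      }
    });
  });
  req.on('error', callback);
  // Give up on devices that do not answer within 3 seconds.
  req.setTimeout(3000, function () {
    req.destroy(new Error('timeout polling ' + host));
  });
}

// Poll every device on a fixed interval; returns the timer so it can be cleared.
function startPolling(hosts, variable, intervalMs, onReading) {
  return setInterval(function () {
    hosts.forEach(function (host) {
      pollDevice(host, variable, function (err, reading) {
        if (err) { console.error(host + ': ' + err.message); return; }
        onReading(reading);
      });
    });
  }, intervalMs);
}

// Example (hypothetical addresses):
// startPolling(['192.168.1.50', '192.168.1.51'], 'temperature', 5000, console.log);
```

The onReading callback is where the polling machine would forward each reading to the server side described next.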
Server Side
Now, coming to the server side. We can either expose Kafka directly to the client (IoT device), or build a very thin service that talks to Kafka internally. Alternatively, we can use Amazon's Lambda service.
Directly Expose Kafka:
You can expose a REST interface to Kafka directly using this project: https://github.com/confluentinc/kafka-rest
Though it is very tempting to expose Kafka because it is well load balanced, I would not recommend this approach. If we later move to some other NoSQL store or queue, it will be difficult to migrate, because every device would have to be changed and updating software on IoT devices is hard.
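To illustrate the coupling, this is roughly what a client has to send when it talks to the REST proxy directly. The sketch assumes the proxy's v2 JSON API (POST to /topics/<topic> with a records envelope); the host, port, topic name and function names are placeholders of mine. Notice that the Kafka-specific content type and envelope end up baked into whatever runs on the device side.

```javascript
const http = require('http');

// Build the request options and body for producing one JSON record to a topic.
// Assumption: the REST proxy speaks the v2 JSON embedded format.
function buildProduceRequest(proxyHost, proxyPort, topic, reading) {
  const body = JSON.stringify({ records: [{ value: reading }] });
  return {
    options: {
      host: proxyHost,
      port: proxyPort,
      method: 'POST',
      path: '/topics/' + encodeURIComponent(topic),
      headers: {
        'Content-Type': 'application/vnd.kafka.json.v2+json',
        'Accept': 'application/vnd.kafka.v2+json',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body: body
  };
}

// Send the record and report the proxy's HTTP status and response body.
function produce(proxyHost, proxyPort, topic, reading, callback) {
  const request = buildProduceRequest(proxyHost, proxyPort, topic, reading);
  const req = http.request(request.options, function (res) {
    let responseBody = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { responseBody += chunk; });
    res.on('end', function () { callback(null, res.statusCode, responseBody); });
  });
  req.on('error', callback);
  req.end(request.body);
}

// Example (hypothetical proxy address and topic):
// produce('localhost', 8082, 'sensor-readings', { deviceId: 'a1', value: 24 },
//   function (err, status, body) { console.log(err || status, body); });
```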
Have a middle layer
A better approach is to build a lightweight service using Node.js which internally talks to the storage (MongoDB, Cassandra, HDFS) or queue (Kafka). You can create a service in Node.js with very simple code like this:
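// Load Node's built-in HTTP module.
var http = require('http');

// Answer every request with a fixed response and listen on port 8080.
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end('Hello World!');
}).listen(8080);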
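Going one step beyond the hello-world server, here is a rough sketch of what the thin middle layer could look like. The /readings path, the deviceId and value fields, the status codes and the publish callback are all my own assumptions, not a fixed design. The point is that the devices only know this small HTTP contract, while publish hides whether the data goes to Kafka, MongoDB or something else.

```javascript
const http = require('http');

// Validate the JSON body sent by a device. The field names are hypothetical.
function validateReading(rawBody) {
  const data = JSON.parse(rawBody);
  if (!data || typeof data.deviceId !== 'string' || typeof data.value !== 'number') {
    throw new Error('expected {"deviceId": string, "value": number}');
  }
  return { deviceId: data.deviceId, value: data.value, receivedAt: Date.now() };
}

// publish(reading, done) is supplied by the caller and hides the backend,
// so swapping Kafka for another store does not touch the devices.
function createIngestServer(publish) {
  return http.createServer(function (req, res) {
    if (req.method !== 'POST' || req.url !== '/readings') {
      res.writeHead(404, {'Content-Type': 'application/json'});
      res.end(JSON.stringify({ error: 'not found' }));
      return;
    }
    let body = '';
    req.setEncoding('utf8');
    req.on('data', function (chunk) { body += chunk; });
    req.on('end', function () {
      let reading;
      try {
        reading = validateReading(body);
      } catch (err) {
        res.writeHead(400, {'Content-Type': 'application/json'});
        res.end(JSON.stringify({ error: err.message }));
        return;
      }
      publish(reading, function (err) {
        // 202: accepted for processing; 502: the backend behind us failed.
        res.writeHead(err ? 502 : 202, {'Content-Type': 'application/json'});
        res.end(JSON.stringify({ accepted: !err }));
      });
    });
  });
}

// Example with a stand-in publisher that only logs the reading:
// createIngestServer(function (reading, done) {
//   console.log(reading);
//   done();
// }).listen(8080);
```

In a real deployment the publish function would wrap whichever Kafka or database client you choose; I have left that out on purpose so the sketch runs with nothing but Node.js itself.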
The third mechanism is to use a cloud platform such as Google Compute Engine, Amazon Web Services or Azure. If you are using Amazon, you could use the AWS Lambda service, which is very easy to scale up.
I hope this helps.