Structured Spark Streaming + HBase Integration

Hi,

I'm doing Structured Streaming in Spark over Kafka-ingested messages and storing the data in HBase after processing. I'm running this job on the CloudxLab cluster. The issue that keeps popping up is:

ERROR ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.

I tried passing hbase-site.xml in the spark-submit, but no luck. The hbase-site.xml sets the property "zookeeper.znode.parent" to "/hbase-unsecure".
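
For reference, the znode parent can be set explicitly on a bare HBase client connection; here is a rough sketch of such a connectivity check (the quorum host below is a placeholder, not the actual cluster address):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

// Quick connectivity check: point the plain HBase client at the same
// znode parent that hbase-site.xml declares.
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "zk-host")           // placeholder ZK host(s)
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("zookeeper.znode.parent", "/hbase-unsecure")   // value from hbase-site.xml

val connection = ConnectionFactory.createConnection(conf)
try {
  // If this call succeeds, the znode parent is right and the error
  // comes from the Spark job not picking up the same setting.
  println(connection.getAdmin.listTableNames().mkString(", "))
} finally {
  connection.close()
}

If a connection built this way works but the streaming job still fails, that would confirm the job is falling back to the default /hbase znode.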

My spark-submit parameters are:

spark-submit \
--class CountryCountStreaming \
--master yarn-client \
--conf spark.ui.port=4926 \
--jars $(echo /home/venkateshramanpc5546/external_jars/*.jar | tr ' ' ',') \
--packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
--repositories http://repo.hortonworks.com/content/groups/public/ \
--files /usr/hdp/current/hbase-client/conf/hbase-site.xml \
kafkasparkstreamingdemo_2.11-0.1.jar cloudxlab /tmp/venkatesh/retail_schema/Retail_Logs.json /tmp/venkatesh/retail_checkpoint

The stack versions in the cluster are given below:

Hadoop 2.7.3
HBase 1.1.2
Zookeeper 3.4.6
Kafka 0.10.1
Spark 2.1.1

Please find the build.sbt and the scala classes attached for your reference.

Kindly let me know if there is any HBase configuration (ZooKeeper quorum, ZooKeeper client port, ZooKeeper znode parent) that we can set in the step where we write the data to the table, which is:

df.write
  .options(Map(
    HBaseTableCatalog.tableCatalog -> hBaseCatalog,
    HBaseTableCatalog.newTable -> "4"))
  .format(defaultFormat)
  .save()
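
From what I can tell, newer SHC builds expose an HBaseRelation.HBASE_CONFIGFILE option that points the connector at a local hbase-site.xml, which would carry zookeeper.znode.parent along with it. I am not certain the 1.1.1-2.1-s_2.11 package includes this option, so treat the following as a sketch:

import org.apache.spark.sql.execution.datasources.hbase.{HBaseRelation, HBaseTableCatalog}

// Sketch: hand SHC the hbase-site.xml path directly, so it picks up
// zookeeper.znode.parent=/hbase-unsecure itself. HBASE_CONFIGFILE may
// not exist in older SHC releases; check this version's HBaseRelation.
df.write
  .options(Map(
    HBaseTableCatalog.tableCatalog -> hBaseCatalog,
    HBaseTableCatalog.newTable -> "4",
    HBaseRelation.HBASE_CONFIGFILE -> "/usr/hdp/current/hbase-client/conf/hbase-site.xml"))
  .format(defaultFormat)
  .save()

If that option is not available in this SHC version, the remaining route seems to be making hbase-site.xml visible on the driver and executor classpaths rather than passing it through --files alone.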

There is a closed Hortonworks issue stating that this error is solved either by passing hbase-site.xml to spark-submit or by copying it into the Spark conf folder. As far as I understand, copying it into the conf folder works because that directory is on the driver classpath, whereas files shipped with --files only land in the YARN containers' working directories. Could it be copied into the conf folder under /usr/hdp/current/spark2-client? The issue is given below.

This is being discussed here: Spark Submit Issues - Interacting with HBase