BI tools + Spark, Hadoop and Hive

Balakrishnan_K · October 15, 2017, 4:59pm

How can we use BI tools on top of Hadoop/Spark to analyse big data? Also, how can we get more out of Hive using BI tools?

mannharleen · October 18, 2017, 9:58am

Hello Bala,

Hive traditionally runs on MapReduce, which is geared toward batch jobs rather than interactive queries.

If your BI tool stores the data in its own data mart, then you can connect to Hive over JDBC and pull the required data.

However, if your BI dashboard uses “direct query”, then Hive on MapReduce is certainly not the right approach: results will take far too long to show up in the BI tool. Look at faster options instead, such as Hive on the Tez execution engine or an MPP engine like Impala.
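
As a rough illustration of the "pull an extract" route, here is a minimal Python sketch. It swaps JDBC for the third-party PyHive client, which talks to the same HiveServer2 endpoint over Thrift. The host name, user, table and column names are all made up for the example, and `build_extract_query` and `pull_extract` are hypothetical helpers, not part of any BI tool.

```python
import csv


def build_extract_query(table, columns, where=None, limit=None):
    """Build a SELECT that pulls only the columns and rows the data mart needs."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def pull_extract(query, out_path, host="hive.example.internal", port=10000,
                 username="bi_user", database="default"):
    """Run the query on HiveServer2 and write the result to a CSV file.

    Host, port, user and database are placeholder assumptions.
    """
    from pyhive import hive  # third-party client, imported only when needed

    conn = hive.Connection(host=host, port=port, username=username,
                           database=database)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            # First row: column names reported by the server.
            writer.writerow([col[0] for col in cursor.description])
            # Fetch in batches so a large extract is not held in memory at once.
            while True:
                rows = cursor.fetchmany(10000)
                if not rows:
                    break
                writer.writerows(rows)
    finally:
        conn.close()


# Hypothetical table and columns, only to show the generated SQL.
query = build_extract_query("sales", ["region", "order_date", "amount"],
                            where="order_date >= '2017-01-01'")
print(query)
```

Calling `pull_extract(query, "sales_extract.csv")` would then run as a batch job, which is the kind of workload Hive handles well; the BI tool reads the extract instead of querying Hive on every click.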

raviteja · April 1, 2018, 9:35pm

Hi @mannharleen,

That’s right,

Most of the time, this works when we need to access big data through the Hive JDBC/ODBC drivers. If the data is large, though, I think loading and processing it in data marts would take a good amount of time. And the actual purpose of connecting BI tools to Hive is to leverage big-data-scale processing.

Yes, Hive with the MR engine would not be the right approach, and as you suggested, Tez or Impala will improve performance. Hive has also gained a Spark execution engine (Hive on Spark, which is separate from the older Shark project), so we can try the “direct query” approach with Hive running on Spark.

There are also better options than Hive alone, such as connecting BI tools to Spark directly over JDBC/ODBC, so we can leverage Spark’s in-memory computing for direct query.
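
For anyone who wants to try the engines side by side, the engine can be chosen per session through the Hive property `hive.execution.engine` (accepted values are `mr`, `tez` and `spark`, provided the cluster has that engine installed). A minimal sketch, again assuming the third-party PyHive client and a placeholder host; `session_conf` and `run_with_engine` are hypothetical helper names:

```python
VALID_ENGINES = {"mr", "tez", "spark"}


def session_conf(engine):
    """Return the Hive session setting that selects an execution engine."""
    if engine not in VALID_ENGINES:
        # Impala, for example, is a separate system, not a Hive engine.
        raise ValueError(f"unsupported Hive execution engine: {engine!r}")
    return {"hive.execution.engine": engine}


def run_with_engine(query, engine, host="hive.example.internal", port=10000,
                    username="bi_user"):
    """Run one query on HiveServer2 with the chosen engine (placeholder host/user)."""
    from pyhive import hive  # third-party client, imported only when needed

    conn = hive.Connection(host=host, port=port, username=username,
                           configuration=session_conf(engine))
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        conn.close()


print(session_conf("tez"))
```

Timing the same dashboard query with `"mr"`, `"tez"` and `"spark"` would give a first feel for whether direct query is workable on a given cluster.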
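
On the Spark side, Spark ships a Thrift JDBC/ODBC server that speaks the HiveServer2 protocol, so a BI tool can usually point its Hive JDBC/ODBC driver at it with a `jdbc:hive2://` URL. A small sketch of building that URL; the host name is a placeholder, port 10000 is only the usual default, and `spark_jdbc_url` is a hypothetical helper:

```python
def spark_jdbc_url(host, port=10000, database="default"):
    """Build the JDBC URL a BI tool would use for the Spark Thrift server."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# Placeholder host name, for illustration only.
print(spark_jdbc_url("spark-thrift.example.internal"))
```

The URL has the same shape as a plain HiveServer2 connection, so switching a dashboard from Hive to Spark is mostly a matter of changing the host and port in the data source settings.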

Please share your insights, or let me know if I’m missing anything here.

P.S. I just wanted to share new findings on this interesting topic, so I reopened this slightly old post.
If I find any benchmark results for this approach, I will share them here.