How to use matplotlib with PySpark SQL?


#1

Hello,
I want to plot data from PySpark SQL using matplotlib.
I am fetching data from Hive for analysis, and now I want to display it as a bar chart or histogram.
Is there any way to do this? If so, please suggest how.


#2

Follow this guide to access Spark from a Python 3 Jupyter notebook: https://cloudxlab.com/blog/running-pyspark-jupyter-notebook/

Follow the standard tutorial to load data from Hive:
https://cloudxlab.com/assessment/displayslide/615/spark-sql-using-hive-tables?course_id=73&playlist_id=338

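Once you have loaded the data from Hive, you can use take() or collect() to bring the rows from the DataFrame into local memory on the driver, and then use matplotlib on that in-memory data. Note that collect() pulls every row, so aggregate or limit the DataFrame first if the table is large.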
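
Here is a minimal sketch of that approach, not a definitive implementation. It assumes Spark was built with Hive support and that matplotlib is installed in the notebook's Python environment. The function names (rows_to_xy, load_rows_from_hive, plot_bar) and the table and column names in the usage comment are hypothetical and only there for illustration.

```python
def rows_to_xy(rows, label_col, value_col):
    """Split collected rows into two plain Python lists.

    Works with pyspark.sql.Row objects or dicts, since both support row[name].
    """
    labels = [row[label_col] for row in rows]
    values = [row[value_col] for row in rows]
    return labels, values


def load_rows_from_hive(query, max_rows=1000):
    """Run a Spark SQL query and bring at most max_rows rows to the driver."""
    # Imported lazily so the other helpers work without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    # take(n) returns a list of Row objects held in driver memory.
    return spark.sql(query).take(max_rows)


def plot_bar(rows, label_col, value_col):
    """Draw a bar chart from already-collected rows and return the figure."""
    # Imported lazily so rows_to_xy works without matplotlib installed.
    import matplotlib.pyplot as plt

    labels, values = rows_to_xy(rows, label_col, value_col)
    positions = list(range(len(labels)))
    fig, ax = plt.subplots()
    ax.bar(positions, values)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_xlabel(label_col)
    ax.set_ylabel(value_col)
    fig.tight_layout()
    return fig


# Hypothetical usage in a notebook cell (table and columns are assumptions):
# rows = load_rows_from_hive(
#     "SELECT dept, COUNT(*) AS n FROM employees GROUP BY dept")
# plot_bar(rows, "dept", "n")
```

For a histogram, the same collected values can be passed to ax.hist(values) instead of ax.bar. Doing the aggregation in Spark SQL (GROUP BY, COUNT) before collecting keeps the data brought to the driver small. If pandas is available, df.toPandas() is another way to get the data into local memory for plotting.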