Learning DataFrames using PySpark in Jupyter

sgiri · August 26, 2019, 6:08pm

Hi Friends,

I have put together a Jupyter notebook for learning DataFrames in PySpark. It is in our Git repository, inside the folder spark/python.

Please follow these steps to get started with it.

1. Log in to the console using the WebConsole, the Jupyter terminal (New -> Terminal), or SSH/PuTTY.
2. Clone the repository: git clone https://github.com/cloudxlab/bigdata.git
3. If you have already cloned it, update it instead: cd bigdata; git pull
4. Now go to Jupyter and navigate to spark/python inside the repository.
5. Open the notebook named pyspark_dataframe_jupyter.ipynb.
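If you would rather script the clone/update steps from the Jupyter terminal, here is a minimal Python sketch that clones the repository on first use and runs git pull afterwards. The helper names plan_git_command and fetch_notebook and the default folder name bigdata are my own assumptions, not part of the repository, and it assumes git is installed and on your PATH.

```python
import os
import subprocess
from typing import List

# Repository URL and notebook location, as given in the steps above.
REPO_URL = "https://github.com/cloudxlab/bigdata.git"
NOTEBOOK = os.path.join("spark", "python", "pyspark_dataframe_jupyter.ipynb")


def plan_git_command(repo_dir: str) -> List[str]:
    """Hypothetical helper: pick the git command to run.

    Clone if repo_dir is not yet a Git checkout; otherwise pull inside it
    (git -C repo_dir pull is equivalent to: cd repo_dir; git pull).
    """
    if os.path.isdir(os.path.join(repo_dir, ".git")):
        return ["git", "-C", repo_dir, "pull"]
    return ["git", "clone", REPO_URL, repo_dir]


def fetch_notebook(repo_dir: str = "bigdata") -> str:
    """Hypothetical helper: clone or update the repo, return the notebook path."""
    subprocess.run(plan_git_command(repo_dir), check=True)  # raises if git fails
    path = os.path.join(repo_dir, NOTEBOOK)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path
```

Calling fetch_notebook() from your home directory should leave you with bigdata/spark/python/pyspark_dataframe_jupyter.ipynb, which you can then open from Jupyter as described above.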

If you are not using CloudxLab and just want to go through the notebook, you can view it here.