Hi Friends,
I have put together a Jupyter notebook for learning DataFrames in PySpark. It is in our git repository, inside the folder spark/python.
Please follow these steps to get started with it.
- Log in to the console using the WebConsole, a Jupyter terminal (New -> Terminal), or ssh/PuTTY.
- Clone the repository:
git clone https://github.com/cloudxlab/bigdata.git
- If you have already cloned it, update it instead using: cd bigdata; git pull
- Now, go to Jupyter and navigate to spark/python
- Open the notebook named pyspark_dataframe_jupyter.ipynb
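
If you prefer a single command for the clone-or-update steps above, here is a minimal sketch. The function name sync_repo is hypothetical (it is not part of the repository), and it assumes git is installed and that you run it from the directory where you want the bigdata folder to live.

```shell
# Hypothetical helper: clone the repository on first use, otherwise update it.
sync_repo() {
  repo_url="https://github.com/cloudxlab/bigdata.git"
  repo_dir="bigdata"   # directory name git derives from the URL

  if [ -d "$repo_dir/.git" ]; then
    # Already cloned: pull the latest changes in a subshell so the
    # caller's working directory is left unchanged.
    (cd "$repo_dir" && git pull)
  else
    # First time: fetch a fresh copy.
    git clone "$repo_url"
  fi
}
```

After pasting the function into the terminal, run sync_repo and then continue with the Jupyter steps.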
If you are not using CloudxLab and just want to go through the notebook, you can see it here.