Learning DataFrames using PySpark in Jupyter

Hi Friends,

I have put together a Jupyter notebook for learning DataFrames in PySpark. It is in our git repository, inside the folder spark/python.

Please follow these steps to get started with it.

  1. Log in to the console using the WebConsole, a Jupyter terminal (New -> Terminal), or ssh/PuTTY.
  2. Clone the repository: git clone https://github.com/cloudxlab/bigdata.git
  3. If you have already cloned it, update it using: cd bigdata; git pull
  4. Now, go to Jupyter and navigate to spark/python inside the bigdata folder.
  5. Open the notebook named pyspark_dataframe_jupyter.ipynb.

If you are not using CloudxLab and just want to go through the notebook, you can view it here.