How do I import data from HDFS into Jupyter to work on machine learning?

vamsi_patnam · April 5, 2018, 2:00pm

Hi, I have a question; could anyone help me, please? How do I import my data from HDFS into Jupyter?
I’m currently working on a machine learning project, so I want to load the data from HDFS into Jupyter.
I tried df = pd.read_csv("/user/myid/diabetes.csv"), but it is not working. Please help me.

sgiri · April 5, 2018, 2:28pm

You could run hadoop fs -copyToLocal on the console to copy the data from HDFS to the local Linux filesystem (the same filesystem that is visible in Jupyter).
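
The same copy can also be triggered from a notebook cell. This is only a minimal sketch, assuming the hadoop command-line client is on the PATH of the Jupyter kernel; the helper names build_copy_command and copy_to_local are made up for illustration, and the local file name is arbitrary.

```python
import subprocess


def build_copy_command(hdfs_path, local_path):
    """Return the argument list for `hadoop fs -copyToLocal` (hypothetical helper)."""
    return ["hadoop", "fs", "-copyToLocal", hdfs_path, local_path]


def copy_to_local(hdfs_path, local_path):
    """Copy one file out of HDFS and return the local path (hypothetical helper)."""
    # check=True raises CalledProcessError if the hadoop command exits non-zero.
    subprocess.run(build_copy_command(hdfs_path, local_path), check=True)
    return local_path


# Example call, not executed here (HDFS path taken from the question):
# import pandas as pd
# df = pd.read_csv(copy_to_local("/user/myid/diabetes.csv", "diabetes.csv"))
```

Once the file has been copied, pd.read_csv works on the local path, because pandas resolves a plain path against the local filesystem and not against HDFS.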

vamsi_patnam · April 5, 2018, 2:43pm

Thanks for your reply, sgiri. Is there any other way, such as loading the data just by specifying its path?
I appreciate your response. Thanks.

sgiri · April 5, 2018, 7:04pm

Try this:

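from hdfs3 import HDFileSystem
# Connect to HDFS with the default settings
hdfs = HDFileSystem()
# List the directory to confirm the file is there
hdfs.ls("/user/sandeepgiri9034")

import pandas as pd
# Open the file on HDFS and read its first 1000 rows into a DataFrame
with hdfs.open('/user/sandeepgiri9034/my_movie_ratings.csv') as f:
    df = pd.read_csv(f, nrows=1000)
df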
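
hdfs3 depends on the native libhdfs3 library, so it may not load in every environment. As a rough alternative sketch, assuming only that the hadoop command-line client is on the PATH, the file can be streamed through hadoop fs -cat and parsed from memory. The helper names build_cat_command and read_csv_via_cat are hypothetical, and this version buffers the whole file in memory, so it suits small files only.

```python
import io
import subprocess


def build_cat_command(hdfs_path):
    """Return the argument list for `hadoop fs -cat` (hypothetical helper)."""
    return ["hadoop", "fs", "-cat", hdfs_path]


def read_csv_via_cat(hdfs_path, **read_csv_kwargs):
    """Read a CSV stored on HDFS into a DataFrame without hdfs3 (hypothetical helper)."""
    # Imported lazily so that pandas is only needed when data is actually read.
    import pandas as pd

    # Run the command and capture the whole file from stdout;
    # check=True raises CalledProcessError if hadoop exits non-zero.
    result = subprocess.run(
        build_cat_command(hdfs_path), check=True, stdout=subprocess.PIPE
    )
    # Wrap the captured bytes in a file-like object that pandas can parse.
    return pd.read_csv(io.BytesIO(result.stdout), **read_csv_kwargs)


# Example call, not executed here (path taken from the snippet above):
# df = read_csv_via_cat("/user/sandeepgiri9034/my_movie_ratings.csv", nrows=1000)
```

Note that nrows only limits what pandas parses here; the full file is still transferred before parsing starts.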

Mehdi_Sidi_Boumedine · September 25, 2021, 5:50am

Hi, this doesn’t seem to be working on the lab:


ImportError: Can not find the shared library: libhdfs3.so
See installation instructions at http://hdfs3.readthedocs.io/en/latest/install.html

How can I fix it please?