Numpy - Arrays - Loading a text file data using NumPy's loadtxt() function - Step 2

NIRAV_RAJ · October 9, 2020, 4:53pm

I did not understand.

Now we will continue to load the dataset that we cloned in the previous step.

INSTRUCTIONS

Please follow the below steps:

(1) Import the required libraries

import numpy as np
import os

(2) Load using pandas

Now we will use pandas to load data from a large csv file (the California housing dataset) and create a small csv file of housing data by extracting only a few rows from this large housing.csv file.

We are creating a smaller csv file just for convenience, to make it easy to load using the loadtxt() function.

Don’t worry if you don’t know pandas yet; just copy and use the pandas code below as it is.

import pandas as pd
# defining housing.csv file path
HOUSING_PATH = '../ml/machine_learning/datasets/housing/'
# reading the large housing.csv file using pandas
housing_raw = pd.read_csv(os.path.join(HOUSING_PATH, "housing.csv"))
# extracting only the first 5 rows of 'housing_raw' into a new pandas dataframe 'my_df'
my_df = housing_raw.iloc[ : 5]
# creating a new small csv file - 'housing_short.csv' - containing the above extracted 5 rows of data
my_df.to_csv('housing_short.csv', index=False)
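
If the pandas part is unfamiliar, the following is a minimal sketch of what iloc[ : 5] and to_csv() do. It does not use the real housing.csv: big_df is a hypothetical stand-in with made-up values, and short.csv is written to a temporary folder purely for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the large housing.csv (8 rows of made-up values).
big_df = pd.DataFrame({
    "median_income": [8.3, 8.3, 7.2, 5.6, 3.8, 4.0, 3.6, 3.1],
    "ocean_proximity": ["NEAR BAY"] * 8,
})

# iloc[:5] selects rows by position: the first 5 rows (positions 0 to 4).
small_df = big_df.iloc[:5]
n_rows = len(small_df)

# to_csv(..., index=False) writes a header line plus one line per row,
# without an extra column for the row index.
with tempfile.TemporaryDirectory() as tmp_dir:
    short_path = os.path.join(tmp_dir, "short.csv")
    small_df.to_csv(short_path, index=False)
    with open(short_path) as f:
        lines = f.read().splitlines()

print(n_rows)      # number of rows kept
print(lines[0])    # the header line of the new csv file
```

The small file therefore has one header line followed by 5 data lines, which is why the header has to be skipped later when loading it with loadtxt().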

Load using Numpy

Now, let us load the csv file - housing_short.csv - using NumPy’s loadtxt() function.

Please define a variable called FILE and assign to it the string value housing_short.csv.

FILE = '<<your code comes here>>'

(4) Create Function

Please define a function called load_housing_data() , as shown below, which takes filename (FILE) as input and loads this file using NumPy’s loadtxt() function. Just copy the below code as it is.

def load_housing_data(file=FILE):
    return np.loadtxt(file, dtype={'names': ('longitude','latitude','housing_median_age','total_rooms','total_bedrooms','population','households','median_income','median_house_value','ocean_proximity'),'formats': ('f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', '|S15')}, delimiter=',', skiprows=1, unpack=True)

loadtxt() function parameters

first parameter - file. It is the name of the file from which the data is to be loaded.

second parameter - dtype. It is the data type of the columns of the loaded csv file housing_short.csv. Here it is a Python dictionary with two keys: 'names', which lists the column names, and 'formats', which lists the data types of the respective columns, e.g. f8, |S15, etc.

‘f8’ means a 64-bit (8-byte) floating-point number

‘|S15’ means a byte string with a maximum length of 15 bytes

third parameter - delimiter. It is the character by which values in a row of our csv file are separated. For example, in our case the values in a row of our csv file - housing_short.csv - are separated by ‘,’ (comma).

fourth parameter - skiprows. Here you can specify how many initial rows of the csv file you want to skip while loading. E.g. you may want to skip the first row of this csv file because it contains header information (the column names), which cannot be loaded as numbers.

fifth parameter - unpack. When unpack is True, the returned array is transposed, so that the columns may be unpacked using x, y, z = loadtxt(...). When used with a structured data type, an array is returned for each field. The default value for unpack is False, but here we want the individual column arrays, so we have set it to True.
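
The effect of these parameters can be tried on a tiny example. The sketch below is only an illustration: it reads from an in-memory string (csv_text) instead of housing_short.csv, uses just three of the ten columns, and the row values are illustrative rather than taken from the real file.

```python
import io

import numpy as np

# In-memory stand-in for a small csv file (illustrative values only).
csv_text = (
    "longitude,median_house_value,ocean_proximity\n"
    "-122.23,452600.0,NEAR BAY\n"
    "-122.22,358500.0,NEAR BAY\n"
    "-122.24,352100.0,NEAR BAY\n"
)

# 'names' lists the column names, 'formats' lists one data type per column.
dtype = {
    "names": ("longitude", "median_house_value", "ocean_proximity"),
    "formats": ("f8", "f8", "S15"),
}

# unpack=True with a structured dtype: one array per column.
longitude_arr, value_arr, proximity_arr = np.loadtxt(
    io.StringIO(csv_text), dtype=dtype, delimiter=",", skiprows=1, unpack=True
)

# unpack=False (the default): a single structured array with one record per row.
rows = np.loadtxt(io.StringIO(csv_text), dtype=dtype, delimiter=",", skiprows=1)

# Without skiprows=1 the header text cannot be converted to a float.
try:
    np.loadtxt(io.StringIO(csv_text), dtype=dtype, delimiter=",", unpack=True)
    header_error = None
except ValueError as exc:
    header_error = exc

print(value_arr)
print(proximity_arr[0])   # a bytes value, because of the 'S15' format
print(rows["longitude"])  # columns of a structured array are accessed by name
```

Note that the 'S15' column comes back as bytes (for example b'NEAR BAY'), not as ordinary Python strings.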

(5) Call the Function

Please call the load_housing_data() function defined above, which returns the values of each column as separate NumPy arrays.

longitude_arr,latitude_arr,housing_median_age_arr,total_rooms_arr,total_bedrooms_arr,population_arr,households_arr,median_income_arr,median_house_value_arr,ocean_proximity_arr = load_housing_data()

(6) Print

You can check and confirm the values of one of the NumPy arrays that you got above (say median_house_value_arr) by printing it using the print() function.

print(<<your code comes here>>)

median_house_value_arr contains values of median_house_value column of the csv file - housing_short.csv

I did not understand the whole program.

Please help me to understand it.

Please reply.

rajtilakb · October 10, 2020, 9:10am

In the step prior to this, you cloned our GitHub repository to get the dataset. In this step, we are loading that dataset. Let me explain the steps using the numbers given beside each instruction:

No. 1: We import the libraries required to execute these steps.
No. 2: We load the housing.csv dataset into the housing_raw variable using pandas. Next, we select the first 5 rows into the my_df dataframe and create another csv file from it.
No. 4: Next, we define the load_housing_data() function, where we set up the data type of each column in the dataset. The other parameters are explained in the instructions.
No. 5: In this step, we call the load_housing_data() function on our dataset.
No. 6: Finally, we print one of the columns from the dataset.

We missed No. 3 since it was numbered incorrectly in the post. Hope this helps.
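
To make the flow from the unnumbered "Load using Numpy" step (defining FILE) through No. 6 more concrete, here is a minimal sketch. It is not the exercise solution itself: it assumes a stand-in housing_short.csv with only three of the ten columns and illustrative values, written to a temporary folder so that nothing depends on the cloned repository.

```python
import os
import shutil
import tempfile

import numpy as np

# Stand-in for housing_short.csv (assumption: three columns, illustrative values).
tmp_dir = tempfile.mkdtemp()
FILE = os.path.join(tmp_dir, "housing_short.csv")
with open(FILE, "w") as f:
    f.write("longitude,latitude,median_house_value\n")
    f.write("-122.23,37.88,452600.0\n")
    f.write("-122.22,37.86,358500.0\n")

# No. 4: a function that loads the file; FILE is used when no argument is given.
def load_housing_data(file=FILE):
    return np.loadtxt(
        file,
        dtype={
            "names": ("longitude", "latitude", "median_house_value"),
            "formats": ("f8", "f8", "f8"),
        },
        delimiter=",",
        skiprows=1,   # skip the header line
        unpack=True,  # return one array per column
    )

# No. 5: call the function and unpack one array per column.
longitude_arr, latitude_arr, median_house_value_arr = load_housing_data()

# No. 6: print one of the columns.
print(median_house_value_arr)

# Remove the temporary folder used for this sketch.
shutil.rmtree(tmp_dir)
```

In the actual exercise the function lists all ten column names and formats, and FILE is simply the name of the csv file created in No. 2.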

NIRAV_RAJ · October 10, 2020, 9:41am

But Sandip sir did not teach pandas.

How can I understand?

I did not understand anything.

NIRAV_RAJ · October 10, 2020, 9:55am

Please tell me No. 3,

because I did not understand it.

Please clarify.

Please tell me the No. 3 step:

Loading using Numpy.

Please reply.

rajtilakb · October 12, 2020, 8:15am

We do have a pandas video in our playlists. Please send us an email at reachus@cloudxlab.com. Also, you did not mention any point with the number 3.