I did not understand.
Numpy - Arrays - Loading a text file data using NumPy’s loadtxt() function - Step 2
Now we will continue to load the dataset that we cloned in the previous step.
INSTRUCTIONS
Please follow the below steps:
(1) Import the required libraries
import numpy as np
import os
(2) Load using pandas
Now we will use pandas to load data from a large csv file (the California housing dataset) and create a small csv file of housing data by extracting only a few rows from this large housing.csv file.
We are creating a smaller csv file of data, just for our convenience, to make it easy for us to load it using loadtxt() function.
Don’t worry if you don’t know pandas yet, just copy and use the below pandas code as it is.
import pandas as pd
# defining housing.csv file path
HOUSING_PATH = '../ml/machine_learning/datasets/housing/'
# reading the large housing.csv file using pandas
housing_raw = pd.read_csv(os.path.join(HOUSING_PATH, "housing.csv"))
# extracting only the first 5 rows of the pandas dataframe 'housing_raw' into 'my_df'
my_df = housing_raw.iloc[ : 5]
# creating a new small csv file - 'housing_short.csv' - containing the above extracted 5 rows of data
my_df.to_csv('housing_short.csv', index=False)
(3) Load using NumPy
Now, let us load the csv file - housing_short.csv - using NumPy’s loadtxt() function.
Please define a variable called FILE and assign to it the string value housing_short.csv.
FILE = '<<your code comes here>>'
(4) Create Function
Please define a function called load_housing_data(), as shown below, which takes a filename (FILE) as input and loads this file using NumPy’s loadtxt() function. Just copy the below code as it is.
def load_housing_data(file=FILE):
    return np.loadtxt(
        file,
        dtype={'names': ('longitude', 'latitude', 'housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households', 'median_income', 'median_house_value', 'ocean_proximity'),
               'formats': ('f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', 'f8', '|S15')},
        delimiter=',', skiprows=1, unpack=True)
loadtxt() function parameters
first parameter - file. It is the name of the file from which the data is to be loaded.
second parameter - dtype. It is the data type of the columns of the loaded csv file housing_short.csv. Here it is a Python dictionary with two keys: names, which holds the column names, and formats, which holds the data types of these respective columns, e.g. f8, |S15, etc.
'f8' means a 64-bit (8-byte) floating-point number
'|S15' means a byte string of up to 15 characters
third parameter - delimiter. It is the character by which values in a row of our csv file are separated. For example, in our case the values in a row of our csv file - housing_short.csv - are separated by ',' (comma)
fourth parameter - skiprows. You can specify here, how many initial rows of the csv file you want to skip loading. E.g. you may want to skip the first row of this csv file, as it may contain header information in the first row, which you may not want to load.
fifth parameter - unpack. When unpack is True, the returned array is transposed, so that the columns may be unpacked using x, y, z = loadtxt(...). When used with a structured data type, as here, a separate array is returned for each field. The default value for unpack is False, but here we want the individual column arrays, so we have set it to True.
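The five parameters described above can be tried out on a tiny example without touching the real file. This is only a minimal sketch: the three-column sample text, the names sample_csv and demo_dtype, and the *_demo variables are made up for illustration and are not part of the exercise. io.StringIO is used so that loadtxt() reads from text held in memory instead of a file on disk.

```python
import io
import numpy as np

# Hypothetical three-column sample standing in for housing_short.csv.
# The first line is a header, just like in the real file.
sample_csv = io.StringIO(
    "longitude,median_house_value,ocean_proximity\n"
    "-122.23,452600.0,NEAR BAY\n"
    "-122.22,358500.0,NEAR BAY\n"
)

# dtype: one name and one format per column.
# 'f8' is a 64-bit float, '|S15' is a byte string of up to 15 characters.
demo_dtype = {
    "names": ("longitude", "median_house_value", "ocean_proximity"),
    "formats": ("f8", "f8", "|S15"),
}

# delimiter=',' splits each row on commas, skiprows=1 skips the header,
# unpack=True returns one array per column instead of one combined array.
longitude_demo, value_demo, proximity_demo = np.loadtxt(
    sample_csv, dtype=demo_dtype, delimiter=",", skiprows=1, unpack=True
)

print(longitude_demo)   # the two longitude values as floats
print(value_demo)       # the two house values as floats
print(proximity_demo)   # the two text values as byte strings
```

The function in step (4) does the same thing, only with all ten columns and with the file name instead of in-memory text.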
(5) Call the Function
Please call the above defined load_housing_data() function, which returns the various column values as NumPy arrays
longitude_arr,latitude_arr,housing_median_age_arr,total_rooms_arr,total_bedrooms_arr,population_arr,households_arr,median_income_arr,median_house_value_arr,ocean_proximity_arr = load_housing_data()
(6) Print
You can check and confirm the values of one of the NumPy arrays (say median_house_value_arr) that you got above by printing it using the print() function
print(<<your code comes here>>)
median_house_value_arr contains the values of the median_house_value column of the csv file - housing_short.csv
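To see why one unpacked array corresponds to exactly one csv column, it may help to compare the two settings of unpack side by side. The following is a rough sketch under assumptions: the two-column sample text and the names csv_text, two_col_dtype, table, income_arr and value_arr are hypothetical and only stand in for the real ten-column file.

```python
import io
import numpy as np

# Hypothetical two-column sample; the real file has ten columns.
csv_text = (
    "median_income,median_house_value\n"
    "8.3252,452600.0\n"
    "8.3014,358500.0\n"
)
two_col_dtype = {
    "names": ("median_income", "median_house_value"),
    "formats": ("f8", "f8"),
}

# unpack=False (the default): one structured array, one record per row,
# where a column is read by its field name.
table = np.loadtxt(
    io.StringIO(csv_text), dtype=two_col_dtype, delimiter=",", skiprows=1
)

# unpack=True: a separate plain array for each column.
income_arr, value_arr = np.loadtxt(
    io.StringIO(csv_text), dtype=two_col_dtype, delimiter=",", skiprows=1,
    unpack=True,
)

# Both routes give the same column values.
print(value_arr)
print(np.array_equal(value_arr, table["median_house_value"]))
```

In the exercise, median_house_value_arr plays the role of value_arr here: it holds one number per row of housing_short.csv.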
I did not understand the whole program.
Please help me to understand it.
Please reply.