End-to-End Machine Learning Project Part-2A

NIRAV_RAJ · October 29, 2020, 5:22pm

I did not understand the whole program.

End-to-End Machine Learning Project Part-2

# To make this notebook's output identical at every run
import numpy as np

np.random.seed(42)

np.random.permutation(5)

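def split_train_test(data, test_ratio):
    # Shuffle the row positions 0 .. len(data)-1
    shuffled_indices = np.random.permutation(len(data))
    # Number of rows that go into the test set
    test_set_size = int(len(data) * test_ratio)
    # First test_set_size shuffled positions form the test set, the rest the training set
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    # Returning two values separated by a comma returns them as a tuple
    return data.iloc[train_indices], data.iloc[test_indices]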

train_set,test_set = split_train_test(housing,0.2)

print(len(train_set),"train+",len(test_set),"test")

Please tell me the steps.

Please reply.

sgiri · October 30, 2020, 12:26pm

Basically, this code defines our own function to split the data into a training set and a test set.

Please try to experiment with each of the methods separately.
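One way to experiment with a single piece in isolation is to run the shuffling step by itself. The sketch below is only an illustration (the variable names first and second are made up here); it shows that np.random.permutation(5) returns the integers 0 to 4 in a shuffled order, and that resetting the seed reproduces the same order.

```python
import numpy as np

# Seeding the generator makes the shuffle repeatable from run to run.
np.random.seed(42)
first = np.random.permutation(5)

# Resetting the same seed and shuffling again gives the same order.
np.random.seed(42)
second = np.random.permutation(5)

print(first)                    # the integers 0..4 in a shuffled order
print(sorted(first.tolist()))   # [0, 1, 2, 3, 4]
print((first == second).all())  # True
```

Each integer appears exactly once, which is why the result can be used as a set of row positions.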

sgiri · October 30, 2020, 12:34pm

np.random.permutation(num) generates all the integers from 0 up to num-1 in random order.

The function puts the first test_set_size shuffled indices into test_indices and the rest into train_indices.

It then picks the records from data at train_indices and at test_indices and returns a tuple containing the two.
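The three steps can be tried end to end on a small table. This is a minimal sketch, not the notebook's exact code: toy is a hypothetical 10-row stand-in for the housing DataFrame, and result is a name introduced here only to show that the function hands back a tuple, which is then unpacked into two variables.

```python
import numpy as np
import pandas as pd

def split_train_test(data, test_ratio):
    # Step 1: shuffle the row positions 0 .. len(data)-1.
    shuffled_indices = np.random.permutation(len(data))
    # Step 2: the first test_set_size positions go to the test set, the rest to training.
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    # Step 3: two values separated by a comma are returned as one tuple.
    return data.iloc[train_indices], data.iloc[test_indices]

# Hypothetical stand-in for the housing data (assumption for illustration).
toy = pd.DataFrame({"value": range(10)})

np.random.seed(42)
result = split_train_test(toy, 0.2)
print(type(result))            # <class 'tuple'>

# Tuple unpacking: the first element goes to train_set, the second to test_set.
train_set, test_set = result
print(len(train_set), "train +", len(test_set), "test")  # 8 train + 2 test
```

In Python no special syntax is needed to return a tuple: writing return a, b is the same as return (a, b), and train_set, test_set = ... splits the tuple back into its two parts.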

NIRAV_RAJ · October 31, 2020, 1:06pm

I did not understand.

“returns a tuple containing the two.”

data.iloc[train_indices],data.iloc[test_indices]

How can I return a tuple?

Please tell me the steps.

Please reply.