Stratified sampling using scikit-learn's StratifiedShuffleSplit

NIRAV_RAJ · November 11, 2020, 4:52pm

I don't understand this error. Here is my code:

```python
from sklearn.model_selection import StratifiedShuffleSplit

# One stratified 80/20 split that keeps the income_cat proportions
split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
for train_index, test_index in split.split(USA_Housing, USA_Housing["income_cat"]):
    # Select the rows that belong to each part of the split
    strat_train_set = USA_Housing.loc[train_index]
    strat_test_set = USA_Housing.loc[test_index]
```

```
ValueError                                Traceback (most recent call last)
in
      3 from sklearn.model_selection import StratifiedShuffleSplit
      4 split=StratifiedShuffleSplit(n_splits=1,test_size=0.2,random_state=42)
----> 5 for train_index, test_index in split.split(USA_Housing,USA_Housing["income_cat"]):
      6
      7 strat_train_set=USA_Housing.loc[train_index]

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
   1744 to an integer.
   1745 """
-> 1746 y = check_array(y, ensure_2d=False, dtype=None)
   1747 return super().split(X, y, groups)
   1748

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71 FutureWarning)
     72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
     74 return inner_f
     75

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    643
    644 if force_all_finite:
--> 645 _assert_all_finite(array,
    646 allow_nan=force_all_finite == 'allow-nan')
    647

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     95 not allow_nan and not np.isfinite(X).all()):
     96 type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 97 raise ValueError(
     98 msg_err.format
     99 (type_err,

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
```

How can I fix it?

Please reply.
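The last line of the traceback is the key: `StratifiedShuffleSplit.split` passes the labels `y` (here `USA_Housing["income_cat"]`) through `check_array`, which rejects NaN and infinite values. A minimal sketch, using a small made-up DataFrame `df` in place of `USA_Housing` (the `price` column and all values are invented for illustration), reproduces the error, locates the non-finite labels, and drops those rows before splitting. Whether dropping them is the right fix, rather than filling them or rebuilding `income_cat`, depends on how that column was created, which the post does not show.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in for USA_Housing: one made-up feature plus an
# income category column with a single missing (NaN) label at row 6.
df = pd.DataFrame({
    "price": np.arange(12, dtype=float),
    "income_cat": [1, 1, 1, 2, 2, 2, np.nan, 3, 3, 3, 1, 2],
})

split = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

# Reproduce the problem: NaN in the stratification labels raises ValueError.
try:
    next(split.split(df, df["income_cat"]))
except ValueError as err:
    print("split failed:", err)

# Locate the offending rows (NaN or +/-inf) before deciding how to handle them.
bad_rows = df[~np.isfinite(df["income_cat"])]
print("rows with non-finite income_cat:", list(bad_rows.index))

# One option: drop those rows and renumber the index from 0.
clean = df[np.isfinite(df["income_cat"])].reset_index(drop=True)

for train_index, test_index in split.split(clean, clean["income_cat"]):
    # split() yields positional indices, so .iloc is the safe indexer here.
    strat_train_set = clean.iloc[train_index]
    strat_test_set = clean.iloc[test_index]

print(len(strat_train_set), len(strat_test_set))
```

Using `.iloc` instead of `.loc` avoids a second class of bugs: `split()` returns row positions, which only match `.loc` labels when the index is a plain 0..n-1 range.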

rajtilakb · November 12, 2020, 5:53am

Hi Nirav, I tried calling you several times yesterday (November 11th) to understand the issue you were facing, but I was unable to reach you. I also sent you an email asking when I could speak to you; however, I never got a response.

Please note that the project you are attempting is not a CloudxLab project, so without seeing your dataset or your code, it is impossible for us to pinpoint the cause of this error. I would suggest referring to my comment on your previous post for a probable solution; here is the link:

You can also refer to the End-to-End Project lecture videos and Jupyter notebooks for more information, since you have already enrolled in the Machine Learning Specialization course. They should help you understand these issues and how to solve them.
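One common way to build a stratification column such as `income_cat` is `pd.cut`. Any value that falls outside every bin comes out as NaN: for example a 0 when the first bin is right-closed and starts at 0, or a value above the last edge when that edge is finite. Those NaN labels then trigger the same `ValueError` in `split()`. A minimal sketch, assuming a made-up `income` column and illustrative bin edges (not necessarily the ones used in the course notebooks), shows the problem and a fix that makes the outer edges open-ended.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Made-up incomes; 0.0 and 9.5 lie outside the finite bin edges below.
df = pd.DataFrame({"income": [0.0, 0.8, 1.2, 2.0, 2.5, 3.5, 4.0, 5.0, 5.5, 9.5]})

# Finite edges: bins are right-closed, so the first bin is (0, 2] and excludes
# 0.0, while 9.5 is above the last edge. Both rows get a NaN label.
narrow = pd.cut(df["income"], bins=[0.0, 2.0, 4.0, 6.0], labels=[1, 2, 3])
print("NaN labels with finite edges:", int(narrow.isna().sum()))

# Open-ended outer edges catch every finite value, so no row is left unlabelled.
df["income_cat"] = pd.cut(df["income"], bins=[-np.inf, 2.0, 4.0, np.inf], labels=[1, 2, 3])
print("NaN labels with open edges:", int(df["income_cat"].isna().sum()))

# test_size=3 means 3 rows, so the test set can hold one row per category.
split = StratifiedShuffleSplit(n_splits=1, test_size=3, random_state=42)
for train_index, test_index in split.split(df, df["income_cat"]):
    strat_train_set = df.iloc[train_index]
    strat_test_set = df.iloc[test_index]
```

Checking `df["income_cat"].isna().sum()` right after creating the column is a cheap way to catch this before calling `split()`.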

NIRAV_RAJ · November 12, 2020, 4:27pm

I did it.

Thank you.