Question from titanic project

SUVAIN_G · July 23, 2020, 8:58am

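import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class MostFrequentImputer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # For each column, remember its most frequent value
        self.most_frequent_ = pd.Series([X[c].value_counts().index[0] for c in X],
                                        index=X.columns)
        return self

    def transform(self, X, y=None):
        # Fill missing values in each column with the value stored by fit()
        return X.fillna(self.most_frequent_)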

Create an imputer for string categorical columns

This is the 10th exercise in the Titanic project. Can somebody explain what this code is doing and how it works?

satyajit_das · July 23, 2020, 11:35am

Hi, Suvain.

fit() function

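a) Given your code, X has to be a pandas DataFrame rather than a list, because fit() uses X.columns and transform() calls X.fillna().

b) "for c in X" iterates over the column names. For each column, X[c].value_counts() returns a Series holding the counts of unique values, sorted from most to least frequent, so .index[0] is the most frequent value in that column (a string, in this case).

c) These values are collected into a Series indexed by column name and stored in most_frequent_. fit() then returns self.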
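To make steps a) to c) concrete, here is a small sketch of what fit() computes. The DataFrame below is made-up toy data (not the real Titanic set), and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
X = pd.DataFrame({
    "Sex": ["male", "female", "male", None],
    "Embarked": ["S", "C", "S", "S"],
})

# value_counts() ignores missing values by default and sorts the counts
# in descending order, so the first index label is the most frequent value.
counts = X["Sex"].value_counts()
most_frequent_sex = counts.index[0]

# Same expression as in fit(): one most frequent value per column,
# labelled with the column name.
most_frequent = pd.Series([X[c].value_counts().index[0] for c in X],
                          index=X.columns)
print(most_frequent)
```

Note that if two values tie for first place, which one ends up at index[0] is not something I would rely on.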

transform() function

fillna() fills the NA/NaN values in X. Because most_frequent_ is a Series indexed by column name, the missing values in each column are filled with the most frequently occurring value that fit() found for that column. It returns a new DataFrame and leaves X itself unchanged.
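Here is a minimal sketch of how fillna() behaves when it is given a Series instead of a single value. The data and the fill values are hypothetical; in the imputer the Series would come from fit().

```python
import pandas as pd

# Hypothetical toy data with one missing value per column
X = pd.DataFrame({
    "Sex": ["male", None, "female"],
    "Embarked": [None, "S", "C"],
})

# Stand-in for most_frequent_: one fill value per column name
fill_values = pd.Series({"Sex": "male", "Embarked": "S"})

# Each column is filled with the value whose index label matches its name.
# fillna() returns a new DataFrame by default, so X keeps its missing values.
filled = X.fillna(fill_values)
print(filled)
```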

Note: MostFrequentImputer (fit() + transform()) replaces the null (NA/NaN) values with the most frequently occurring string in each of the columns passed.
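Putting it together, this is roughly how the imputer could be used. The train/test frames are made up for illustration; the class is the one from the question. fit_transform() is provided by TransformerMixin and simply calls fit() followed by transform().

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class MostFrequentImputer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        self.most_frequent_ = pd.Series([X[c].value_counts().index[0] for c in X],
                                        index=X.columns)
        return self

    def transform(self, X, y=None):
        return X.fillna(self.most_frequent_)

# Hypothetical data: "S" is the most frequent value in the training column
train = pd.DataFrame({"Embarked": ["S", "S", "C", None]})
test = pd.DataFrame({"Embarked": [None, "Q"]})

imputer = MostFrequentImputer()
train_filled = imputer.fit_transform(train)  # learn the values, then fill
test_filled = imputer.transform(test)        # reuse the values learned on train

print(train_filled["Embarked"].tolist())
print(test_filled["Embarked"].tolist())
```

The point of splitting the work into fit() and transform() is that the fill values are learned once from the training data and then applied unchanged to any later data.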

All the best!