Question from the Titanic project

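```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class MostFrequentImputer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # value_counts() sorts by count, so index[0] is each column's most frequent value
        self.most_frequent_ = pd.Series([X[c].value_counts().index[0] for c in X],
                                        index=X.columns)
        return self
    def transform(self, X, y=None):
        # Fill each column's NaNs with that column's value from most_frequent_
        return X.fillna(self.most_frequent_)
```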

Create an imputer for string categorical columns

This is the 10th exercise in the Titanic project. Can somebody explain what this code does and how it works?

Hi, Suvain.

  1. fit() function

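a) X is expected to be a pandas DataFrame, not a list: "for c in X" iterates over its column names, and X.columns and X.fillna() are DataFrame attributes and methods.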

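b) For each column c, X[c].value_counts() returns a Series holding the count of each unique value, sorted from most to least frequent (NaN values are excluded by default). The values themselves form the index of that Series, so value_counts().index[0] is the most frequent value in the column; for a string categorical column, that is the most common string.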

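c) The most frequent values of all columns are collected into a pandas Series indexed by the column names and stored in self.most_frequent_ (the trailing underscore follows the scikit-learn convention for attributes learned during fit). fit() then returns self, which is what lets fit_transform() chain fit() and transform().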
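To see what fit() computes, here is a small sketch on a made-up DataFrame. The column names Embarked and Sex and their values are assumptions chosen for illustration, loosely modeled on Titanic columns. It shows that iterating over a DataFrame yields its column names, that value_counts() puts the most frequent value first, and that the list comprehension builds the same kind of Series fit() stores.

```python
import numpy as np
import pandas as pd

# Made-up toy data for illustration only
X = pd.DataFrame({
    "Embarked": ["S", "C", "S", np.nan, "Q", "S"],
    "Sex": ["male", "female", "male", "male", np.nan, "female"],
})

# Iterating over a DataFrame yields its column labels
print(list(X))  # ['Embarked', 'Sex']

# value_counts() counts the non-null values, most frequent first
counts = X["Embarked"].value_counts()
print(counts)

# index[0] is the value with the highest count
print(counts.index[0])  # S

# fit() repeats this for every column and keeps the results in a Series
most_frequent = pd.Series([X[c].value_counts().index[0] for c in X],
                          index=X.columns)
print(most_frequent)
```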

  2. transform() function

X.fillna(self.most_frequent_) fills the NA/NaN values in X. When fillna() receives a Series, it matches the Series index against the DataFrame's column labels, so each column's missing values are replaced with that column's most frequent value computed in fit().
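The column-by-column matching that fillna() does with a Series can be checked on its own. This is a minimal sketch in which a hand-written, hypothetical fill_values Series stands in for most_frequent_. Note that fillna() returns a new DataFrame and leaves X unchanged by default.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column
X = pd.DataFrame({
    "Embarked": ["S", np.nan, "C"],
    "Sex": [np.nan, "female", "male"],
})

# Hypothetical per-column fill values; the index must match the column labels
fill_values = pd.Series({"Embarked": "S", "Sex": "male"})

# Each column is filled with the value stored under its own label
filled = X.fillna(fill_values)
print(filled)
```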

Note: together, fit() and transform() replace the null (NA/NaN) values in each column with that column's most frequently occurring value, which for string categorical columns is the most common string.
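Putting it together, here is a sketch of typical usage, assuming made-up train and test frames. fit_transform() (provided by TransformerMixin) learns the most frequent values on the training data, and a later transform() call reuses those same learned values on new data rather than recomputing them.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class MostFrequentImputer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Most frequent non-null value of each column, indexed by column name
        self.most_frequent_ = pd.Series([X[c].value_counts().index[0] for c in X],
                                        index=X.columns)
        return self

    def transform(self, X, y=None):
        # Fill each column's NaNs with the value learned in fit()
        return X.fillna(self.most_frequent_)

# Made-up training and test frames for illustration
train = pd.DataFrame({
    "Embarked": ["S", "C", "S", np.nan],
    "Sex": ["male", "female", "male", np.nan],
})
test = pd.DataFrame({
    "Embarked": [np.nan, "Q"],
    "Sex": ["female", np.nan],
})

imputer = MostFrequentImputer()
train_filled = imputer.fit_transform(train)  # fit() on train, then transform(train)
test_filled = imputer.transform(test)        # reuses the values learned from train

print(imputer.most_frequent_)
print(test_filled)
```

One caveat: if a column contains only NaN, value_counts() returns an empty Series and index[0] raises an IndexError, so fit() fails on such a column.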

All the best!