When to create the pipeline?

Q. Should we split the data first and then apply the pipeline, or apply the pipeline first and then split?

Question by SOUMYADEEP_BANIK_CHO

Hi,

The purpose of creating a pipeline is that the same pipeline can be used to transform both the training and the test datasets.

Since we don’t want to peek into the test dataset, we always split the data upfront, at the very beginning. We then work out the various transformations on the training dataset only, and whatever transformations we apply to the training dataset, we chain together to form a pipeline.
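A minimal sketch of the split-first ordering, assuming scikit-learn and a small synthetic NumPy dataset standing in for real data. The imputer and scaler steps are illustrative choices, not necessarily the ones used in the course; the point is that every statistic is learned from the training split only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 rows, 3 numeric features, some missing values in column 0
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[::10, 0] = np.nan
y = rng.integers(0, 2, size=100)

# 1. Split first, so the test set never influences any transformation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Chain the training-set transformations into one pipeline
prep = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fills NaNs with training medians
    ("scaler", StandardScaler()),                   # centers/scales with training mean and std
])

# 3. Fit on the training data only, then transform it
X_train_prepared = prep.fit_transform(X_train)
```

Fitting on `X_train` alone is what keeps the medians, means and standard deviations free of any information from the test split.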

We run the same pipeline, already fitted on the training data, on the test dataset before calling the model on it.
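One common way to guarantee the test data goes through the same fitted transformations is to put the model itself at the end of the pipeline. The sketch below assumes scikit-learn, a synthetic dataset, and `LogisticRegression` as an arbitrary example model; `fit` learns the scaler statistics from the training split, and `score`/`predict` only call `transform` on the test split before the classifier sees it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical numeric data standing in for a real dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Preprocessing and model chained together
model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# fit(): scaler statistics come from X_train only, then the classifier trains
model.fit(X_train, y_train)

# score(): X_test is transformed by the already-fitted scaler (no refitting)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")

# The same step done by hand: transform the test set, then call the model
X_test_prepared = model.named_steps["scaler"].transform(X_test)
manual_pred = model.named_steps["clf"].predict(X_test_prepared)
```

The manual version at the end makes explicit what the pipeline does internally when it is applied to the test set.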


Is there any transformation required in the wine quality analysis? I found that all the data is numerical, so we don’t need a DataFrame selector, and there are no missing values either.
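Both observations can be checked quickly with pandas before deciding which pipeline steps are needed. This is a minimal sketch on a tiny hypothetical DataFrame with made-up values; the column names only mimic a wine-quality-style table, and in practice the frame would be loaded from the actual data file.

```python
import pandas as pd

# Hypothetical stand-in for the wine quality data (made-up values)
df = pd.DataFrame({
    "fixed acidity": [7.1, 8.3, 10.9],
    "alcohol": [9.5, 10.2, 11.0],
    "quality": [5, 6, 6],
})

# Are all columns numeric? If so, no column-selection step is needed.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

# Are there any missing values? If not, no imputation step is needed.
missing_per_column = df.isna().sum()

print("non-numeric columns:", non_numeric)
print("missing values:", int(missing_per_column.sum()))
```

If both checks come back empty, selection and imputation steps can be dropped; a scaling step may still be worth keeping for scale-sensitive models.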