Do we need CategoricalEncoder for the 'Bike Rental Forecasting' problem?


#1

I need a little help to understand this question.

I clearly understand one-hot encoding (making one boolean column for each category), but I am not totally clear about CategoricalEncoder. Does it give the column names as integers? An explanation (or reference material) would be really appreciated.

More importantly, why is encoding not needed in this problem? We do have a few categorical columns, like season…


#2

Kamal,

Remember when we did the end-to-end project? In that data set, the last column (ocean_proximity) holds character (non-numerical) values, and that column is also dependent. If you look at the data, there are five categories:

<1H OCEAN
INLAND
NEAR OCEAN
NEAR BAY
ISLAND

To make computation and comparison easier while processing the data, we gave each category a numerical value.
This is accomplished by encoding (we used OneHotEncoder).
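
To make the idea concrete, here is a minimal sketch of what one-hot encoding does to the five ocean_proximity categories listed above. It is plain Python written only for illustration, not the encoder used in the project; the `one_hot` helper and the `sample` values are hypothetical.

```python
# The five ocean_proximity categories listed in this post.
categories = ["<1H OCEAN", "INLAND", "NEAR OCEAN", "NEAR BAY", "ISLAND"]


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Hypothetical rows, made up for this example.
sample = ["INLAND", "NEAR BAY", "INLAND", "ISLAND"]
encoded = [one_hot(v, categories) for v in sample]

for value, row in zip(sample, encoded):
    print(f"{value:>10} -> {row}")
```

Each row has exactly one 1, so no category ends up "bigger" than another, which is the point of using one column per category.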

In the bike rental data set, we don’t have a dependent column with character values.

Season is a categorical column, but it is already identified by numerical values. If the data were text such as Spring, Summer, Fall and Winter, we would have to encode it with numerical values.

This is my understanding.


#3

Hey Giji, thanks for explaining. However, that is exactly my question: season is a regular categorical variable, not an ordinal one. That is, we cannot say one season is bigger than another, or that they are related in any order…

So I think we should do one-hot encoding. Is that not correct?


#4

In this bike rental example, season is ordinal categorical data:

season 17379 non-null int64


#5

Does being an integer necessarily imply that a column is ordinal categorical? Isn't it necessary for the categories themselves to be related in the way those numbers are?


#6

Many machine learning algorithms cannot operate on label data directly. They require all input variables and output variables to be numeric.

In general, this is mostly a constraint of the efficient implementation of machine learning algorithms rather than hard limitations on the algorithms themselves.

This means that categorical data must be converted to a numerical form. If the categorical variable is an output variable, you may also want to convert predictions by the model back into a categorical form in order to present them or use them in some application.
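
As a rough sketch of that round trip, the snippet below maps text labels to integer codes and then maps predicted codes back to labels. The class names and the `encode`/`decode` helpers are hypothetical, chosen only for illustration; they are not part of the bike rental data.

```python
# Hypothetical output classes, assumed only for this example.
labels = ["low", "medium", "high"]

# Forward and inverse lookup tables between labels and integer codes.
to_code = {label: code for code, label in enumerate(labels)}
to_label = {code: label for label, code in to_code.items()}


def encode(values):
    """Convert text labels to integer codes."""
    return [to_code[v] for v in values]


def decode(codes):
    """Convert integer codes (e.g. model predictions) back to text labels."""
    return [to_label[c] for c in codes]


y_text = ["high", "low", "medium", "low"]
y_codes = encode(y_text)      # numeric form a model can consume
y_back = decode(y_codes)      # categorical form for presentation
print(y_codes, y_back)
```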


#7

@sandeepgiri, @abhinav I am still not clear.

  1. Is season an ordinal categorical as @Giji_Kurian said? Why? Isn't it necessary for an ordinal categorical to have an order among its values? Given (1:spring, 2:summer, 3:fall, 4:winter), is it enough to say that they occur sequentially, so they are ordinal?

  2. weathersit is definitely not ordinal, right?

  3. Just confirming that the answer to this question (no. 24) is no because CategoricalEncoder does both text-to-number conversion and then one-hot encoding, while our data is already numbers. But we still need to do one-hot encoding of the season data, right? Or at least of weathersit?

Then why is the answer to question no. 26 (Do we need HotEncoder for this problem?) also No?

:frowning:


#8

The way I generally decide is by asking: if the value of season were 1.5, would it make sense? In this case, it doesn’t. So we treat it as a categorical variable.
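
Here is a small sketch of that test, assuming the 1:spring, 2:summer, 3:fall, 4:winter coding mentioned earlier in the thread. The sample values and the `one_hot_season` helper are hypothetical; the snippet only shows why arithmetic on the codes is meaningless and how the same codes look once they are one-hot encoded.

```python
# Hypothetical season codes (1:spring, 2:summer, 3:fall, 4:winter).
season_codes = [1, 2, 3, 4, 1]

# Treating the codes as ordinary numbers allows arithmetic like this,
# but the result does not correspond to any season.
mean_code = sum(season_codes) / len(season_codes)
print("mean season code:", mean_code)


def one_hot_season(code, n_seasons=4):
    """One-hot encode an integer season code in the range 1..n_seasons."""
    if not 1 <= code <= n_seasons:
        raise ValueError(f"season code out of range: {code}")
    return [1 if code == k else 0 for k in range(1, n_seasons + 1)]


encoded = [one_hot_season(c) for c in season_codes]
for code, row in zip(season_codes, encoded):
    print(code, "->", row)
```

Whether one-hot encoding actually helps still depends on the model being used; this only illustrates the representation.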