[SOLVED] Why is 1 added to argmax function in PCA

rajtilakb · June 2, 2020, 5:17am

When calculating the number of dimensions that explain 95% of the variance using PCA, we use the following piece of code:

d = np.argmax(cumsum >= 0.95) + 1

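A lot of learners are curious as to why we add 1 in this code. To understand this, we must first understand what argmax() does. np.argmax returns the index of the maximum value along an axis; when the maximum occurs more than once, it returns the index of the first occurrence. Applied to a boolean array such as cumsum >= 0.95, where True counts as 1 and False as 0, it therefore returns the index of the first True. Consider the code below: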

import numpy as np
a = np.array([1,2,3,4,3,2,1])
np.argmax(a > 2)

Output: 2

Here, a > 2 evaluates to a boolean array, and argmax() returns the index of its first True value. That is the position of the first 3, since 3 is the first number greater than 2. Because array indices in Python start from 0, that index is 2. If we treat each number in this array as a separate dimension, 3 is the 3rd dimension. So, to get the dimension from the index, we add 1 to it.
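To make the intermediate step visible, here is a small sketch that prints the boolean array before argmax() is applied. The variable names mask, first_index, position and none_match are only illustrative and not part of the original snippet.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 3, 2, 1])

# The comparison produces a boolean array, one entry per element of a.
mask = a > 2
print(mask)          # [False False  True  True  True False False]

# True is the maximum of a boolean array, so argmax gives the first True.
first_index = int(np.argmax(mask))
print(first_index)   # 2

# Convert the 0-based index into a 1-based position.
position = first_index + 1
print(position)      # 3

# Caveat: if no element satisfies the condition, every entry is False
# and argmax still returns 0, which can be mistaken for a real match.
none_match = int(np.argmax(a > 10))
print(none_match)    # 0
```

The last case is worth keeping in mind: a result of 0 is only meaningful if at least one element actually meets the condition, which can be checked with mask.any().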

Now, going back to our original code, we are trying to get the number of dimensions that explain 95% of the variance, not an index. argmax() returns the index of the first element that satisfies the condition, so we simply add 1 to it to get the number of dimensions.
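As a rough worked example, assume cumsum is the cumulative sum of the explained variance ratios of the principal components. The ratios below are made-up values chosen for illustration, not results from a real dataset.

```python
import numpy as np

# Hypothetical explained variance ratios for 5 principal components
# (invented for illustration; in practice they would come from a fitted PCA).
explained_variance_ratio = np.array([0.60, 0.25, 0.08, 0.05, 0.02])

# Running total of variance explained: roughly 0.60, 0.85, 0.93, 0.98, 1.00
cumsum = np.cumsum(explained_variance_ratio)

# Index of the first component at which the running total reaches 95%.
index = int(np.argmax(cumsum >= 0.95))
print(index)   # 3

# Components at indices 0..3 are needed, which is 4 components in total.
d = index + 1
print(d)       # 4
```

Without the + 1, we would keep only 3 components, which together explain about 93% of the variance and fall short of the 95% target.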