Word Embedding for Computer Vision

Embedding is a method of converting a word into numeric values that carry its meaning. Does embedding in computer vision mean converting photos into numeric values? If yes:

  • How is an embedding different from pixel values, since pixel values are also numeric?

Take two photos of one person, p11 and p12, and one photo of a different person, p21, and calculate the corresponding embeddings: ep11, ep12 and ep21.

The embeddings of the same person, ep11 and ep12, will then lie close together, while ep21 will lie farther away. You can picture this in 2D space, though the actual embedding space is typically around 128-dimensional.

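Comparing raw pixels directly would tell you very little, because pixel values change with lighting, pose and background even for the same person. Comparing embeddings, by contrast, lets you identify people.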
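To make the comparison concrete, here is a minimal sketch in Python. The vectors below are made up by hand for illustration (4 dimensions instead of around 128) and do not come from any real face model; the function names and the threshold value are likewise assumptions. In practice a trained network would produce the embeddings, and the threshold would be tuned for that network.

```python
import math


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hand-made toy embeddings, for illustration only (not model output).
ep11 = [0.10, 0.80, 0.30, 0.50]  # person 1, photo 1
ep12 = [0.12, 0.79, 0.33, 0.48]  # person 1, photo 2
ep21 = [0.90, 0.10, 0.70, 0.20]  # person 2, photo 1

# Hypothetical cut-off; a real system would tune this per model.
THRESHOLD = 0.6


def same_person(e1, e2, threshold=THRESHOLD):
    """Treat two embeddings as the same person if they are close enough."""
    return euclidean_distance(e1, e2) < threshold


print(same_person(ep11, ep12))  # same person: small distance
print(same_person(ep11, ep21))  # different people: large distance
```

With these toy values the first pair falls under the threshold and the second does not, which is the "nearby vs. far apart" behaviour described above.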

Thanks for your answer. I understand now.