Word Counts in Twitter sentiment analysis

Gandharv_Bakshi · June 24, 2020, 1:36am

Hi,
I am trying out Twitter sentiment analysis to practice my skills. I have reached the “Training models” part of the course. I have a fundamental doubt: how can one use word counts from the training set on future test sets, which may form a different word cloud? What I did was the following:

Transformation was a two-step process:
1. Clean the text (remove common words, use stemming).
2. Use Count Vectorizer (converts word counts into columns).
Fit with sentiment labels.

However, when I try the above model on a new test set, the count vectorizer would produce a differently sized vocabulary, or a different vocabulary altogether. How does one control for that?

abhinav · June 25, 2020, 6:12pm

Hi @Gandharv_Bakshi,

Can you please give more context on how you are fitting the sentiment labels? Are you using any library for sentiment analysis?

Generally, for a word cloud we do not have to split the data into training and test sets. We can pass the tweets to a word cloud generation library and it will take care of it.
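
On the count vectorizer part of the question, here is a minimal sketch, assuming "Count Vectorizer" means scikit-learn's `CountVectorizer`. The tweets, labels and the choice of `LogisticRegression` are made up for illustration, and stemming is left out to keep it short. The idea is to call `fit_transform` only on the training tweets and plain `transform` on any later tweets, so the columns stay the same and words not seen during training are simply ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data, for illustration only.
train_tweets = [
    "love this phone",
    "hate this phone",
    "love the camera",
    "hate the battery",
]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (assumed encoding)
new_tweets = ["love the new battery", "totally unseen words"]

# Learn the vocabulary from the training tweets only.
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_tweets)

# Fit a classifier on the training counts and sentiment labels.
model = LogisticRegression()
model.fit(X_train, train_labels)

# Reuse the same fitted vectorizer: transform() keeps the training
# vocabulary, so the column count matches and unknown words are dropped.
X_new = vectorizer.transform(new_tweets)
predictions = model.predict(X_new)
```

Any cleaning done before vectorizing (common-word removal, stemming) would need to be applied to the new tweets in exactly the same way. A tweet made up entirely of unseen words becomes an all-zero row, which is a hint that the training set may need to be larger or more varied.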