Python Project - Churn Emails - Count Number of Messages From Each Domain

This is my code:

def count_message_from_domain():
    msgdict = {}
    file = open('/cxldata/datasets/project/mbox-short.txt')
    for line in file:
        line = line.strip()
        if line.startswith('From:'):
            line = line.split("@")
            domain = line[1]
            if domain not in msgdict:
                msgdict[domain] = 1
            else:
                msgdict[domain] += 1

    return msgdict

This is the output:
{'uct.ac.za': 6,
 'media.berkeley.edu': 4,
 'umich.edu': 7,
 'iupui.edu': 8,
 'caret.cam.ac.uk': 1,
 'gmail.com': 1}
But the expected output in the question is:
{'uct.ac.za': 12,
 'media.berkeley.edu': 8,
 'umich.edu': 14,
 'iupui.edu': 16,
 'caret.cam.ac.uk': 2,
 'gmail.com': 2}
Please tell me how this is possible. My logic seems correct, but the output is different. Please help me solve this problem.

Hi Anubhav,

Note that you are only taking into account lines that start with "From:" and not lines that start with "From ". You need to modify your code to handle both cases.
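
As a rough illustration of checking both prefixes, here is a minimal sketch. The helper name is_sender_line and the sample lines are made up for this example; they are not part of the exercise.

```python
def is_sender_line(line):
    # str.startswith accepts a tuple of prefixes and returns True
    # if the string starts with any one of them.
    return line.startswith(("From:", "From "))


# Hypothetical sample lines in the style of an mbox file.
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
    "Subject: [sakai] svn commit",
]

matches = [is_sender_line(line) for line in sample]
print(matches)
```

The first two lines match and the "Subject:" line does not.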

Regards,
Raj.

I have handled both cases now:

if line.startswith('From:' or 'From'):

But it is still showing the same output as before.

Hi Anubhav,

I think the issue with the code is that it is not splitting the line in a way that considers only the first part of each line. I would suggest that you first split the line, store the result in a separate list, and then apply the startswith condition. Some lines are being omitted for this reason, which is why you are getting an incorrect count.
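
A separate point that may be worth checking: in Python, an expression like 'From:' or 'From' evaluates to the first non-empty string, so a condition written that way only ever tests the 'From:' prefix. A small sketch (the sample line is made up for illustration):

```python
# `or` between two non-empty strings returns the first one.
prefix = 'From:' or 'From'
print(prefix)

# Hypothetical sample line in the style of an mbox "From " header.
line = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

# Equivalent to line.startswith('From:'), so this line is not matched.
single_check = line.startswith('From:' or 'From')

# Testing each prefix separately matches the line.
separate_checks = line.startswith('From:') or line.startswith('From ')

print(single_check, separate_checks)
```

This would explain why the output stays the same after that change.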

Thanks,
Raj

I have noticed one thing: my count values are exactly half of the expected count values.
But I am not able to understand what the problem is.

Hi Anubhav,

Since you are not splitting the line initially but only using strip(), everything after the @ sign is kept when you split on it in the next line. You only need the email address to collect the domains, so split each line first instead of stripping it, store the result in a separate list, and then work on that list.
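
One possible way to put this together is sketched below. It is only an illustration, not the official solution: the function name count_domains, its lines parameter, and the sample lines are assumptions made for this example.

```python
def count_domains(lines):
    """Count messages per sender domain from an iterable of mbox-style lines."""
    counts = {}
    for line in lines:
        # Split on whitespace first, so the address is isolated as words[1].
        words = line.split()
        # Skip blank lines and anything that is not a "From" / "From:" header.
        if len(words) < 2 or words[0] not in ("From", "From:"):
            continue
        address = words[1]
        if "@" not in address:
            continue
        # Only the address is split on "@", so no trailing date text leaks in.
        domain = address.split("@")[1]
        counts[domain] = counts.get(domain, 0) + 1
    return counts


# Hypothetical sample lines in the style of an mbox file.
sample_lines = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
    "Subject: [sakai] svn commit",
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008",
    "From: louis@media.berkeley.edu",
]

result = count_domains(sample_lines)
print(result)
```

Each message here contributes both a "From " line and a "From:" line, so every domain is counted twice per message, which matches the doubling seen in the expected output. To run it on the real data, the open file object can be passed in directly, for example count_domains(open('/cxldata/datasets/project/mbox-short.txt')).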

If you are still unable to move forward, you can always take a hint.

Regards,
Raj

I got the correct answer. Thank you!