Hi All,
While solving the project below, I'm not able to understand what the expected output is.
Python Project - Churn Emails - Dataset
We have a text file which records mail activity from various individuals in an open source project development team. The file is located at /cxldata/datasets/project/mbox-short.txt
To see the first 15 lines of mbox-short.txt, please use the command below on the console.
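If you'd rather peek at the file from Python, here is a minimal sketch that returns the first few lines. The helper names `head` and `first_lines` are hypothetical, and the UTF-8 encoding is an assumption about the file.

```python
from itertools import islice


def first_lines(lines, n=15):
    """Return the first n items of any iterable of lines, without trailing newlines."""
    return [line.rstrip("\n") for line in islice(lines, n)]


def head(path, n=15):
    """Open a text file (assumed UTF-8) and return its first n lines."""
    with open(path, encoding="utf-8") as f:
        return first_lines(f, n)
```

Usage: `for line in head("/cxldata/datasets/project/mbox-short.txt"): print(line)`. `islice` stops after `n` lines, so the rest of the file is never read.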
These files are in a standard format for a file containing multiple mail messages. The lines which start with "From " separate the messages, and the lines which start with "From:" are part of the messages. For more information about the mbox format, please see this Wikipedia article.
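The separator rule above can be sketched in a few lines: a new message begins at every line starting with "From " (with a space), while "From:" lines (with a colon) stay inside the current message. This is a simplified illustration, not a full mbox parser; `split_messages` is a hypothetical name.

```python
def split_messages(lines):
    """Group lines into messages; each 'From ' line starts a new message."""
    messages = []
    for line in lines:
        if line.startswith("From "):  # separator: "From" followed by a space
            messages.append([line])
        elif messages:  # 'From:' headers and all other lines join the current message
            messages[-1].append(line)
        # lines before the first separator (if any) are ignored
    return messages
```

Because "From:" has a colon where the separator has a space, `"From:".startswith("From ")` is `False`, so the header line never opens a new message.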
The e-mail address appears twice for each mail in the given file, e.g.:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
From: stephen.marquard@uct.ac.za
What should the output look like?
(1) stephen.marquard@uct.ac.za
stephen.marquard@uct.ac.za
(2) stephen.marquard@uct.ac.za (only unique e-mail IDs, without "From:")
(3) From: stephen.marquard@uct.ac.za (only unique e-mail IDs, with "From:")
(4) From stephen.marquard@uct.ac.za
From: stephen.marquard@uct.ac.za
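Since the assignment doesn't say which of these is expected, here is a sketch that builds all four candidates in one pass so they can be compared. It assumes every "From " and "From:" line carries a bare address as its second whitespace-separated field, as in the example above; `output_variants` is a hypothetical name.

```python
def output_variants(lines):
    """Build candidate outputs (1)-(4) from the question, keyed by option number.

    Assumes each sender line looks like 'From <addr> ...' or 'From: <addr>'.
    """
    both = []    # option (1): the address from both lines of every mail
    unique = []  # option (2): unique addresses, in first-seen order
    prefixed_lines = []  # option (4): 'From <addr>' and 'From: <addr>' for every mail
    seen = set()
    for line in lines:
        if not (line.startswith("From ") or line.startswith("From:")):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # sender line without an address; nothing to extract
        keyword, addr = parts[0], parts[1]  # keyword is 'From' or 'From:'
        both.append(addr)
        prefixed_lines.append(keyword + " " + addr)
        if addr not in seen:
            seen.add(addr)
            unique.append(addr)
    with_prefix = ["From: " + addr for addr in unique]  # option (3)
    return {1: both, 2: unique, 3: with_prefix, 4: prefixed_lines}
```

Usage: open the file as in the exercise and call `output_variants(f)`, then print whichever list matches the grader's expectation. A `set` tracks what has been seen, while the `unique` list keeps the original order, which a bare `set` would not.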
I'm confused, although I've completed this exercise by creating a list of the extracted e-mail IDs and then printing that list. The expected output should be stated in the question, but it is missing in this case.
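For comparison, Python's standard-library `mailbox` module can parse the mbox file itself, so each message is visited exactly once and the duplicate-address issue disappears. This is a sketch of that alternative, not necessarily what the course expects; `senders_via_mailbox` is a hypothetical name.

```python
import mailbox


def senders_via_mailbox(path):
    """Return one sender per message, read from each message's 'From:' header."""
    box = mailbox.mbox(path, create=False)  # raise instead of creating a missing file
    try:
        senders = []
        for msg in box:
            header = msg["From"]  # None if the message has no 'From:' header
            if header is not None:
                senders.append(header.strip())
        return senders
    finally:
        box.close()
```

Usage: `print(senders_via_mailbox("/cxldata/datasets/project/mbox-short.txt"))`. The list keeps one entry per message, so repeat senders appear more than once; `list(dict.fromkeys(senders))` reduces it to unique addresses in first-seen order.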
Please help.
Regards
Manoj