Difficulty in understanding expected output of Project

Manoj_Kumar · April 22, 2020, 6:54am

Hi All,

While solving the project below, I’m not able to understand what the expected output is.

Python Project - Churn Emails - Dataset

We have a text file which records mail activity from various individuals in an open source project development team. Below is the file location:

/cxldata/datasets/project/mbox-short.txt

To see the first 15 lines of mbox-short.txt, please use the below command on the console

These files are in a standard format for a file containing multiple mail messages. The lines which start with "From " separate the messages and the lines which start with "From:" are part of the messages. For more information about the mbox format, please see this Wikipedia article

The e-mail address appears twice for each mail in the given file, e.g.:

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
From: stephen.marquard@uct.ac.za

What should the output look like?

(1) stephen.marquard@uct.ac.za
stephen.marquard@uct.ac.za

(2) stephen.marquard@uct.ac.za (only unique e-mail IDs, without From:)

(3) From: stephen.marquard@uct.ac.za (only unique e-mail IDs, with From:)

(4) From stephen.marquard@uct.ac.za
From: stephen.marquard@uct.ac.za

I’m confused, although I’ve completed this exercise by creating a list of the extracted e-mail IDs and printing it. The expected output should be stated in the question, and it is missing in this case.

Please help.

Regards
Manoj

satyajit_das · April 22, 2020, 4:30pm

Hi, Manoj.

Thanks for your suggestions!
The sample output for all the questions was already given.
I agree with you that there are duplicate e-mails, but you need to write your logic so that it matches the questions and the expected output.
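
For illustration only, here is a minimal Python sketch of one possible approach. It is not the project's reference solution, and the expected output still has to be taken from the question itself. It assumes the address is read from the lines starting with "From " (with a trailing space), so the matching "From:" line does not count the same mail twice. The function names extract_senders and unique_in_order are made up for this example, and the second message in the sample is a made-up repeat of the first.

```python
def extract_senders(lines):
    """Collect the address from every line that starts with 'From '."""
    senders = []
    for line in lines:
        # 'From ' (with a space) separates messages; 'From:' is a header
        # inside the message and repeats the same address, so it is skipped.
        if line.startswith("From "):
            parts = line.split()
            if len(parts) > 1:
                senders.append(parts[1])
    return senders


def unique_in_order(items):
    """Remove duplicates while keeping the first-seen order."""
    return list(dict.fromkeys(items))


# Sample lines in the format quoted in the question. The second message is
# a hypothetical repeat of the first, added only to show de-duplication.
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
]

all_senders = extract_senders(sample)          # one entry per message
unique_senders = unique_in_order(all_senders)  # each address once

print(all_senders)
print(unique_senders)
```

To run this on the real data, the lines could be read from /cxldata/datasets/project/mbox-short.txt with open(...) instead of the sample list. Whether to print all_senders or unique_senders depends on the sample output given for each question.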

All the best!