Problem
Write Spark code to find users who have the same DNA in a file stored in HDFS.
Dataset
The file is located at
/data/mr/dna/dna.txt
Sample Output
The output file will list the users who share the same DNA:
ACG ['User5', 'User3']
ACGT ['User4', 'User1']
Here is my attempt. Please correct me if it needs improvement:
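// Read the file from HDFS, one record per line
val dnaRdd = sc.textFile("/data/mr/dna/dna.txt")
// Turn a line into a (dna, user) pair
def clean(line: String) = {
  val arr = line.split(" ")
  (arr(3).trim, arr(0))
}
val pairs = dnaRdd.map(clean)
// Collect all users that share each DNA string
val userdna = pairs.groupByKey()
// Output of userdna.collect()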
Array((TGCA,CompactBuffer(User2)), (ACG,CompactBuffer(User3, User5)), (ACGT,CompactBuffer(User1, User4)), (AGCT,CompactBuffer(User6)))
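One possible improvement: the collected result above still contains DNA strings that belong to a single user (TGCA, AGCT), while the sample output only lists DNA shared by at least two users, and nothing is written to a file yet. Below is a minimal sketch of how that could look. The field positions (user in field 0, DNA in field 3) are copied from the attempt above and are an assumption about the layout of dna.txt. The object name `SharedDna`, the helpers `parse`, `render`, `sharedDna`, and the output path `/tmp/dna_out` are hypothetical names chosen for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SharedDna {
  // Parse one line into (dna, user); lines with too few fields are skipped.
  // Field positions follow the attempt above and are an assumption.
  def parse(line: String): Option[(String, String)] = {
    val arr = line.split(" ")
    if (arr.length > 3) Some((arr(3).trim, arr(0))) else None
  }

  // Render one group in the style of the sample output: ACG ['User3', 'User5']
  def render(dna: String, users: Iterable[String]): String =
    users.map(u => s"'$u'").mkString(s"$dna [", ", ", "]")

  // Keep only DNA strings shared by more than one user
  def sharedDna(lines: RDD[String]): RDD[String] =
    lines.flatMap(line => parse(line))
      .groupByKey()
      .filter { case (_, users) => users.size > 1 }
      .map { case (dna, users) => render(dna, users) }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SharedDna").getOrCreate()
    // Hypothetical output directory; it must not exist before the job runs
    val out = args.headOption.getOrElse("/tmp/dna_out")
    sharedDna(spark.sparkContext.textFile("/data/mr/dna/dna.txt")).saveAsTextFile(out)
    spark.stop()
  }
}
```

In spark-shell the same pipeline can be tried with `SharedDna.sharedDna(sc.textFile("/data/mr/dna/dna.txt")).collect()`. The order of users inside each group is not guaranteed after `groupByKey`, so it may differ from the sample output.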