How to find unique n-grams from a DataFrame email column?


I have a requirement to create a custom feature transformer in Spark (Scala). I am using MLeap to serialize this feature transformer. For instance, I have a DataFrame:

+-------------------+
|         email_list|
+-------------------+
|                   |
|                   |
|                   |
+-------------------+
If I use the NGram transformer, it converts the input array of strings into an array of n-grams, as below:

+-------------------+--------------------+
|         email_list|              ngrams|
+-------------------+--------------------+
|                   |  [t e, e s, s t, t…|
|                   |  [m a, a v, v e, e…|
|                   |     [d n, n d, d…|
+-------------------+--------------------+
How can I get the distinct n-grams present, rather than the full array of n-grams produced by the code below?

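import org.apache.spark.ml.feature.NGram
import org.apache.spark.sql.functions.{col, split}
// Keep the part before "@", then split it into single characters
val emailD1F = emailDF.withColumn("email_split", split(col("email_list"), "@").getItem(0)).withColumn("email_split", split(col("email_split"), ""))
val ngram = new NGram().setN(2).setInputCol("email_split").setOutputCol("ngrams") // input must be the array column built above

val ngramDataFrame = ngram.transform(emailD1F)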

So the end result should be the number of unique n-grams present and the total number of n-grams present for each email.
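
To make the desired output concrete, here is a minimal sketch of the computation in plain Scala, without Spark. The object `EmailNGrams`, its methods, and the sample address `banana@example.com` are hypothetical names chosen for illustration; they are not part of Spark or MLeap. It assumes character n-grams over the part of the address before "@", joined with spaces in the same style as the `ngrams` column shown above.

```scala
// Hypothetical helper for illustration only; not a Spark or MLeap API.
object EmailNGrams {

  /** Character n-grams of the text before '@', e.g. "t e", "e s", ... */
  def ngrams(email: String, n: Int = 2): List[String] = {
    // Everything up to (but excluding) the first '@'
    val localPart = email.takeWhile(_ != '@')
    localPart
      .sliding(n)               // overlapping windows of up to n characters
      .filter(_.length == n)    // drop a short window when the input is shorter than n
      .map(_.mkString(" "))     // "te" becomes "t e"
      .toList
  }

  /** Returns (number of unique n-grams, total number of n-grams). */
  def counts(email: String, n: Int = 2): (Int, Int) = {
    val grams = ngrams(email, n)
    (grams.distinct.size, grams.size)
  }
}

object EmailNGramsDemo extends App {
  // Sample address is made up; it repeats "a n" and "n a" on purpose.
  val (unique, total) = EmailNGrams.counts("banana@example.com")
  println(s"unique=$unique total=$total")
}
```

A function like this could be wrapped in a UDF on the Spark side and reused in the runtime model of a custom transformer, since it has no Spark dependency. If only the Spark side matters and Spark 2.4 or later is available, `size(array_distinct(col("ngrams")))` and `size(col("ngrams"))` from `org.apache.spark.sql.functions` should give the same two counts directly on the `ngrams` column.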

Can anyone help me with this? It is required to build a custom transformer for an ML model that will run in an MLeap environment.