Replacing an empty string with a value in an RDD (Spark, Scala)


#1

What’s the command to replace a field value with 0 when it is an empty string in an RDD, where the field delimiter is “,”?

Suppose the data looks like

3070811,1963,1096,"US","CA",1,
3022811,1963,1096,"US","CA",1,56
3033811,1963,1096,"US","CA",1,23

I want to see the output as

Array(Array(3070811,1963,1096, 0, "US", "CA", 0,1, 0), Array(3022811,1963,1096, 0, "US", "CA", 0,1, 56), Array(3033811,1963,1096, 0, "US", "CA", 0,1, 23))


#2

@abhinav, could you please help me with this command?


#3

Sorry, but your question is not very clear. One approach is to read the file, convert it to an array, and then apply a map operation on top. However, I do not see any ‘empty string’ in your lines.

Sample program:

import scala.io.Source
import java.io._

// Write a small two-line test file
val writer = new PrintWriter(new File("New.txt"))
writer.write("Hello Scala")
writer.write("\n")
writer.write("Bye Scala")
writer.close()

// Read the file back into an array of lines
val src = Source.fromFile("New.txt")
val lines = (for (line <- src.getLines()) yield line).toArray
lines

// Pair every line with 0
lines.map(x => (x,0))

The output would look like this, and you can modify it as per your requirement. I hope you can build on it as needed. Do explore the Source library and its functions.

Sample program output:

writer = java.io.PrintWriter@296b4051
src = empty iterator
lines = Array(Hello Scala, Bye Scala)
:56: warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses
lines
^
[(Hello Scala,0), (Bye Scala,0)]
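
To build this further toward the original question, here is a minimal sketch of the same map idea applied to the comma-delimited sample lines. It assumes an "empty string" means a field with nothing between two delimiters (or after the last one), so in the sample data only the trailing field of the first line qualifies. The names fillEmpty, sampleLines and filled are made up for illustration. Note the -1 limit passed to split: without it, Java's String.split drops trailing empty fields, so the last column would be lost before it could be replaced.

```scala
// Hypothetical helper: split on "," while keeping trailing empty fields
// (limit -1), then replace every empty or whitespace-only field with "0".
def fillEmpty(line: String): Array[String] =
  line.split(",", -1).map(f => if (f.trim.isEmpty) "0" else f)

// The three sample lines from the question
val sampleLines = Seq(
  "3070811,1963,1096,\"US\",\"CA\",1,",
  "3022811,1963,1096,\"US\",\"CA\",1,56",
  "3033811,1963,1096,\"US\",\"CA\",1,23"
)

// Same pattern as lines.map(...) above, one Array[String] per line
val filled = sampleLines.map(fillEmpty)

// Assumption: with Spark, given an RDD[String] called rdd (not defined here),
// the same function could be applied as rdd.map(fillEmpty).collect()
```

With this, the first line becomes 3070811,1963,1096,"US","CA",1,0 and the other two are unchanged, since they contain no empty fields.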