Programming error


#1

In the program for programmatically specifying the schema, the code is incomplete: no age field is defined. Please provide the whole code as soon as possible.

Thank you.


#2

Hi, Anubhav.

Could you share the Scala programming topics and questions you are referring to, or share the code?
I will look into it.

All the best!


#3

Here it is…
There is no schema defined for age.

import org.apache.spark.sql.types._
import org.apache.spark.sql._

// The schema is encoded in a string
val schemaString = "name age"
val fieldsArray = schemaString.split(" ")

// Build one StructField per column name; every column is a StringType here
val fields = fieldsArray.map(
  name => StructField(name, StringType, nullable = true)
)
val schema = StructType(fields)

// Read the text file and convert each "name, age" line into a Row
val peopleRDD = spark.sparkContext.textFile("/data/spark/people.txt")
val rowRDD = peopleRDD.map(_.split(",")).map(
  attributes => Row(attributes(0), attributes(1).trim)
)

// Apply the schema to the RDD of Rows
val peopleDF = spark.createDataFrame(rowRDD, schema)


#4

It worked fine for me. See the attached screenshot. Please note that the double quotes in the pasted code came through as curly (typographic) quotes. You will have to delete them and type straight double quotes again.

Also, for this example we have treated both columns, name and age, as String types. In a real case, the column data types could come from another source, just as schemaString supplies the column names.
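
To illustrate that last point, here is a minimal sketch that reads the column types from a second string. The names typeString and toDataType, and the type names "string" and "int", are assumptions made for illustration only; they are not part of the original example.

```scala
import org.apache.spark.sql.types._

// Column names, as in the example from this thread
val schemaString = "name age"
// Hypothetical second string with one type name per column (an assumption)
val typeString = "string int"

// Hypothetical helper: map a type name to a Spark SQL DataType,
// falling back to StringType for anything it does not recognise
def toDataType(typeName: String): DataType = typeName match {
  case "int"    => IntegerType
  case "double" => DoubleType
  case _        => StringType
}

// Pair each column name with its type name and build the fields
val fields = schemaString.split(" ").zip(typeString.split(" ")).map {
  case (name, typeName) => StructField(name, toDataType(typeName), nullable = true)
}
val schema = StructType(fields)
```

With a schema like this, the values placed in each Row have to match the declared types, so the age attribute would need to be converted with .toInt before calling createDataFrame.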


#6

Don’t we have to define a schema for age as well, like we do for name?


#7

In real-life scenarios, you do. Here we simply treated it as a string.

The following code creates an array of StructField objects, one for each element of fieldsArray, each with StringType as its column type.

val fields = fieldsArray.map(
  name => StructField(name, StringType, nullable = true)
)

Notice that it creates a StringType StructField for both “name” and “age”.
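
If you want age to have its own type, one possible approach is to list the fields explicitly rather than mapping over the names. This is only a sketch under some assumptions: it builds a local SparkSession (in spark-shell, spark already exists), and it uses a few made-up sample lines in the "name, age" layout instead of reading /data/spark/people.txt.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Local session for illustration only; in spark-shell, `spark` is already defined
val spark = SparkSession.builder()
  .appName("TypedSchemaSketch")
  .master("local[*]")
  .getOrCreate()

// age is declared as IntegerType instead of StringType
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))

// Hypothetical sample lines standing in for the contents of people.txt
val peopleRDD = spark.sparkContext.parallelize(
  Seq("Michael, 29", "Andy, 30", "Justin, 19")
)

// Row values must match the declared types, so age is converted to Int
val rowRDD = peopleRDD.map(_.split(",")).map(
  attributes => Row(attributes(0), attributes(1).trim.toInt)
)

val peopleDF = spark.createDataFrame(rowRDD, schema)
peopleDF.printSchema()
```

Because age is now an integer column, numeric filters such as peopleDF.filter("age > 20") compare numbers rather than strings.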