How to create a DataFrame for a MySQL table

I tried to create a DataFrame for a MySQL table using the commands below, but it throws the exception shown. Could you please advise?

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val prop = new java.util.Properties
val table1 = "customers"

var conn_string = "jdbc:mysql://ip-172-31-13-154:2181;useLOBs=false;database=retail_db;user=sqoopuser;password=NHkkP876rp"
var sdf = sqlContext.read.jdbc(conn_string, table1, prop)

Exception: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
at org.apache.spark.sql.DataFrameReader.&lt;init&gt;(DataFrameReader.scala:689)
at org.apache.spark.sql.SparkSession.read(SparkSession.scala:645)
at org.apache.spark.sql.SQLContext.read(SQLContext.scala:504)
… 48 elided
Caused by: java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning when creating Hive client using classpath: file:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el7_2.x86_64/jre/
lib/resources.jar, file:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el7_2.x86_64/jre/lib/rt.jar, file:/usr/lib/jvm/j

Hi @venkat_Adimulam,

Which version of Spark did you try?

It was working fine for me on Spark 2.0.1.

1. First, include the MySQL connector jar [mysql-connector-java.jar] when launching spark-shell:

spark-shell --driver-class-path /usr/share/java/mysql-connector-java.jar
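If the jar is not already on the machine, spark-shell can also resolve the connector from Maven Central with `--packages`. The version below is only an example; pick one compatible with your MySQL server and Spark build:

```shell
# Alternative to --driver-class-path: fetch the connector at launch time.
# The coordinate version (5.1.49 here) is an assumption -- adjust as needed.
spark-shell --packages mysql:mysql-connector-java:5.1.49
```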

2. Create the SQLContext:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

3. Pass the JDBC connection values and create the DataFrame:

val df_jdbc_mysql = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://**mysql_url**/**databasename**")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "tablename")
  .option("user", "**username**")
  .option("password", "password")
  .load()
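For comparison, the `Properties`-based call the original post attempted also works once the URL uses standard MySQL JDBC syntax: the default port 3306 and `?key=value&key=value` parameters rather than semicolons. A minimal sketch, where the host, database, and credentials are placeholders:

```scala
// Properties-based variant of the same read. Note the standard MySQL URL
// shape: jdbc:mysql://host:3306/db?param=value&param=value
// (host, db, user, and password below are placeholders).
val prop = new java.util.Properties
prop.setProperty("user", "username")
prop.setProperty("password", "password")
prop.setProperty("driver", "com.mysql.jdbc.Driver")

val url = "jdbc:mysql://mysql_host:3306/retail_db?useSSL=false"
val df = sqlContext.read.jdbc(url, "customers", prop)
```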

4. Check the column details:

df_jdbc_mysql.printSchema
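As a usage sketch, you can also push a filter down to MySQL instead of loading the whole table, by passing a subquery as `dbtable`. The table, column, and connection details here are hypothetical:

```scala
// Push a filtering query down to MySQL rather than reading the full table.
// "customers", "customer_state", and the connection values are assumptions
// for illustration -- substitute your own.
val filtered = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://mysql_host/retail_db")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "(SELECT * FROM customers WHERE customer_state = 'CA') AS t")
  .option("user", "username")
  .option("password", "password")
  .load()

filtered.show(5)  // preview a few rows
```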

Hope it helps!


Thanks @Shanumk, it's working now.

I added the MySQL jar to the Spark classpath:

export SPARK_CLASSPATH=/usr/hdp/2.3.4.0-3485/hive/lib/mysql-connector-java.jar

and used the Spark 2 shell: /usr/spark2.2.1/bin/spark-shell
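For reference, the Spark 2.x shell already provides a `SparkSession` as `spark`, so creating a separate SQLContext is optional. A minimal sketch, with placeholder host and credentials:

```scala
// Spark 2.x style: use the built-in `spark` (SparkSession) from spark-shell.
// The URL, user, and password below are placeholders.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://mysql_host/retail_db")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "customers")
  .option("user", "username")
  .option("password", "password")
  .load()

df.printSchema
```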