Java Out of Memory Issue with Movie dataframe SQL


#1

I am running the DataFrame operations / Spark SQL queries below, but they fail with a Java out-of-memory error:
Error:… Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: Task 1 in stage 40.0 failed 1 times, most recent failure: Lost task 1.0 in stage 40.0 (TID 287, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space…

Code:

//val joindf=moviesdf.join(ratingsdf, moviesdf.col("MovieID")===ratingsdf.col("MovieID")).filter($"Genre".like("%War%"))
//joindf.groupBy("Name").agg(avg("Rating").alias("AvgRating")).filter($"AvgRating">=4.8).show(10)

moviesdf.createOrReplaceTempView("movies")
ratingsdf.createOrReplaceTempView("ratings")
spark.sql("select Name, avg(rating) avg_rating from movies m join ratings r on m.MovieID=r.MovieID where genre like '%War%Comedy%' group by Name").show(10)
//spark.sql("select name,rating from movies m join rating r on m.movieid=r.ratingid limit 10")


#2

I am not able to post the full code because the forum throws an error saying that new users can include only 2 links :frowning:


#3

Hi @sudhanshu_guru,

Can you try running it in YARN mode? If your code uses more than 2 GB of memory, in either local or YARN mode, it will be killed by our bots.
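
If memory use is still the problem, one thing that might help is shrinking both inputs before the join: filter the movies by genre first and keep only the columns the aggregation needs. Below is a minimal sketch of that idea, not a guaranteed fix. It assumes the DataFrames have the columns used in the question (MovieID, Name, Genre, Rating); the helper name avgRatingForGenre is hypothetical. Spark's optimizer may already push the filter and column pruning down on its own, so the saving could be small.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col}

// Hypothetical helper (not from the original post): average rating per movie
// name, restricted to movies whose Genre matches a SQL LIKE pattern.
def avgRatingForGenre(movies: DataFrame, ratings: DataFrame, genrePattern: String): DataFrame = {
  // Filter by genre before the join and keep only the columns needed afterwards.
  val slimMovies = movies
    .filter(col("Genre").like(genrePattern))
    .select("MovieID", "Name")

  // Only MovieID and Rating are needed from the ratings side.
  val slimRatings = ratings.select("MovieID", "Rating")

  // Join on the shared MovieID column, then aggregate per movie name.
  slimMovies
    .join(slimRatings, Seq("MovieID"))
    .groupBy("Name")
    .agg(avg("Rating").alias("AvgRating"))
}
```

With the DataFrames from the question in scope, it could be called as avgRatingForGenre(moviesdf, ratingsdf, "%War%").filter(col("AvgRating") >= 4.8).show(10), which mirrors the commented-out DataFrame version in the first post.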