I am running the DataFrame operations / Spark SQL queries below, but they do not complete because of a Java out-of-memory error:
Error:… Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: Task 1 in stage 40.0 failed 1 times, most recent failure: Lost task 1.0 in stage 40.0 (TID 287, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space…
Code:
//val joindf=moviesdf.join(ratingsdf, moviesdf.col("MovieID")===ratingsdf.col("MovieID")).filter($"Genre".like("%War%"))
//joindf.groupBy("Name").agg(avg("Rating").alias("AvgRating")).filter($"AvgRating">=4.8).show(10)
moviesdf.createOrReplaceTempView("movies")
ratingsdf.createOrReplaceTempView("ratings")
spark.sql("select Name, avg(rating) avg_rating from movies m join ratings r on m.MovieID=r.MovieID where genre like '%War%Comedy%' group by Name").show(10)
//spark.sql("select name,rating from movies m join rating r on m.movieid=r.ratingid limit 10")