Hi Team,

What’s the difference between the Sqoop ingestion tool and a Spark JDBC connection?
It looks like we can connect to an RDBMS through Spark’s JDBC connection. What is the difference between connecting from Spark and connecting from Sqoop? Will there be any performance difference? Can you please help me with this?

Sqoop is essentially a MapReduce program that copies data from an RDBMS into HDFS, HBase, and/or Hive.

Spark is a complete processing framework, like Hadoop MapReduce, that can do any kind of processing. Using Spark, you can write your own code to copy data from an RDBMS into HDFS, HBase, or Hive.
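
To make the Spark side concrete, here is a minimal PySpark sketch of copying one table into Parquet through Spark's JDBC data source. The helper names, the table, the bounds, and the output path are all hypothetical; the option keys (`url`, `dbtable`, `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are the standard ones for that data source. Without the partitioning options Spark reads the table through a single connection, so they matter for larger tables. It also assumes the database's JDBC driver jar is on the Spark classpath.

```python
def jdbc_read_options(url, table, user, password,
                      partition_column, lower_bound, upper_bound,
                      num_partitions):
    """Build options for a partitioned read via Spark's JDBC data source.

    Hypothetical helper. lowerBound/upperBound only decide how the
    partition ranges are split; they do not filter rows.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "partitionColumn": partition_column,  # numeric, date, or timestamp column
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),  # also caps concurrent connections
    }


def copy_table_to_parquet(spark, options, output_path):
    """Read the table over JDBC and write it out as Parquet files."""
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").parquet(output_path)


def main():
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-to-parquet").getOrCreate()
    # All connection details below are placeholders.
    opts = jdbc_read_options(
        url="jdbc:mysql://dbhost:3306/sales",
        table="orders",
        user="etl_user",
        password="change-me",
        partition_column="order_id",
        lower_bound=1,
        upper_bound=1_000_000,
        num_partitions=4,
    )
    copy_table_to_parquet(spark, opts, "hdfs:///data/orders_parquet")
    spark.stop()
```

Calling `main()` from a script submitted with `spark-submit` would run the copy. The point is that with Spark you write and tune this code yourself, and you can add transformations between the read and the write.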

Thanks Sandeep!

Is there any performance difference? If I want to copy data from a relational database and load it into Parquet format, should I use Spark or Sqoop?


For such a specific task, I think Sqoop will suffice.
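
As a rough illustration, a Sqoop import that writes Parquet could look like the command assembled below. The connection string, user, paths, and table name are placeholders, and the Python wrapper is just a hypothetical convenience for building the argument list; the flags themselves (`--connect`, `--table`, `--target-dir`, `--split-by`, `--num-mappers`, `--as-parquetfile`) are standard `sqoop import` options. Sqoop parallelises the copy by splitting the table on the `--split-by` column across the mappers.

```python
import shlex


def sqoop_import_command(connect, username, password_file, table,
                         target_dir, split_by, num_mappers=4):
    """Build the argument list for a Sqoop import that writes Parquet files.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password-file", password_file,   # file holding the DB password
        "--table", table,
        "--target-dir", target_dir,         # HDFS output directory
        "--split-by", split_by,             # column used to divide work among mappers
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
        "--as-parquetfile",
    ]


# Placeholder values; replace them with your own connection details.
cmd = sqoop_import_command(
    connect="jdbc:mysql://dbhost:3306/sales",
    username="etl_user",
    password_file="/user/etl/.db_password",
    table="orders",
    target_dir="/data/orders_parquet",
    split_by="order_id",
)
print(shlex.join(cmd))  # the command line to run on a node with Sqoop installed
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` on a machine where Sqoop is available.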