Convert Oracle stored procedure to Hive or Spark


#1

Hello,

I have written stored procedures in Oracle. Is there any way I can implement the same functionality in Hive or Spark?

I need to know if that's possible or if there is some other way around it. I searched online for it; it seems Hive version 2 provides an option to work with stored procedures. But what could be the solution for older versions?

Regards
Nishant


#2

Good Question Nishant.

Stored procedures are essentially a bunch of SQL queries stored on the database side. They provide a way to pass arguments and substitute variables, so you can call stored procedures remotely with arguments.

Here are some of the solutions:

1. Use HPL/SQL

HPL/SQL provides the functionality of stored procedures. It is now available as part of HDP (Hortonworks Data Platform) 2.6.
See: http://www.hplsql.org/start

We are working on upgrading CloudxLab to the next version; HPL/SQL will be available in it.

So far I have tried the following, but it has not worked because of some dependency issues:

    wget http://www.hplsql.org/downloads/hplsql-0.3.31.tar.gz
    gunzip hplsql-0.3.31.tar.gz 
    tar -xvf hplsql-0.3.31.tar 
    cd hplsql-0.3.31
    export HADOOP_CLASSPATH=`hadoop classpath`
    ./hplsql --version
    ./hplsql -e "select * from employees"

2. Using Views

Mostly, we create stored procedures in analytical environments to simplify queries and to reuse the same piece of logic in many other places. The same can be achieved, to some extent, using views.

Also, views are stored in the database.
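To make the idea concrete, here is a minimal sketch of encapsulating reusable query logic in a view. It uses Python's built-in sqlite3 purely as a stand-in for Hive (the table, column names, and data are all made up for illustration); in Hive the same `CREATE VIEW` statement works with HiveQL syntax.

```python
import sqlite3

# sqlite3 stands in for Hive here; in Hive you would run the same
# CREATE VIEW through beeline or the Hive CLI.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "a", 900.0), (2, "b", 1500.0), (3, "c", 2000.0)])

# The reusable piece of logic lives in the view, much like the body of a
# parameterless stored procedure.
conn.execute("""
    CREATE VIEW high_paid AS
    SELECT id, name FROM employees WHERE salary > 1000
""")

rows = conn.execute("SELECT name FROM high_paid ORDER BY id").fetchall()
print(rows)  # → [('b',), ('c',)]
```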

3. Using UDF - User Defined Functions

Despite what the name suggests, a UDF is not really a stored procedure, but it does let you define your custom logic in Java.

4. Save it in a file and load it.

Alternatively, if your objective is to keep the SQL separate from the actual code, keep the SQL in a separate text file, load it using Java, and send it to Hive after substituting the parameters.
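A minimal sketch of this idea, in Python rather than Java for brevity. The template file, table name, and parameters are all made up; the point is just that the SQL lives outside the code and gets its parameters substituted before being sent to Hive.

```python
import os
import tempfile
from string import Template

# The SQL lives in a separate file; ${...} marks the substitutable parameters.
sql_template = "SELECT * FROM ${table} WHERE dept = '${dept}'"

with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as f:
    f.write(sql_template)
    path = f.name

# Load the file and substitute the params, then the result can be sent to Hive.
with open(path) as f:
    query = Template(f.read()).substitute(table="employees", dept="sales")
os.remove(path)

print(query)  # SELECT * FROM employees WHERE dept = 'sales'
```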


#3

Hi Sandeep,

Thank you for the reply.

Yes, it can be achieved using views and UDFs. But for database operations involving cursors, what approach should I follow?

An example of a UDF in Scala for Hive implementing cursor-like logic would be great for better understanding.

Regards
Nishant


#4

Hi Nishant,

I am performing a task in my current project where we are converting Oracle PL/SQL into Spark jobs. The difference between your case and mine is that I am actually replicating the DB into HDFS and then running the stored procedure logic using Spark on the HDFS datasets.

This means a cursor in an Oracle SP is roughly equivalent to a map operation over the rows of a DataFrame in Spark.

Does this answer your question? Or do you need more details?


#5

Hi Harleen,

Thanks for the response.

Can you please provide some more details on this?

Regards | Nishant


#6

Hi All,

I have data and packages/procedures in Oracle.
I can copy the Oracle data into HDFS, but I want to process the data using the packages/procedures created in Oracle.
Can anyone let me know how this is possible?


#7

Hi @mannharleen, thanks for your reply. I have a couple of questions, or rather need a bit more insight: so basically you migrated all your traditional Oracle tables into Hive tables, created Spark DataFrames from those Hive tables, and then performed the Oracle analysis/procedures on them?
Thanks for your views; any more insights will be helpful.

Thanks
Rath


#8

@soumyakanta_rath that is correct.

SPs are simply SQL with variables, conditions, and iterations. These can be easily mimicked using the Spark API. Think of the following mapping:
variables in SPs => variables in Spark
if conditions => if conditions in Spark
iterations => map/flatMap/foreach in Spark
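The last line of that mapping can be sketched as follows. Plain Python stands in for Spark here so the equivalence is easy to see (the rows, column names, and bonus computation are all made up); on a real DataFrame the same body would go into `df.rdd.map(...)` or a column expression.

```python
# Hypothetical rows that a Hive table / DataFrame would hold.
rows = [
    {"empid": 1, "salary": 900.0},
    {"empid": 2, "salary": 1500.0},
]

# PL/SQL cursor style:
#   FOR r IN (SELECT empid, salary FROM emp) LOOP
#     -- compute a bonus per row
#   END LOOP;
#
# Spark style: the per-row body of the cursor loop becomes the map function.
bonuses = list(map(lambda r: {"empid": r["empid"], "bonus": r["salary"] * 0.1},
                   rows))
print(bonuses)
```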


#9

@mannharleen
How do we convert "insert into table select sysdate, time from dual" to Spark SQL?
And how do we create a table named "<existing table name>_acc" in Spark?


#10

You can only create temp tables in Spark; these are not persisted across Spark sessions.
That is where the Hive metastore comes into play: a table with persistent storage must be created in a metastore service such as Hive or the Glue Catalog.

PS: My answer is based on early Spark 2.x releases; I haven't worked with the newer Spark APIs of late.


#11

Hi Harry,
Have you completed this project?
Please share some examples of how you incorporated the PL/SQL logic in Spark.
Please help me.
Regards
Sachin


#12

Hi Sachin.

That was almost 2 years ago, and I don't have access to the project since it was at my previous org. But I can offer help if you have specific questions.

Best.


#13

I have a recursive function in PL/SQL (a function calling itself) and I have to convert the same logic to Spark SQL (I am using Databricks).
I have created DataFrames for each logic check and, at the end, do a left outer join with the source query (from where the function was being called in PL/SQL).
I am facing two difficulties: 1) how to do a dynamic recursive call, and 2) the open cursor and its loop are also based on a dynamic logical condition.

Below is a sample PL/SQL function I am trying to convert into Spark SQL code.
Sample example:

function (p_emp_id, p_activity_id)
{
    req_var = 0
    comp_var = 0

    if row exists (some query where empid = p_emp_id)
    then req_var = req_var - 1
    else 0

    if row exists (some other query where activityid = p_activity_id)
    then comp_var = comp_var + 1

    if row exists (some query)
    then open cursor loop

        if row exists (some query)
        then
            if function(empid, parent_activity_id) = 1
            then comp_var = comp_var + 1
            else req_var = req_var - 1

    end cursor

    if comp_var > req_var
    then return 1
    else return 0
}