Hi,
I am struggling to find a way to solve the requirement below in PySpark. Any help would be really appreciated.
Requirement:
Read a date value from a column in a Hive table and use that dynamic value as a date suffix in the file name when writing to a CSV file.
Ex:
Step 1: Below is the sample SQL query against Hive. Assume it always returns exactly one value (a single row and column).
results = spark.sql("SELECT ETL_FORM_DT FROM ETL_BATCH WHERE Btch_status = 'S'")
Step 2: Then I want to assign the above output to a variable, like:
v_etl_frm_dt = results.select("ETL_FORM_DT")
** Here, v_etl_frm_dt is created as a DataFrame rather than as a plain value.
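One common way to get a plain Python value instead of a DataFrame is to take the first Row (with DataFrame.first(), or collect()[0]) and index it by column name. Below is a minimal sketch, assuming the Step 1 query really does return exactly one row; get_single_value is a hypothetical helper name, not part of PySpark.

```python
def get_single_value(df, column):
    """Return `column` from the first row of `df` as a plain Python value.

    With a PySpark DataFrame, df.first() returns a Row (or None when the
    DataFrame is empty), and a Row can be indexed by column name.
    """
    row = df.first()
    if row is None:
        # The query matched nothing, so there is no date to put in the file name
        raise ValueError("query returned no rows")
    return row[column]

# Usage (hypothetical, with the `results` DataFrame from Step 1):
# v_etl_frm_dt = get_single_value(results, "ETL_FORM_DT")
# v_etl_frm_dt is now a scalar (e.g. datetime.date or str), not a DataFrame
```

The same thing inline would be results.first()["ETL_FORM_DT"]; the helper only adds an explicit error for the empty case.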
Step 3:
out_data.coalesce(1).write.mode("overwrite").option("quote", "").option("emptyValue", None).csv("file:///home/pdc19883858/out_test_" + v_etl_frm_dt + ".csv")
** If I run it as above, I get the error "Can not concatenate 'str' and 'dataframe'".
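Once the date is a scalar, the path can be built as an ordinary string before it is handed to the writer. A sketch under stated assumptions: build_output_path is a hypothetical helper, the yyyyMMdd format is an arbitrary choice, and the column may come back as a datetime.date (DateType) or as something else (e.g. a string), which is simply converted with str().

```python
import datetime

def build_output_path(base_dir, prefix, etl_date, date_fmt="%Y%m%d"):
    """Return '<base_dir>/<prefix><date>.csv' as a plain Python string.

    etl_date must be the scalar pulled out of the DataFrame, not the
    DataFrame itself. A datetime.date (what PySpark returns for a DateType
    column) is formatted with date_fmt; anything else goes through str().
    """
    if isinstance(etl_date, datetime.date):  # also covers datetime.datetime
        date_part = etl_date.strftime(date_fmt)
    else:
        date_part = str(etl_date)
    return f"{base_dir}/{prefix}{date_part}.csv"

# Usage (assumes v_etl_frm_dt already holds a scalar, e.g. results.first()[0]):
# out_path = build_output_path("file:///home/pdc19883858", "out_test_", v_etl_frm_dt)
# out_data.coalesce(1).write.mode("overwrite").csv(out_path)
```

Note that DataFrameWriter.csv treats the path as a directory: with coalesce(1) Spark creates a folder named out_test_<date>.csv containing a single part-* file, not one file with that exact name.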
So I guess that if I can read the above query output as a date/int value rather than as a DataFrame, the problem is solved, but I am still figuring out how!
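If the scalar needs to be an actual date or an int, it can be normalized in plain Python after it has been pulled out of the Row. A minimal sketch, assuming the value is either already a date/datetime or a string in YYYY-MM-DD form (other string formats would need a different strptime pattern); to_date and to_yyyymmdd_int are hypothetical names. Alternatively, Spark SQL's date_format(ETL_FORM_DT, 'yyyyMMdd') could do the formatting inside the query itself.

```python
import datetime

def to_date(value):
    """Normalize the ETL date to a datetime.date.

    Assumes value is a date/datetime (DateType/TimestampType column) or a
    'YYYY-MM-DD' string (StringType column).
    """
    # datetime.datetime is a subclass of datetime.date, so check it first
    if isinstance(value, datetime.datetime):
        return value.date()
    if isinstance(value, datetime.date):
        return value
    return datetime.datetime.strptime(str(value).strip(), "%Y-%m-%d").date()

def to_yyyymmdd_int(value):
    """Return the ETL date as an int such as 20200131."""
    d = to_date(value)
    return d.year * 10000 + d.month * 100 + d.day
```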
How can I achieve this? Please let me know if my question is not clear.
Thanks in advance.