Parquet to CSV conversion

Hi - I'm trying to convert Parquet files to CSV in Hadoop and load them into Teradata through TPT (one-time activity). I cannot use TDCH. Is there a way to quickly convert Parquet to CSV? I'm currently using the hive -e option, but it is taking a long time.
I have limitations since this is running in prod, so I'm not able to use the Spark or Sqoop export options.

Hi,

  1. You can create a Teradata environment and create the TPT job using the wizard provided (kindly refer below).
  2. By using pandas (this is a one-liner):

import pandas as pd

pd.read_parquet('Your_file.parquet').to_csv('Your_file.csv')
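(Note: pd.read_parquet needs a Parquet engine such as pyarrow or fastparquet installed underneath.)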

Refer here: https://www.kaggle.com/jorijnsmit/convert-parquet-to-csv

Refer to this for Parquet compression techniques: https://parquet.apache.org/documentation/latest/

All the best!

Thanks Satyajit!!
Yes, I'm using TPT for exporting to TD.
But as the tables are stored as Parquet, TPT cannot be used directly, so I'm getting the data into a CSV and using tbuild to export.
I tried pandas, but that package is not on my prod server and I'm not able to install it; it's a very restricted environment with few options.
I'm looking for any faster native Parquet-to-CSV conversion techniques, and will then use tbuild for the export.

  1. You can use Athena to convert Parquet files in S3 to CSV and store them back in S3, if you are using AWS.

  2. You can use the techniques below.

     a) **Pyarrow** (for large files, see the batched sketch after this list)

import pyarrow.csv as pv
import pyarrow.parquet as pq
table = pq.read_table('your_file.parquet')
pv.write_csv(table, 'your_file.csv')
https://arrow.apache.org/docs/python/parquet.html

     b) **Dask**

import dask.dataframe as dd
df = dd.read_parquet('your_file.parquet')
# single_file=True writes one CSV instead of one file per partition
df.to_csv('your_file.csv', single_file=True)

     c) **Koalas API**

import databricks.koalas as ks
df = ks.read_parquet('your_file.parquet')
df.to_csv('your_file.csv')

  3. You can use the Python package parquet 1.3.1 to convert Parquet --> TSV --> CSV.
  4. You can use the online converter directly (click on the Download as CSV option).

http://parquet-viewer-online.com/
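If the Parquet files are big, a batched PyArrow conversion can avoid loading the whole table into memory at once, which may also be faster. This is only a minimal sketch, assuming a reasonably recent pyarrow is available (iter_batches and CSVWriter are not in very old versions) and using placeholder file names:

import pyarrow.csv as pv
import pyarrow.parquet as pq

# Stream the Parquet file batch by batch instead of materializing the whole table.
pf = pq.ParquetFile('your_file.parquet')
with pv.CSVWriter('your_file.csv', pf.schema_arrow) as writer:
    for batch in pf.iter_batches(batch_size=65536):
        writer.write_batch(batch)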

All the best!