Unable to set ORC stripe size, small file issue

I am writing a DataFrame to a path and also to a table, but five files are created instead of one. I tried setting the stripe size to 64 MB, but five small files (each under 1 MB) are still generated. Can someone help me with this?

DataFrame write code:

Spark shell:

Data files in the HDFS path:
[drop2kanhu8197@cxln4 conf]$ hdfs dfs -du -s -h /tmp/tableORC/*
0 /tmp/tableORC/_SUCCESS
970.5 K /tmp/tableORC/part-00000-5b69bd4b-3cea-4f18-9a4c-e1bd3ada1df8.snappy.orc
963.8 K /tmp/tableORC/part-00001-5b69bd4b-3cea-4f18-9a4c-e1bd3ada1df8.snappy.orc
961.6 K /tmp/tableORC/part-00002-5b69bd4b-3cea-4f18-9a4c-e1bd3ada1df8.snappy.orc
962.4 K /tmp/tableORC/part-00003-5b69bd4b-3cea-4f18-9a4c-e1bd3ada1df8.snappy.orc
24.9 K /tmp/tableORC/part-00004-5b69bd4b-3cea-4f18-9a4c-e1bd3ada1df8.snappy.orc

I am expecting a single file, as below:
[drop2kanhu8197@cxln4 conf]$ hdfs dfs -du -s -h /tmp/tableORC3/*
0 /tmp/tableORC3/_SUCCESS
3.6 M /tmp/tableORC3/part-00000-536dd545-97f7-483e-a468-f73331d52856.snappy.orc
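A likely cause, offered as a sketch rather than a confirmed diagnosis: Spark writes one output file per DataFrame partition, while the ORC stripe size only controls how rows are grouped into stripes inside each file, so raising it does not merge files. Collapsing the DataFrame to one partition with `coalesce(1)` (or `repartition(1)`, which adds a shuffle) before writing should produce a single part file. The example runs in local mode with made-up data; the `id` column, the five partitions and the temporary output directory are assumptions standing in for the original DataFrame and `/tmp/tableORC`.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SingleOrcFile {
  // Writes sample data as ORC and returns how many part files were produced.
  def run(): Int = {
    val spark = SparkSession.builder()
      .appName("single-orc-file")
      .master("local[4]") // assumption: local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    try {
      // Hypothetical sample data split into 5 partitions, mirroring the 5 part files above.
      val df = (1 to 100000).toDF("id").repartition(5)

      // Assumption: a temporary directory stands in for /tmp/tableORC.
      val out = Files.createTempDirectory("tableORC").resolve("out").toString

      // One file is written per partition, so collapse to a single partition first.
      df.coalesce(1)
        .write
        .mode("overwrite")
        .orc(out)

      // Count data files only, skipping _SUCCESS and hidden .crc checksum files.
      new File(out).listFiles()
        .map(_.getName)
        .count(name => name.startsWith("part-") && name.endsWith(".orc"))
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"ORC part files written: ${run()}")
}
```

For the table write, the same `coalesce(1)` call would go before `.saveAsTable(...)`. The trade-off is that a single task then writes all the data, which is fine for a few MB but can become a bottleneck for large outputs.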