I have a cluster with 8 nodes, each with 116 GB of RAM and 16 cores. I am reading table X, which is 250 GB in size, and joining it with table Y 10 times to derive 10 columns. Table Y is 100 MB.
My question: when I broadcast table Y and also explicitly cache it, the script takes around 20 hours, but when I only broadcast table Y without caching it, the whole process takes about 1 hour.
I can't understand why explicitly caching a 100 MB table, in addition to broadcasting it, adds so much time.
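For reference, the logic is roughly like the sketch below (paths, keys, and column names are placeholders, not my real schema; the cached variant just adds the two commented lines):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("broadcast-vs-cache").getOrCreate()

# Table X: ~250 GB fact table (placeholder path)
df_x = spark.read.parquet("/data/table_x")

# Table Y: ~100 MB lookup table, joined 10 times to derive 10 columns
df_y = spark.read.parquet("/data/table_y")

# Variant that takes ~20 hours: cache Y before broadcasting it
# df_y = df_y.cache()
# df_y.count()   # materialize the cache

# Variant that takes ~1 hour: broadcast only, no explicit cache
result = df_x
for i in range(10):
    # Rename Y's columns per join so the output column names don't collide
    lookup = df_y.select(
        F.col("key").alias(f"key_{i}"),
        F.col("value").alias(f"derived_{i}"),
    )
    result = result.join(
        F.broadcast(lookup),                      # hint: broadcast the small side
        result[f"fk_{i}"] == lookup[f"key_{i}"],  # fk_0..fk_9 are placeholder keys in X
        "left",
    )

result.write.mode("overwrite").parquet("/data/output")
```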
Please help.