Hello,
As per my understanding of Hive concepts, if we load a dataset into a Hive table, the data file is moved from the source path to the Hive warehouse directory within HDFS, and HDFS is set to keep three replicas of the data.
These questions might look silly, but as I am a beginner, I want to clear my doubts.
My questions are:
1). If I delete the Hive table, will it delete the data file from the Hive warehouse only, or will it also delete the other two replicas from HDFS?
2). If we run a query on a Hive table, will that query be executed as distributed processing?
For example, say one data file is 1 GB in size (which in turn is 8 blocks x 128 MB). With a replication factor of three, there would be 24 blocks in total for this file.
Will our Hive query be distributed among all of these data blocks, or will it be processed on the Hive warehouse blocks only?
3). If we create a new table from the output of a Hive query, will the output table's data file be replicated three times in HDFS, or will only one file be kept in the Hive warehouse?
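
To make the block arithmetic in question 2 concrete, here is a small sketch of how I am counting the blocks. It assumes a 128 MB block size and a replication factor of 3 (what I believe the dfs.blocksize and dfs.replication settings are on the cluster); the function name is only for illustration.

```python
import math

BLOCK_SIZE_MB = 128  # assumed HDFS block size (dfs.blocksize)
REPLICATION = 3      # assumed replication factor (dfs.replication)

def block_counts(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (logical blocks, physical block copies) for one file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    logical = math.ceil(file_size_mb / block_size_mb)
    # Each logical block is stored `replication` times across the cluster.
    physical = logical * replication
    return logical, physical

logical, physical = block_counts(1024)  # 1 GB file
print(logical, physical)  # 8 24
```

So my doubt is whether the query works on the 8 logical blocks or on all 24 stored copies.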
Thanks in advance...