Validate and load data into Hive with Pig

Hi,
I have data in a file in the following format:
Student No~Year~amount~Name
I need to perform the following validations with Pig:

  1. Validate that amount is a number, as expected. If it is not, put the row into a different Hive table.
  2. The unique identifier Student No should be a number and unique. If there are duplicates, put them into a different Hive table.
  3. Load valid rows into one table and invalid rows into another table.

Can we do this with a Pig script? Please suggest how we can do that. Thank you.

Hi @madhueppa,

Yes, we can load data into Hive from Pig using HCatalog.
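
As a starting point, here is a rough sketch of how the three validations could look in Pig. It is not tested against your data. The input path and the Hive table names (default.students_valid, default.students_invalid) are made up for illustration, and it assumes both tables already exist in Hive: the valid table with student_no BIGINT, year STRING, amount DOUBLE, name STRING, and the invalid table with all four columns as STRING.

```pig
-- Sketch only: the input path and both Hive table names are assumptions.
-- Both tables must already exist in Hive. Run with: pig -useHCatalog validate.pig

-- Load every field as chararray so malformed values are kept rather than nulled.
raw = LOAD '/user/me/students.txt' USING PigStorage('~')
      AS (student_no:chararray, year:chararray, amount:chararray, name:chararray);

-- Attach to each row the number of rows that share its student_no.
by_id   = GROUP raw BY student_no;
counted = FOREACH by_id GENERATE FLATTEN(raw), COUNT_STAR(raw) AS occurrences;

-- Flag a row as valid (1) only if student_no is numeric and unique
-- and amount is numeric. The IS NOT NULL checks keep the flag from being null.
checked = FOREACH counted GENERATE
            raw::student_no AS student_no, raw::year AS year,
            raw::amount AS amount, raw::name AS name,
            (((raw::student_no IS NOT NULL) AND (raw::student_no MATCHES '[0-9]+')
              AND (occurrences == 1)
              AND (raw::amount IS NOT NULL)
              AND (raw::amount MATCHES '[0-9]+([.][0-9]+)?'))
             ? 1 : 0) AS is_valid;

SPLIT checked INTO good IF is_valid == 1, bad IF is_valid == 0;

-- Cast the validated fields to the column types assumed for the valid table.
valid_rows   = FOREACH good GENERATE (long)student_no AS student_no, year,
                                     (double)amount AS amount, name;
invalid_rows = FOREACH bad GENERATE student_no, year, amount, name;

STORE valid_rows   INTO 'default.students_valid'
      USING org.apache.hive.hcatalog.pig.HCatStorer();
STORE invalid_rows INTO 'default.students_invalid'
      USING org.apache.hive.hcatalog.pig.HCatStorer();
```

With this approach every copy of a duplicated Student No goes to the invalid table, and so does a header row if the file contains one, because it fails the numeric check. If you need separate tables for bad amounts and for duplicates, the same flag idea can be extended with one flag per rule.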

Did you try writing some Pig code to solve the above problem? If yes, please share the code here so that we can discuss how you can go forward and suggest improvements, if any.

Hi Abhinav,
I was trying to load data from a Hive table into a Pig relation, so I ran the command ‘pig -useHCatalog’ to enter the grunt shell,
but it did not work. Could you please tell me how to connect to the Hive metastore from the Pig grunt shell?
Thanks in advance,
Aparna

Hi @aparna149,

Can you please let me know what error you got when you tried to access Hive from the Pig shell?
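
For comparison, this is roughly what reading a Hive table looks like once the grunt shell has been started with pig -useHCatalog. It is only a sketch: the table name default.students is a placeholder, so replace it with a table that exists in your metastore.

```pig
-- Assumption: a Hive table named students exists in the default database.
-- Start the shell with: pig -useHCatalog
students = LOAD 'default.students'
           USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Print the schema that Pig picked up from the Hive metastore.
DESCRIBE students;

-- Look at a few rows to confirm the table is readable.
first_rows = LIMIT students 5;
DUMP first_rows;
```

If DESCRIBE prints the table's columns, Pig is reaching the metastore. If it fails, the message it prints is the error worth sharing here.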

ls: cannot access /usr/hdp/2.6.2.0-205/hive/lib/slf4j-api-*.jar: No such file or directory
ls: cannot access /usr/hdp/2.6.2.0-205/hive-hcatalog/lib/hbase-storage-handler-.jar: No such file or directory
21/01/20 19:12:23 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
21/01/20 19:12:23 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
21/01/20 19:12:23 INFO pig.ExecTypeProvider: Trying ExecType : TEZ_LOCAL
21/01/20 19:12:23 INFO pig.ExecTypeProvider: Trying ExecType : TEZ
21/01/20 19:12:23 INFO pig.ExecTypeProvider: Picked TEZ as the ExecType
2021-01-20 19:12:23,179 [main] INFO org.apache.pig.Main - Apache Pig version 0.16.0.2.6.2.0-205 (rUnversioned directory) compiled Aug 26 2017, 09:34:39
2021-01-20 19:12:23,179 [main] INFO org.apache.pig.Main - Logging error messages to: /home/prassadht2483/pig_1611169943178.log
2021-01-20 19:12:23,208 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/prassadht2483/.pigbootup not found
2021-01-20 19:12:23,787 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://cxln1.c.thelab-240901.internal:8020
2021-01-20 19:12:24,800 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-bad32274-6553-4ece-bf28-c2b38f64a0d6
2021-01-20 19:12:25,152 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://cxln2.c.thelab-240901.internal:8188/ws/v1/timeline/
2021-01-20 19:12:25,263 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
grunt>

Sir, I am getting this error.

[prassadht2483@cxln5 ~]$ pig -useHCatalog