Hi Team - I have 2 simple data files. I want to load them and then run a sample Pig Latin script that accesses them. I need some help. Below is the Pig Latin script.
The Pig Latin statements are from one of the online classes that I am taking.
When I use Cloud x Lab to upload the 2 data files, what directory path should I use in place of "jjzhang/lab3/people.csv", "piglab/zip-city.csv" and "piglab/output/Joined_Results" in the script below?
My goal is to be able to run the script below and understand what each statement does.
A = LOAD 'jjzhang/lab3/people.csv' USING PigStorage(',') AS (gender:chararray, age:int, income:int, zip:int);
B = FOREACH A GENERATE income, zip;
DUMP B;
C = FOREACH B GENERATE income/1000, zip;
DUMP C;
D = FILTER B BY income > 20000;
DUMP D;
Sorted_Income = ORDER D BY income ASC;
DUMP Sorted_Income;
ZipCity = LOAD 'piglab/zip-city.csv' USING PigStorage(',') AS (zip:int, city:chararray);
joined_results = JOIN A BY (zip), ZipCity BY (zip);
DUMP joined_results;
STORE Sorted_Income INTO 'piglab/output/Joined_Results' USING PigStorage(',');