Maximum value of a column in Apache Pig
I am trying to find the maximum value of the column ratingTime using Pig. I am running the script below:
ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS (userid:int,movieID:int,rating:int, ratingTime:int);
maxrating = MAX(ratings.ratingTime);
DUMP maxrating
Sample input data:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
I am getting the error below:
2018-08-05 07:02:05,247 [main] INFO org.apache.pig.backend.hadoop.PigATSClient - Created ATS Hook
2018-08-05 07:02:05,914 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. <file script.pi
I tried with a semicolon as well; it's still not working.
– bp89
Aug 6 at 6:43
Again, how are you running this? pig -f, or typing each line in the grunt shell? Can you please edit your post to include a sample of the input file?
– cricket_007
Aug 6 at 7:17
I am running this Pig script from the Ambari web console. I have also edited the question to provide sample input.
– bp89
Aug 6 at 7:53
I think you need to load the file USING PigStorage
– cricket_007
Aug 6 at 12:06
1 Answer
You need a preceding GROUP ALL before applying MAX, because MAX is an aggregate that works on a bag rather than directly on a relation.
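-- Load the tab-separated ratings file with an explicit schema.
ratings = LOAD '/user/maria_dev/ml-100k/u.data' USING PigStorage('\t') AS (userid:int,movieID:int,rating:int, ratingTime:int);
-- Collect every row into a single group so MAX has a bag to aggregate.
ratings_group = GROUP ratings ALL;
-- Take the maximum ratingTime across the whole bag.
maxrating = FOREACH ratings_group GENERATE MAX(ratings.ratingTime);
DUMP maxrating;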
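For anyone unsure what the intermediate relation looks like, here is a rough model of the same data flow in plain Python (not Pig). It is only a sketch: a list stands in for a Pig bag, the variable names mirror the script above, and the rows are the four sample lines from the question.

```python
# Plain-Python model of the Pig script's data flow (illustration only, not Pig).
# Rows are the four sample lines from the question:
# (userid, movieID, rating, ratingTime).
ratings = [
    (196, 242, 3, 881250949),
    (186, 302, 3, 891717742),
    (22, 377, 1, 878887116),
    (244, 51, 2, 880606923),
]

# GROUP ratings ALL: a single output row whose key is 'all' and whose
# second field is a bag (here a list) holding every input row.
ratings_group = [("all", ratings)]

# FOREACH ratings_group GENERATE MAX(ratings.ratingTime):
# project ratingTime (field index 3) out of the bag, then aggregate it.
maxrating = [
    (max(row[3] for row in bag),)
    for _key, bag in ratings_group
]

print(maxrating)  # [(891717742,)]
```

The point of the model is that the aggregation runs once per grouped row, and after GROUP ALL there is exactly one such row, so the result is a single value for the whole column.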
Thanks it worked for me!
– bp89
Aug 6 at 15:10
How are you running this? You're missing a semicolon on the DUMP command.
– cricket_007
Aug 5 at 13:28