Generating an ID/counter for FOREACH in Pig Latin
I want some sort of unique identifier, line number, or counter to be generated and appended in my foreach construct while it iterates through the records. Is there a way to accomplish this without writing a UDF?
B = foreach A generate a_unique_id, field1,...etc
How do I get that 'a_unique_id' implemented?
Thanks!
If you are using Pig 0.11 or later, then the RANK operator is exactly what you are looking for. For example:
DUMP A;
(foo,19)
(foo,19)
(foo,7)
(bar,90)
(etc.,0)
B = RANK A ;
DUMP B ;
(1,foo,19)
(2,foo,19)
(3,foo,7)
(4,bar,90)
(5,etc.,0)
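To get from the RANK output above to the `a_unique_id` column the question asks for, a follow-up FOREACH can rename the prepended rank field. This is only a rough sketch: the input file name, the comma delimiter, and the field names `name` and `value` are assumptions made for illustration and are not part of the original question.

```pig
-- Hypothetical input: a comma-delimited file shaped like relation A above.
-- 'input.txt' and the field names are placeholders, not from the question.
A = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, value:int);

-- RANK with no BY clause prepends a sequential row number starting at 1.
B = RANK A;

-- Refer to the prepended rank by position ($0) and rename it.
C = FOREACH B GENERATE $0 AS a_unique_id, name, value;

DUMP C;
```

Referring to the rank as `$0` avoids depending on the name Pig assigns to the generated field. Note that the numbers are unique within a single run of RANK over A, not globally unique across separate jobs.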
There is no built-in UUID function in the main Pig distribution or Piggybank. Unfortunately, I think your only option is going to be writing a UDF.
There is a standard way of building UUIDs, and there is existing Java code that you can build on for your UDF.
Is there a particular reason why you don't want to write a UDF?