Using set/list data types for intermediate keys in Hadoop
In an Apache Hadoop map-reduce program, what are the options for using sets/lists as keys in the output from the mapper?
My initial idea was to use ArrayWritable as the key type, but that is not allowed, as the class does not implement WritableComparable. Do I need to define a custom class, or is there some other set-like class in the Hadoop libraries that can act as a key?
I thought ArrayWritable implemented Writable, which is a superinterface of WritableComparable.
Did you subclass ArrayWritable? According to the documentation you need to subclass it so that you can set the type of object to be stored by the array. For example:
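import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;

// Fixes the element type to Text so Hadoop can instantiate and deserialize the array.
public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
        super(Text.class);
    }
}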
Check out the ArrayWritable javadocs.
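On the "custom class" option from the question: a map output key also has to be comparable, so one possible route is a small key class of your own. The following is only a rough sketch of the serialization and ordering logic for a set of strings. The class name StringSetKey is made up for illustration, and the Hadoop dependency is left out so the snippet compiles on its own; in a real job you would presumably declare it as implementing org.apache.hadoop.io.WritableComparable<StringSetKey>, whose write, readFields and compareTo methods have the same signatures as the ones below.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Collection;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical key class (not part of Hadoop). In a job it would be declared as
// "implements org.apache.hadoop.io.WritableComparable<StringSetKey>".
class StringSetKey implements Comparable<StringSetKey> {
    // A TreeSet keeps the elements sorted, so equal sets serialize identically.
    private final TreeSet<String> items = new TreeSet<>();

    // No-argument constructor, needed so the key can be created before readFields.
    StringSetKey() {}

    StringSetKey(Collection<String> values) {
        items.addAll(values);
    }

    // Same signature as Writable.write: element count, then each element.
    public void write(DataOutput out) throws IOException {
        out.writeInt(items.size());
        for (String s : items) {
            out.writeUTF(s);
        }
    }

    // Same signature as Writable.readFields. Clears old state first,
    // because a key object may be reused for several records.
    public void readFields(DataInput in) throws IOException {
        items.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            items.add(in.readUTF());
        }
    }

    // Lexicographic order over the sorted elements; a proper prefix sorts first.
    @Override
    public int compareTo(StringSetKey other) {
        Iterator<String> a = items.iterator();
        Iterator<String> b = other.items.iterator();
        while (a.hasNext() && b.hasNext()) {
            int c = a.next().compareTo(b.next());
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(items.size(), other.items.size());
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StringSetKey && items.equals(((StringSetKey) o).items);
    }

    // Equal sets must hash equally so a hash-based partitioner
    // sends them to the same reducer.
    @Override
    public int hashCode() {
        return items.hashCode();
    }
}
```

Keeping the elements in a sorted set means that {a, b} and {b, a} compare as equal and produce the same bytes, which is what you want if the key is meant to behave like a set. For list semantics you would keep insertion order instead (for example with an ArrayList) and compare element by element in that order.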