Implementation of an ArrayWritable for a custom Hadoop type
How do I define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, with custom Hadoop types to store the data.
I have an IndividualPosting class which stores the term frequency, document id and list of byte offsets for the term in the document.
I have a Posting class which has a document frequency (the number of documents the term appears in) and a list of IndividualPostings.
I have defined a LongArrayWritable extending the ArrayWritable class for the list of byte offsets in IndividualPosting.
When I defined a custom ArrayWritable for IndividualPosting, I encountered some problems after local deployment (using Karmasphere, Eclipse).
All the IndividualPosting instances in the list in the Posting class would be the same, even though I get different values in the reduce method.
From the documentation of ArrayWritable:
A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example:
public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
You've already cited doing this with a WritableComparable type defined by Hadoop. Here's what I assume your implementation looks like for LongWritable:
public static class LongArrayWritable extends ArrayWritable
{
    public LongArrayWritable() {
        super(LongWritable.class);
    }
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}
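You should be able to do this with any type that implements Writable; ArrayWritable only requires its element class to be a Writable, and WritableComparable is needed only if the type is also used as a key. The element class also needs a no-argument constructor so that Hadoop can instantiate it during deserialization. Using the WritableComparable example from the documentation: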
public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
And that should be that. This assumes you're using revision 0.20.2 or 0.21.0 of the Hadoop API.
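As for every IndividualPosting in the list ending up identical: your reducer code isn't shown, so this is a guess, but a common cause is that Hadoop refills one and the same value object on each iteration over the reduce values. If you add that object to a list directly, every entry points at the same instance and shows the last record read. The sketch below reproduces the effect with plain java.io so it runs without Hadoop on the classpath. SimpleWritable is a local stand-in for the write/readFields contract of Writable, and the field layout of IndividualPosting, the copy constructor and the PostingDemo helper are all assumptions made for illustration, not your actual classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the write/readFields contract of Hadoop's Writable,
// declared here only so the sketch compiles without Hadoop.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical field layout for the question's IndividualPosting.
class IndividualPosting implements SimpleWritable {
    long docId;
    int termFrequency;
    long[] offsets = new long[0];

    // No-argument constructor, needed so an instance can be created before readFields.
    IndividualPosting() {}

    IndividualPosting(long docId, int termFrequency, long[] offsets) {
        this.docId = docId;
        this.termFrequency = termFrequency;
        this.offsets = offsets;
    }

    // Copy constructor: takes an independent snapshot of another instance.
    IndividualPosting(IndividualPosting other) {
        this(other.docId, other.termFrequency, other.offsets.clone());
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(docId);
        out.writeInt(termFrequency);
        out.writeInt(offsets.length); // length prefix for the variable-length list
        for (long offset : offsets) {
            out.writeLong(offset);
        }
    }

    public void readFields(DataInput in) throws IOException {
        docId = in.readLong();
        termFrequency = in.readInt();
        offsets = new long[in.readInt()]; // replace the old state, never append to it
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = in.readLong();
        }
    }
}

class PostingDemo {
    static byte[] serialize(IndividualPosting posting) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        posting.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    // Mimics a reduce loop in which one value object is refilled for every record.
    static List<IndividualPosting> collect(List<byte[]> records, boolean copy)
            throws IOException {
        IndividualPosting reused = new IndividualPosting();
        List<IndividualPosting> result = new ArrayList<>();
        for (byte[] record : records) {
            reused.readFields(new DataInputStream(new ByteArrayInputStream(record)));
            // Without the copy, every list entry is the same object.
            result.add(copy ? new IndividualPosting(reused) : reused);
        }
        return result;
    }
}
```

Calling collect(records, false) gives a list whose entries all show the last record, which matches what you describe, while collect(records, true) keeps each record distinct. In a real reducer the equivalent fix would be to copy each value (for example with a copy constructor like the one above) before adding it to the list inside Posting.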