
Implementation of an ArrayWritable for a custom Hadoop type

How do I define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, with custom Hadoop types to store the data.

I have an IndividualPosting class which stores the term frequency, the document id and a list of byte offsets for the term in the document.

I have a Posting class which has a document frequency (the number of documents the term appears in) and a list of IndividualPostings.

I have defined a LongArrayWritable extending the ArrayWritable class for the list of byte offsets in IndividualPosting.
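A minimal sketch of what such an IndividualPosting could look like with a LongArrayWritable field. The class isn't shown in the question, so the field types (an int term frequency, a long document id), the derivation of the term frequency from the number of offsets, and the helper names (`getOffsets`, `roundTrip`) are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class IndividualPostingSketch {

    // Fixes the element type so Hadoop can rebuild the array from a no-arg constructor.
    public static class LongArrayWritable extends ArrayWritable {
        public LongArrayWritable() {
            super(LongWritable.class);
        }

        public LongArrayWritable(LongWritable[] values) {
            super(LongWritable.class, values);
        }
    }

    // Assumed layout: term frequency, document id, byte offsets of the term.
    public static class IndividualPosting implements Writable {
        private int termFrequency;
        private long docId;
        // Never null: ArrayWritable.write() dereferences its array.
        private LongArrayWritable offsets = new LongArrayWritable(new LongWritable[0]);

        // Hadoop instantiates Writables reflectively, so this constructor is required.
        public IndividualPosting() {
        }

        public IndividualPosting(long docId, long[] byteOffsets) {
            this.docId = docId;
            this.termFrequency = byteOffsets.length; // assumption: tf = number of offsets
            LongWritable[] boxed = new LongWritable[byteOffsets.length];
            for (int i = 0; i < byteOffsets.length; i++) {
                boxed[i] = new LongWritable(byteOffsets[i]);
            }
            this.offsets = new LongArrayWritable(boxed);
        }

        public void write(DataOutput out) throws IOException {
            out.writeInt(termFrequency);
            out.writeLong(docId);
            offsets.write(out);
        }

        // Overwrites every field, so a reused instance keeps no stale data.
        public void readFields(DataInput in) throws IOException {
            termFrequency = in.readInt();
            docId = in.readLong();
            offsets.readFields(in);
        }

        public int getTermFrequency() { return termFrequency; }

        public long getDocId() { return docId; }

        // Casts element by element; get() returns Writable[].
        public long[] getOffsets() {
            Writable[] raw = offsets.get();
            long[] result = new long[raw.length];
            for (int i = 0; i < raw.length; i++) {
                result[i] = ((LongWritable) raw[i]).get();
            }
            return result;
        }
    }

    // Serializes through plain java.io streams and reads into a fresh instance.
    public static IndividualPosting roundTrip(IndividualPosting p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            p.write(out);
            out.flush();
            IndividualPosting copy = new IndividualPosting();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```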

When I defined a custom ArrayWritable for IndividualPosting, I ran into problems after local deployment (using Karmasphere and Eclipse).

All the IndividualPosting instances in the list in the Posting class end up the same, even though I get different values in the reduce method.
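One common cause of this symptom (an assumption here, since the reducer code isn't shown) is that Hadoop reuses a single value object across the iterations of the reduce() values iterable, refilling it with readFields() each time. Adding that reference to a list adds the same object every time, so every slot shows the last value read. The Hadoop-free sketch below mimics that reuse with a hypothetical MutableValue class to show the effect and the fix, which is to copy each value before storing it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ValueReuseDemo {

    // Hypothetical stand-in for a Writable value that is refilled in place.
    static final class MutableValue {
        long docId;

        MutableValue() {
        }

        // Copy constructor used by the fixed version.
        MutableValue(MutableValue other) {
            this.docId = other.docId;
        }
    }

    // Mimics the reducer's values iterable: one shared instance, overwritten on each next().
    public static Iterable<MutableValue> reusingValues(final long... docIds) {
        final MutableValue shared = new MutableValue();
        return () -> new Iterator<MutableValue>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < docIds.length;
            }

            @Override
            public MutableValue next() {
                shared.docId = docIds[i++];
                return shared;
            }
        };
    }

    // Buggy: every list slot points at the same shared object.
    public static List<Long> keepReferences(Iterable<MutableValue> values) {
        List<MutableValue> kept = new ArrayList<>();
        for (MutableValue v : values) {
            kept.add(v);
        }
        return docIdsOf(kept);
    }

    // Fixed: each stored element is an independent copy.
    public static List<Long> keepCopies(Iterable<MutableValue> values) {
        List<MutableValue> kept = new ArrayList<>();
        for (MutableValue v : values) {
            kept.add(new MutableValue(v));
        }
        return docIdsOf(kept);
    }

    private static List<Long> docIdsOf(List<MutableValue> list) {
        List<Long> ids = new ArrayList<>();
        for (MutableValue v : list) {
            ids.add(v.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(keepReferences(reusingValues(1, 2, 3))); // [3, 3, 3]
        System.out.println(keepCopies(reusingValues(1, 2, 3)));     // [1, 2, 3]
    }
}
```

In a real reducer the copy step could be `WritableUtils.clone(value, context.getConfiguration())` from org.apache.hadoop.io, or a copy constructor on IndividualPosting.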


From the documentation of ArrayWritable:

A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example:

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
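The reason for the subclass is that Hadoop rebuilds a value by instantiating its declared class through a no-argument constructor and then calling readFields(); ArrayWritable itself has no such constructor, so the element type has to be fixed in a subclass. The sketch below checks this with plain reflection and then round-trips an IntArrayWritable through java.io streams into a reflectively created instance. The array-taking constructor and the helper names are additions for illustration, not part of the documentation's example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class IntArrayWritableSketch {

    // The documentation's example, plus a convenience constructor.
    public static class IntArrayWritable extends ArrayWritable {
        public IntArrayWritable() {
            super(IntWritable.class);
        }

        public IntArrayWritable(IntWritable[] values) {
            super(IntWritable.class, values);
        }
    }

    // True if the class declares a no-argument constructor.
    public static boolean hasNoArgConstructor(Class<?> c) {
        try {
            c.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Serializes the ints, then reads them into an instance built via reflection.
    public static int[] roundTrip(int... ints) {
        IntWritable[] boxed = new IntWritable[ints.length];
        for (int i = 0; i < ints.length; i++) {
            boxed[i] = new IntWritable(ints[i]);
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            new IntArrayWritable(boxed).write(out);
            out.flush();

            // Mimics the framework: no-arg constructor first, then readFields().
            IntArrayWritable copy = IntArrayWritable.class.getDeclaredConstructor().newInstance();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

            Writable[] raw = copy.get();
            int[] result = new int[raw.length];
            for (int i = 0; i < raw.length; i++) {
                result[i] = ((IntWritable) raw[i]).get();
            }
            return result;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```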

You've already done this with a WritableComparable type defined by Hadoop (LongWritable). Here's what I assume your implementation looks like:

public static class LongArrayWritable extends ArrayWritable
{
    public LongArrayWritable() {
        super(LongWritable.class);
    }
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}
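One detail worth knowing when reading the offsets back: ArrayWritable.get() returns Writable[], and after readFields() the array's runtime type is Writable[] even if the original was built from a LongWritable[]. Casting the whole array therefore fails, and each element has to be cast individually. A sketch, where `toLongs`, `wholeArrayCastFails` and `roundTrip` are hypothetical helper names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class LongArrayWritableSketch {

    public static class LongArrayWritable extends ArrayWritable {
        public LongArrayWritable() {
            super(LongWritable.class);
        }

        public LongArrayWritable(LongWritable[] values) {
            super(LongWritable.class, values);
        }
    }

    // Per-element cast: safe regardless of the array's runtime type.
    public static long[] toLongs(ArrayWritable w) {
        Writable[] raw = w.get();
        long[] result = new long[raw.length];
        for (int i = 0; i < raw.length; i++) {
            result[i] = ((LongWritable) raw[i]).get();
        }
        return result;
    }

    // Whole-array cast: throws once the array was created by readFields().
    public static boolean wholeArrayCastFails(ArrayWritable w) {
        try {
            LongWritable[] unused = (LongWritable[]) w.get();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Serializes and deserializes through plain java.io streams.
    public static LongArrayWritable roundTrip(LongArrayWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            LongArrayWritable copy = new LongArrayWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```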

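You should be able to do the same with any type that implements Writable: ArrayWritable only requires Writable elements, and WritableComparable (as in the documentation's example) is needed only if the type is also used as a key. Using their example: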

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {

    // Some data
    private int counter;
    private long timestamp;

    // Serializes the fields in a fixed order
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Reads the fields back in the same order they were written
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Orders instances by counter only
    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}

And that should be that. This assumes you're using version 0.20.2 or 0.21.0 of the Hadoop API.
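Applied to the types in the question, the pattern might look like the sketch below. IndividualPosting is a plain Writable with a public no-arg constructor (ArrayWritable.readFields() instantiates elements of the declared class reflectively), IndividualPostingArrayWritable fixes the element type, and Posting wraps it together with the document frequency. The field types and all helper names are assumptions, not the asker's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class PostingSketch {

    public static class LongArrayWritable extends ArrayWritable {
        public LongArrayWritable() { super(LongWritable.class); }
        public LongArrayWritable(LongWritable[] v) { super(LongWritable.class, v); }
    }

    // Assumed fields; the public no-arg constructor lets ArrayWritable create elements.
    public static class IndividualPosting implements Writable {
        private long docId;
        private int termFrequency;
        private LongArrayWritable offsets = new LongArrayWritable(new LongWritable[0]);

        public IndividualPosting() {}

        public IndividualPosting(long docId, int termFrequency, LongWritable[] offsets) {
            this.docId = docId;
            this.termFrequency = termFrequency;
            this.offsets = new LongArrayWritable(offsets);
        }

        public long getDocId() { return docId; }

        public void write(DataOutput out) throws IOException {
            out.writeLong(docId);
            out.writeInt(termFrequency);
            offsets.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            docId = in.readLong();
            termFrequency = in.readInt();
            offsets.readFields(in);
        }
    }

    // Same pattern as LongArrayWritable, with the custom element type.
    public static class IndividualPostingArrayWritable extends ArrayWritable {
        public IndividualPostingArrayWritable() { super(IndividualPosting.class); }
        public IndividualPostingArrayWritable(IndividualPosting[] v) { super(IndividualPosting.class, v); }
    }

    // Document frequency plus the postings for one term.
    public static class Posting implements Writable {
        private int documentFrequency;
        private IndividualPostingArrayWritable postings =
                new IndividualPostingArrayWritable(new IndividualPosting[0]);

        public Posting() {}

        // The list must hold copies, not the reducer's reused value object.
        public Posting(List<IndividualPosting> list) {
            documentFrequency = list.size();
            postings = new IndividualPostingArrayWritable(list.toArray(new IndividualPosting[0]));
        }

        public int getDocumentFrequency() { return documentFrequency; }

        public IndividualPosting getPosting(int i) { return (IndividualPosting) postings.get()[i]; }

        public void write(DataOutput out) throws IOException {
            out.writeInt(documentFrequency);
            postings.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            documentFrequency = in.readInt();
            postings.readFields(in);
        }
    }

    // Serializes and deserializes a Posting through java.io streams.
    public static Posting roundTrip(Posting p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            p.write(out);
            out.flush();
            Posting copy = new Posting();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The round trip is a cheap local check that the element type deserializes into distinct objects before deploying the job.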
