
Kettle: How do I call putRow() multiple times in processRow() correctly?

I'm processing an /etc/group file from a system. I load it with a CSV input step, using : as the delimiter. It has four fields: group, pwfield, gid, members. The members field is a comma-separated list of account names; it can hold any number of names, including none.

I would like to produce a list of records with three fields: group, gid, account. In the first step I use User Defined Java Class, in the second I use Select values.

Example Input:

root:x:0:
first:x:100:joe,jane,zorro
second:x:101:steve

Example output (XLS) - expected:

group   gid account
first   100 joe
first   100 jane
first   100 zorro
second  101 steve
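
For reference, the intended mapping can be sketched in plain Java, outside of Kettle. This is only an illustration of the expected result; the class name GroupFlattener and the method flatten are hypothetical and not part of the transformation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Kettle code: shows the expected row fan-out only.
class GroupFlattener {
    // Turns one /etc/group line into zero or more {group, gid, account} records.
    public static List<String[]> flatten(String line) {
        List<String[]> rows = new ArrayList<>();
        // Limit -1 keeps the trailing empty members field of lines like "root:x:0:".
        String[] fields = line.split(":", -1);
        if (fields.length < 4 || fields[3].isEmpty()) {
            return rows; // no members, so no output rows
        }
        for (String account : fields[3].split(",")) {
            rows.add(new String[] { fields[0], fields[2], account });
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] input = { "root:x:0:", "first:x:100:joe,jane,zorro", "second:x:101:steve" };
        for (String line : input) {
            for (String[] row : flatten(line)) {
                System.out.println(String.join("\t", row));
            }
        }
    }
}
```

A group without members, such as root, contributes no rows, which matches the expected output above.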

Example output (XLS) - actual, wrong:

group   gid account
first   100 zorro
first   100 zorro
first   100 zorro
second  101 steve

User Defined Java Class:

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // boilerplate
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    String tmp = get(Fields.In, "members").getString(r);
    if(null==tmp)
        return true;
    String accounts[] = tmp.split(",");
    for(int i=0; i<accounts.length; ++i){
        Object[] out_row = createOutputRow(r, data.outputRowMeta.size());
        String account = accounts[i];
        get(Fields.Out, "account").setValue(out_row,account);
        putRow(data.outputRowMeta, out_row);
    }

    return true;
}

I believe that I forgot to call some administrative function, or that I should use something other than createOutputRow(). Google did not help.


Mysteriously, if I create a transformation like the one illustrated, then:

  • XLS debug A has correct account values in each row
  • XLS debug B has repeating account values, as in the wrong example output above.

If I place a Dummy step before Select values 7, XLS debug B becomes correct and XLS debug A becomes wrong.


The problem is the following line (the first line in the for loop):

Object[] out_row = createOutputRow(r, data.outputRowMeta.size());

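As far as I can tell, createOutputRow() can hand back the input array r itself instead of a new array, so every putRow() call passes the same object and later iterations overwrite the account value of rows that were already sent. The line should be replaced with these three lines, which allocate a fresh row and copy the input fields into it: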

Object[] out_row = RowDataUtil.allocateRowData(data.outputRowMeta.size());
for (int j=0; j<r.length; ++j)
    out_row[j] = r[j];

UPDATE: An easier way, which is essentially the same:

Object[] out_row = RowDataUtil.createResizedCopy(r, data.outputRowMeta.size());
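
The effect of reusing one array for several output rows can be reproduced in plain Java without Kettle. The sketch below is an analogy under stated assumptions, not Kettle's implementation: the class RowAliasingDemo, the method emit and the list used as a downstream buffer are all hypothetical, and Arrays.copyOf stands in for RowDataUtil.createResizedCopy().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a step that emits several rows per input row.
class RowAliasingDemo {
    // The returned list plays the role of a downstream buffer: it stores references, not copies.
    public static List<Object[]> emit(Object[] inputRow, String[] accounts, boolean copyPerRow) {
        List<Object[]> buffer = new ArrayList<>();
        for (String account : accounts) {
            // Either reuse the input array (the bug) or make a fresh copy per output row (the fix).
            Object[] outRow = copyPerRow ? Arrays.copyOf(inputRow, inputRow.length) : inputRow;
            outRow[2] = account;
            buffer.add(outRow);
        }
        return buffer;
    }

    // Joins the account column (index 2) of all buffered rows with commas.
    public static String accountsOf(List<Object[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (Object[] row : rows) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(row[2]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] accounts = { "joe", "jane", "zorro" };
        // Shared array: prints zorro,zorro,zorro
        System.out.println(accountsOf(emit(new Object[] { "first", 100L, null }, accounts, false)));
        // Copy per row: prints joe,jane,zorro
        System.out.println(accountsOf(emit(new Object[] { "first", 100L, null }, accounts, true)));
    }
}
```

With a shared array, every buffered entry points at the same object, so the last account written wins. A downstream step that reads a row before the next one is written would still see correct values, which may be why the two debug outputs in the question disagree.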