Parsing multi-line fixed-width files

I have a fixed-width flat file. To make matters worse, each line can either be a new record or a subrecord of the line above, identified by the first character on each line:

A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321      # Separate
A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442      # records
B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT # sub-record of record "0021"

I've tried Flatworm, which seems to be an excellent library for parsing fixed-width data. Its documentation, unfortunately, states:

"Repeating segments are supported only for delimited files"

(ibid, "Repeating segments").

I'd rather not write a custom parser for this. (1) Is it possible to do this in Flatworm, or (2) is there a library providing such (multi-line, multi-sub-record) capabilities?


Have you looked at JRecordBind?

http://jrecordbind.org/

"JRecordBind supports hierarchical fixed length files: records of some type that are 'sons' of other record types."


Check Preon. Although Preon targets bitstream-compressed data, you might be able to twist its arm and use it for the file format you described as well. A benefit of using Preon is that it also generates human-readable documentation of the format.


With uniVocity-parsers you can not only read fixed-width inputs, but you can also read master-detail rows (in which a row has sub-rows).

Here's an example:

import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.*;
import com.univocity.parsers.fixed.*;

import java.io.FileReader;
import java.util.List;

// 1st, use a RowProcessor for the "detail" rows.
ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();

// 2nd, create a MasterDetailListProcessor to identify whether or not a row is the master row.
// The row placement argument indicates whether the master row occurs before or after a sequence of "detail" rows.
MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.TOP, detailProcessor) {
    @Override
    protected boolean isMasterRecord(String[] row, ParsingContext context) {
        // Returns true if the parsed row is the master row.
        // In the question's sample, "A" lines are records and "B" lines are their sub-records.
        return row[0].startsWith("A");
    }
};

// The field lengths below are illustrative; adjust them to the actual layout of your file.
FixedWidthParserSettings parserSettings = new FixedWidthParserSettings(new FixedWidthFieldLengths(4, 5, 40, 40, 8));

// Set the RowProcessor to the masterRowProcessor.
parserSettings.setRowProcessor(masterRowProcessor);

FixedWidthParser parser = new FixedWidthParser(parserSettings);
parser.parse(new FileReader(yourFile));

// Here we get the MasterDetailRecord elements.
List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
for (MasterDetailRecord masterRecord : rows) {
    // Each master record has one master row and zero or more detail rows.
    Object[] masterRow = masterRecord.getMasterRow();
    List<Object[]> detailRows = masterRecord.getDetailRows();
}

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
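

If pulling in a library turns out not to be an option, the master/detail grouping itself is small enough to sketch in plain Java. This is only a rough sketch and not part of any library mentioned above. It assumes that 'A' always starts a new record and that 'B' always belongs to the closest preceding 'A' line. The class name MasterDetailGrouper and its Group type are made up for illustration. Slicing each line into fields is left out, because the exact column widths are not given in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MasterDetailGrouper {

    /** One record line plus the sub-record lines that follow it (hypothetical helper type). */
    public static final class Group {
        public final String master;
        public final List<String> details = new ArrayList<>();

        Group(String master) {
            this.master = master;
        }
    }

    /**
     * Groups raw lines into record/sub-record blocks.
     * Assumption: 'A' marks a new record, 'B' marks a sub-record of the
     * closest preceding 'A' line. Anything else is rejected.
     */
    public static List<Group> group(List<String> lines) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            char type = line.charAt(0);
            if (type == 'A') {
                current = new Group(line);
                groups.add(current);
            } else if (type == 'B') {
                if (current == null) {
                    throw new IllegalArgumentException("Sub-record before any record: " + line);
                }
                current.details.add(line);
            } else {
                throw new IllegalArgumentException("Unknown record type: " + line);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample lines from the question, without the trailing comments.
        List<String> lines = Arrays.asList(
                "A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321",
                "A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442",
                "B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT");
        for (Group g : group(lines)) {
            // Characters 1..4 hold the record id in the sample data.
            System.out.println(g.master.substring(1, 5) + " has " + g.details.size() + " sub-record(s)");
        }
    }
}
```

Each Group can then be handed to whatever fixed-width field splitter you already use, one line at a time, so the multi-line logic stays separate from the column parsing.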
