Parsing a fixed-width formatted file in Java

I've got a file from a vendor that has 115 fixed-width fields per line. How can I parse that file into the 115 fields so I can use them in my code?

My first thought is to define constants for each field, such as NAME_START_POSITION and NAME_LENGTH, and use substring. That seems ugly, so I'm curious about better ways of doing this. The couple of libraries a Google search turned up didn't seem any better, either.


I would use a flat file parser like flatworm instead of reinventing the wheel: it has a clean API, is simple to use, has decent error handling and a simple file format descriptor. Another option is jFFP but I prefer the first one.


I've played around with fixedformat4j and it is quite nice. Easy to configure converters and the like.


uniVocity-parsers comes with a FixedWidthParser and FixedWidthWriter that can support tricky fixed-width formats, including lines with different fields, paddings, etc.

// creates the sequence of field lengths in the file to be parsed
FixedWidthFields fields = new FixedWidthFields(4, 5, 40, 40, 8);

// creates the default settings for a fixed width parser
FixedWidthParserSettings settings = new FixedWidthParserSettings(fields); // many settings here, check the tutorial.

//sets the character used for padding unwritten spaces in the file
settings.getFormat().setPadding('_');

// creates a fixed-width parser with the given settings
FixedWidthParser parser = new FixedWidthParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new File("path/to/fixed.txt"));

Here are a few examples for parsing all sorts of fixed-width inputs.

And here are some other examples for writing in general and other fixed-width examples specific to the fixed-width format.

Disclosure: I'm the author of this library. It's open-source and free (Apache 2.0 License).


Here is a basic implementation I use:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class FlatFileParser {

  public static void main(String[] args) {
    File inputFile = new File("data.in");
    File outputFile = new File("data.out");
    int columnLengths[] = {7, 4, 10, 1};
    String charset = "ISO-8859-1";
    String delimiter = "~";

    System.out.println(
        convertFixedWidthFile(inputFile, outputFile, columnLengths, delimiter, charset)
        + " lines written to " + outputFile.getAbsolutePath());
  }

  /**
   * Converts a fixed width file to a delimited file.
   * <p>
   * This method ignores (consumes) newline and carriage return
   * characters. Lines returned is based strictly on the aggregated
   * lengths of the columns.
   *
   * A RuntimeException is thrown if run-off characters are detected
   * at eof.
   *
   * @param inputFile the fixed width file
   * @param outputFile the generated delimited file
   * @param columnLengths the array of column lengths
   * @param delimiter the delimiter used to split the columns
   * @param charsetName the charset name of the supplied files
   * @return the number of completed lines
   */
  public static final long convertFixedWidthFile(
      File inputFile,
      File outputFile,
      int columnLengths[],
      String delimiter,
      String charsetName) {

    InputStream inputStream = null;
    Reader inputStreamReader = null;
    OutputStream outputStream = null;
    Writer outputStreamWriter = null;
    String newline = System.getProperty("line.separator");
    String separator;
    int data;
    int currentIndex = 0;
    int currentLength = columnLengths[currentIndex];
    int currentPosition = 0;
    long lines = 0;

    try {
      inputStream = new FileInputStream(inputFile);
      inputStreamReader = new InputStreamReader(inputStream, charsetName);
      outputStream = new FileOutputStream(outputFile);
      outputStreamWriter = new OutputStreamWriter(outputStream, charsetName);

      while((data = inputStreamReader.read()) != -1) {
        if(data != 13 && data != 10) {
          outputStreamWriter.write(data);
          if(++currentPosition > (currentLength - 1)) {
            currentIndex++;
            separator = delimiter;
            if(currentIndex > columnLengths.length - 1) {
              currentIndex = 0;
              separator = newline;
              lines++;
            }
            outputStreamWriter.write(separator);
            currentLength = columnLengths[currentIndex];
            currentPosition = 0;
          }
        }
      }
      if(currentIndex > 0 || currentPosition > 0) {
        String line = "Line " + ((int)lines + 1);
        String column = ", Column " + ((int)currentIndex + 1);
        String position = ", Position " + ((int)currentPosition);
        throw new RuntimeException("Incomplete record detected. " + line + column + position);
      }
      return lines;
    }
    catch (Throwable e) {
      throw new RuntimeException(e);
    }
    finally {
      try {
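        // Either stream may still be null if opening it failed.
        if (inputStreamReader != null) {
          inputStreamReader.close();
        }
        if (outputStreamWriter != null) {
          outputStreamWriter.close();
        }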
      }
      catch (Throwable e) {
        throw new RuntimeException(e);
      }
    }
  }
}


Most suitable for Scala, but probably you could use it in Java

I was so fed up with the fact that there is no proper library for fixed length format that I have created my own. You can check it out here: https://github.com/atais/Fixed-Length

Basic usage: you create a case class and describe its layout as an HList (Shapeless):

case class Employee(name: String, number: Option[Int], manager: Boolean)

object Employee {

    import com.github.atais.util.Read._
    import cats.implicits._
    import com.github.atais.util.Write._
    import Codec._

    implicit val employeeCodec: Codec[Employee] = {
      fixed[String](0, 10) <<:
        fixed[Option[Int]](10, 13, Alignment.Right) <<:
        fixed[Boolean](13, 18)
    }.as[Employee]
}

And you can easily decode your lines now or encode your object:

import Employee._
Parser.decode[Employee](exampleString)
Parser.encode(exampleObject)


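If your string is called inStr, convert it to a char array and use the String(char[] value, int offset, int count) constructor. Note that the third argument is a character count, not an end index:

char[] intStrChar = inStr.toCharArray();
String charfirst10 = new String(intStrChar, 0, 10);  // characters 0 to 9
String char10to20 = new String(intStrChar, 10, 10);  // characters 10 to 19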
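
Extending this to a whole record, here is a minimal sketch that walks an array of column widths and cuts each field with substring. The names FixedWidthSplitter and split are made up for illustration and do not come from any library. It assumes every line is at least as long as the widths add up to; shorter lines are rejected rather than padded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
final class FixedWidthSplitter {

    private FixedWidthSplitter() {}

    /** Cuts the line into consecutive fields of the given widths and trims padding spaces. */
    static List<String> split(String line, int... widths) {
        List<String> fields = new ArrayList<>(widths.length);
        int offset = 0;
        for (int width : widths) {
            int end = offset + width;
            if (end > line.length()) {
                throw new IllegalArgumentException(
                        "Line has " + line.length() + " chars, need at least " + end);
            }
            fields.add(line.substring(offset, end).trim());
            offset = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Widths 2, 5 and 3 describe a 10-character record.
        System.out.println(split("AB12345xyz", 2, 5, 3)); // prints [AB, 12345, xyz]
    }
}
```

For 115 fields the widths would more likely be loaded from a config file than written inline, but the loop stays the same.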


The Apache Commons CSV project can handle fixed-width files.

Looks like the fixed width functionality didn't survive promotion from the sandbox.


Here is plain Java code to read a fixed-width file:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FixedWidth {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        // String S1="NHJAMES TURNER M123-45-67890004224345";
        String FixedLengths = "2,15,15,1,11,10";

        List<String> items = Arrays.asList(FixedLengths.split("\\s*,\\s*"));
        File file = new File("src/sample.txt");

        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line1;
            while ((line1 = br.readLine()) != null) {
                // process the line.

                int n = 0; // start offset of the current field
                StringBuilder line = new StringBuilder();
                for (int k = 0; k < items.size(); k++) {
                    int width = Integer.parseInt(items.get(k));
                    // Assumes every line is at least as long as the sum of the widths.
                    line.append(line1.substring(n, n + width).trim());
                    if (k < items.size() - 1) {
                        line.append(',');
                    }
                    n += width;
                }
                System.out.println(line);
            }
        }

    }

}
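
If named fields are preferable to a bare list of widths, the start/length constants mentioned in the question can be kept in one place with an enum. This is only a sketch: the RecordField enum, its three fields and their offsets are invented for illustration, and text missing from a short line is treated as an empty field (a choice you may not want).

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical layout: the field names, offsets and lengths are illustrative only.
enum RecordField {
    CODE(0, 2),
    FIRST_NAME(2, 15),
    LAST_NAME(17, 15);

    final int start;
    final int length;

    RecordField(int start, int length) {
        this.start = start;
        this.length = length;
    }

    /** Extracts this field from the line; anything past the end of a short line counts as empty. */
    String extract(String line) {
        int from = Math.min(start, line.length());
        int to = Math.min(start + length, line.length());
        return line.substring(from, to).trim();
    }

    /** Reads every field of one line into a map keyed by field. */
    static Map<RecordField, String> parse(String line) {
        Map<RecordField, String> result = new EnumMap<>(RecordField.class);
        for (RecordField field : values()) {
            result.put(field, field.extract(line));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = String.format("%-2s%-15s%-15s", "NH", "JAMES", "TURNER");
        System.out.println(parse(line)); // prints {CODE=NH, FIRST_NAME=JAMES, LAST_NAME=TURNER}
    }
}
```

Each field's position lives next to its name, so the parsing code reads as RecordField.FIRST_NAME.extract(line) instead of a pair of loose constants.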


/* The method takes three parameters: the fixed-length record, the record length (number of columns, taken from the schema, e.g. 10), and the delimiter. */
public class Testing {

    public static void main(String as[]) throws InterruptedException {

        fixedLengthRecordProcessor("1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10", 10, ",");

    }

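    public static void fixedLengthRecordProcessor(String input, int reclength, String delimiter) {
        // split() takes a regex, so quote the delimiter to treat it literally
        String[] values = input.split(java.util.regex.Pattern.quote(delimiter));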
        String record = "";
        int recCounter = 0;
        for (Object O : values) {

            if (recCounter == reclength) {
                System.out.println(record.substring(0, record.length() - 1));// process
                                                                                // your
                                                                                // record
                record = "";
                record = record + O.toString() + ",";
                recCounter = 1;
            } else {

                record = record + O.toString() + ",";

                recCounter++;

            }

        }
        System.out.println(record.substring(0, record.length() - 1)); // process
                                                                        // your
                                                                        // record
    }

}


Another library that can be used to parse a fixed width text source: https://github.com/org-tigris-jsapar/jsapar

Allows you to define a schema in xml or in code and parse fixed width text into java beans or fetch values from an internal format.

Disclosure: I am the author of the jsapar library. If it does not fulfill your needs, on this page you can find a comprehensive list of other parsing libraries. Most of them are only for delimited files but some can parse fixed width as well.
