Java text parsing help needed with section name/content text

I have text in the following format:

section name 1:

this text goes into the first section

section name 2:

this text goes into the second section

etc,

Where section names are arbitrary phrases and section contents are free text that never contains a section name. I need to split this text into pairs of objects of the form (section name, section text).

Is there an effective RegEx or other recommended way of doing this?

Thanks. -Raj


Well, it depends on the structure of your document. For example, do section names and section texts strictly alternate, one per line? If so, it is easy: scan line by line and build your objects as you go.

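// Assumes br is an already opened java.io.BufferedReader over the input text.
List<Section> sections = new ArrayList<Section>();
String temp = null;
String line = null;
int lineNumber = 0;

while ((line = br.readLine()) != null) {
  if (line.trim().isEmpty()) {
    continue; // skip blank lines so they do not break the name/text alternation
  }
  lineNumber++;
  if (lineNumber % 2 == 0) {
    // Section Text: pair it with the name read on the previous non-blank line
    sections.add(new Section(temp, line));
  }
  else {
    // Section Name (the trailing colon, if any, is kept as-is)
    temp = line;
  }
}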

Then your Section might be:

public class Section {
  private final String name;
  private final String text;
  public Section(String name, String text) {
    this.name = name;
    this.text = text;
  }
}


You'll need a structure or a fixed, identifiable delimiter to decide whether a line contains a section name or a section body.

If you have a rule saying that a text line terminated with a colon is a section name, then you should read the document line by line and look at the last character of each line: treat the line (1) as a section head if its last character is a colon, or (2) as part of a section body otherwise.
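As a rough illustration of that colon rule, here is a minimal sketch. The class name SectionParser, the nested Section type and the parse method are made up for this example, and it assumes that no body line ever ends with a colon and that any text before the first section head can be ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class SectionParser {

    /** Hypothetical value type: one (section name, section text) pair. */
    public static final class Section {
        public final String name;
        public final String text;

        public Section(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    /**
     * Splits the input into sections. Assumption: a line whose last
     * non-whitespace character is a colon is a section head; every
     * other line belongs to the body of the most recent head.
     */
    public static List<Section> parse(String input) {
        List<Section> sections = new ArrayList<>();
        String name = null;                       // head of the section being read
        StringBuilder body = new StringBuilder(); // body lines collected so far

        for (String raw : input.split("\\R")) {   // \R matches any line break (Java 8+)
            String line = raw.trim();
            if (line.endsWith(":")) {
                // A new head starts: store the previous section, if there was one
                if (name != null) {
                    sections.add(new Section(name, body.toString().trim()));
                }
                name = line.substring(0, line.length() - 1).trim(); // drop the colon
                body.setLength(0);
            } else if (name != null) {
                // Body line (blank lines included; outer ones are trimmed later)
                body.append(line).append('\n');
            }
            // Lines before the first head are ignored in this sketch
        }
        if (name != null) {
            sections.add(new Section(name, body.toString().trim()));
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "section name 1:\n\nthis text goes into the first section\n\n"
                     + "section name 2:\n\nthis text goes into the second section\n";
        for (Section s : parse(input)) {
            System.out.println(s.name + " -> " + s.text);
        }
    }
}
```

Unlike the alternating-line approach, this lets a section body span several lines. If body text can legitimately end with a colon, the rule is ambiguous and you would need a stricter delimiter.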
