Java text parsing help needed with section name/content text
I have text in the following format:
section name 1:
this text goes into the first section
section name 2:
this text goes into the second section
etc.
Section names are arbitrary phrases, and section contents are free text that never contains a section name. I need to split this text into object pairs of type (section name, section text).
Is there an effective RegEx or other recommended way of doing this?
Thanks. -Raj
Well, it depends on the structure of your document. For example, does each section take exactly two lines, the name and then the text, as in your sample? If so, it is easy: scan line by line and construct your objects as you go.
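// Assumes br is a BufferedReader over the input and that name and text lines strictly alternate.
List<Section> sections = new ArrayList<Section>();
String temp = null;
String line = null;
int lineNumber = 0;
while ((line = br.readLine()) != null) {
    lineNumber++;
    if (lineNumber % 2 == 0) {
        // Even line: section text, paired with the name saved from the previous line
        sections.add(new Section(temp, line));
    }
    else {
        // Odd line: section name
        temp = line;
    }
}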
Then your Section might be:
public class Section {
    private final String name;
    private final String text;

    public Section(String name, String text) {
        this.name = name;
        this.text = text;
    }
}
You'll need a structure or a fixed, identifiable delimiter to decide whether a line contains a section name or a section body.
If you have a rule saying that a text line terminated with a colon is a section name, then read the document line by line and look at the last character of each line: treat the line (1) as a section head if its last character is a colon, or (2) as part of a section body otherwise.
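The colon rule above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ColonSectionParser and its parse method are made up for this example, the whole input is assumed to be available as one String, and the pairs are returned as Map.Entry objects rather than a custom Section class. A body line that happens to end with a colon would be misread as a section head, so the rule only works if that cannot occur in your data.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class ColonSectionParser {

    // Splits text into (section name, section text) pairs.
    // Assumed rule: a line whose last non-blank character is ':' is a section head.
    static List<Map.Entry<String, String>> parse(String text) {
        List<Map.Entry<String, String>> sections = new ArrayList<>();
        String name = null;                       // head of the section being collected
        StringBuilder body = new StringBuilder(); // body lines of that section

        // "\\R" matches any line break sequence (Java 8 and later).
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.endsWith(":")) {
                // A new head closes the previous section, if there was one.
                if (name != null) {
                    sections.add(new AbstractMap.SimpleEntry<>(name, body.toString().trim()));
                }
                name = trimmed.substring(0, trimmed.length() - 1).trim();
                body.setLength(0);
            } else if (name != null) {
                body.append(line).append('\n');
            }
            // Lines before the first head are ignored in this sketch.
        }
        // Close the last section once the input is exhausted.
        if (name != null) {
            sections.add(new AbstractMap.SimpleEntry<>(name, body.toString().trim()));
        }
        return sections;
    }
}
```

Calling ColonSectionParser.parse on the sample text from the question should give two entries, where getKey() returns the section name without the colon and getValue() returns the section text. Unlike the alternating-line approach, this also handles bodies that span several lines.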