Java Multi-line regex to replace multiple instances in a file
I have been searching for hours but can't find an answer to my problem. Here's my code snippet, followed by the problem:
Pattern forKeys = Pattern.compile("^<feature>\\s*<name>Deviation</name>.*?</feature>", Pattern.DOTALL|Pattern.MULTILINE);
Matcher n = forKeys.matcher("");
String aLine = null;
while((aLine = in.readLine()) != null) {
    n.reset(aLine);
    String result = n.replaceAll("");
    out.write(result);
    out.newLine();
}
Assume the undeclared variables are already declared.
My point is that my regex (and maybe the matcher too) is not working properly.
I want to erase every "<feature><name>Deviation</name>*any characters here*</feature>" block in the following lines:
<feature>
<name>Deviation</name>
<more words here>
</feature>
<feature>
<name>Average</name>
</feature>
<feature>
<name>Deviation</name>
sample words
</feature>
I think my problem is the use of repetition operators (how to match across line breaks, tabs, etc.), but I can't seem to find the correct expression.
Any ideas? Thanks in advance.
Parsing HTML or XML with regex is evil and error-prone.
Use an XML parser and things will work much better.
Here's a solution for your problem using Dom4J:
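import java.io.FileOutputStream;
import java.util.Iterator;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

// parse XML source
Document document = DocumentHelper.parseText(yourXmlText);
// get an iterator for all <feature> elements under the root
Iterator<Element> featureIterator =
    document.getRootElement().elementIterator("feature");
while (featureIterator.hasNext()) {
    Element featureElement = featureIterator.next();
    // if <feature> has a child <name> with content "Deviation"
    if ("Deviation".equals(featureElement.elementTextTrim("name"))) {
        // remove this <feature> element
        featureIterator.remove();
    }
}
// write modified XML back to file
XMLWriter writer = new XMLWriter(
    new FileOutputStream(yourXmlFile), OutputFormat.createPrettyPrint());
writer.write(document);
// close the writer so the output is flushed to disk
writer.close();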
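If adding Dom4J is not an option, the same idea can probably be expressed with the DOM API that ships with the JDK. The following is only a sketch under assumptions: the class name `DomFeatureRemover`, the method name `removeDeviationFeatures`, and the `<features>` root element in the sample are all made up for illustration, and the input must be well-formed XML with a single root.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original answer.
public class DomFeatureRemover {

    // Removes every <feature> whose first <name> descendant reads "Deviation".
    static String removeDeviationFeatures(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList features = doc.getElementsByTagName("feature");
            // Walk backwards: the NodeList is live and shrinks on removal.
            for (int i = features.getLength() - 1; i >= 0; i--) {
                Element feature = (Element) features.item(i);
                NodeList names = feature.getElementsByTagName("name");
                if (names.getLength() > 0
                        && "Deviation".equals(names.item(0).getTextContent().trim())) {
                    feature.getParentNode().removeChild(feature);
                }
            }

            // Serialize the modified tree back to a String.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input; the <features> root is invented for this sketch.
        String xml = "<features>"
                + "<feature><name>Deviation</name><value>1</value></feature>"
                + "<feature><name>Average</name></feature>"
                + "<feature><name>Deviation</name>sample words</feature>"
                + "</features>";
        System.out.println(removeDeviationFeatures(xml));
    }
}
```

Note that `getElementsByTagName` searches all descendants, not only direct children, so this sketch is a little looser than the Dom4J version, which only looks at `<feature>` children of the root.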
Apart from that, your loop also contains a mistake (see my comments):
// aLine is just a single line
while((aLine = in.readLine()) != null) {
    n.reset(aLine);
    // yet you want to replace a multi-line pattern
    String result = n.replaceAll("");
    out.write(result);
    out.newLine();
}
Your regex might or might not work if you read the entire file into a String, but it cannot work when you apply it to individual lines, because no single line ever contains a whole multi-line <feature> block.
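To illustrate the whole-text approach, here is a minimal sketch that applies the pattern from the question to one String instead of line by line. The class name `FeatureStripper` and the method `stripDeviationFeatures` are hypothetical, and the trailing `\R?` (Java 8+) is my own addition so that the line break after `</feature>` is removed as well. Because of the `^` anchor, it assumes each `<feature>` starts at the beginning of a line with no indentation.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
public class FeatureStripper {

    // Pattern from the question; \R? additionally swallows one trailing line break.
    private static final Pattern DEVIATION_FEATURE = Pattern.compile(
            "^<feature>\\s*<name>Deviation</name>.*?</feature>\\R?",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Expects the complete text, not a single line.
    static String stripDeviationFeatures(String wholeText) {
        return DEVIATION_FEATURE.matcher(wholeText).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample data from the question, joined into one String.
        String input = String.join("\n",
                "<feature>",
                "<name>Deviation</name>",
                "<more words here>",
                "</feature>",
                "<feature>",
                "<name>Average</name>",
                "</feature>",
                "<feature>",
                "<name>Deviation</name>",
                "sample words",
                "</feature>");
        // Only the "Average" block should remain.
        System.out.print(stripDeviationFeatures(input));
    }
}
```

For a real file, the text could be loaded in one go with something like `new String(Files.readAllBytes(path), StandardCharsets.UTF_8)` before calling the method. This remains fragile compared to the parser-based solution: nested `<feature>` elements, attributes, or indentation would all break the match.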