Java Multi-line regex to replace multiple instances in a file

I have been searching for hours for a solution to my problem, but nothing has come up. Here is my code snippet, followed by the problem:

Pattern forKeys = Pattern.compile("^<feature>\\s*<name>Deviation</name>.*?</feature>", Pattern.DOTALL|Pattern.MULTILINE);
Matcher n = forKeys.matcher("");
String aLine = null;
    while ((aLine = in.readLine()) != null) {
         n.reset(aLine);
         String result = n.replaceAll("");
         out.write(result);
         out.newLine();
    }

Assume the undeclared variables (in, out) are already declared.

My point is that my regex (and maybe the matcher as well) is not working properly.

I want to erase every part matching "<feature><name>Deviation</name>*any characters here*</feature>" from the following lines:

<feature>
    <name>Deviation</name>
            <more words here>
</feature>
<feature>
    <name>Average</name>
</feature>
    <feature>
    <name>Deviation</name>
            sample words
</feature>

I think my problem is the use of repetition operators (how to match across line breaks, tabs, etc.), but I can't seem to find the correct expression.

Any ideas? Thanks in advance.


Parsing HTML or XML with regex is evil and error-prone.

Use an XML parser and things will work much better.
Here's a solution for your problem using Dom4J:

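// imports needed by this snippet
import java.io.FileOutputStream;
import java.util.Iterator;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

// parse XML source (the text must have a single root element)
Document document = DocumentHelper.parseText(yourXmlText);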

Iterator<Element> featureIterator =
    // get an iterator for all <feature> elements
    document.getRootElement().elementIterator("feature");

while(featureIterator.hasNext()){
    Element featureElement = featureIterator.next();
    // if <feature> has a child <name> with Content "Deviation"
    if ("Deviation".equals(featureElement.elementTextTrim("name"))) {
        // remove this <feature> element
        featureIterator.remove();
    }
}

// write modified XML back to file, then close the writer to flush it
XMLWriter writer = new XMLWriter(
    new FileOutputStream(yourXmlFile), OutputFormat.createPrettyPrint()
);
writer.write(document);
writer.close();

Apart from that you are also making a mistake (see my comments):

// aLine is just a single line
while((aLine = in.readLine()) != null) {
     n.reset(aLine);
     // yet you want to replace a multi-line pattern
     String result = n.replaceAll("");
     out.write(result);
     out.newLine();
}

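Your regex might or might not work if you read the entire file into a String, but it cannot work when you apply it to individual lines: in your sample, no single line contains a whole <feature>...</feature> block, so the matcher never sees anything it could match.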
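
If you still want to try the regex route, here is a minimal sketch of the "read the whole file first" idea. It is only a sketch under several assumptions: the file is UTF-8 and fits in memory, <feature> elements are never nested, and the class name DeviationStripper and its two methods are hypothetical names made up for this example. The pattern is a slightly relaxed version of yours: it allows indentation before <feature> and also removes the line break after </feature>.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and methods are illustrative only.
class DeviationStripper {

    // DOTALL lets ".*?" cross line breaks; MULTILINE lets "^" match at every line start.
    // "[ \\t]*" tolerates indentation before <feature>.
    // "\\R?" also consumes the line break that follows </feature>, if any.
    private static final Pattern DEVIATION_FEATURE = Pattern.compile(
            "^[ \\t]*<feature>\\s*<name>Deviation</name>.*?</feature>[ \\t]*\\R?",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Removes every <feature> block whose first child is <name>Deviation</name>.
    static String strip(String xml) {
        return DEVIATION_FEATURE.matcher(xml).replaceAll("");
    }

    // Reads the whole file into one String (assumed UTF-8), strips it, writes the result.
    static void stripFile(Path source, Path target) throws IOException {
        String xml = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        Files.write(target, strip(xml).getBytes(StandardCharsets.UTF_8));
    }
}
```

You would call it as DeviationStripper.stripFile(Paths.get("in.xml"), Paths.get("out.xml")), where both file names are placeholders. It remains fragile compared to the parser approach: a comment or CDATA section containing "</feature>", or a <name> element that is not the first child, will defeat it.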
