Regex matching an open and close tag and a certain text pattern inside that tag [duplicate]
Here is a sample custom tag I have from a sitemap.xml:
<url>
<loc>http://sitename.com/programming/php/?C=D;O=A</loc>
<changefreq>weekly</changefreq>
<priority>0.64</priority>
</url>
There are many entries like this, and as you can see, the loc tag has C=D;O=A at the end.
I want to remove all entries starting with <url>
and ending with </url>
that contain C=D;O=A or similar patterns.
The following expression matches the whole of the tag shown above:
<url>(.|\r\n)*?<\/url>
but I want it to match only the entries that meet the condition described above.
How do I form a regex that matches such conditions (patterns)?
Try this:
/<url>(?:(?!<\/url>).)*C=D;O=A.*?<\/url>/m
The negative lookahead guarantees that a single match cannot span multiple nodes. The /m flag is Ruby's multiline mode, which lets . match newlines.
See here: rubular
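As a rough illustration, the same pattern can be tried in Python, where re.DOTALL stands in for Ruby's /m. The SAMPLE text and the strip_sorted_entries helper are made up for this sketch, and the trailing \s* is an addition that swallows the line break left behind by a removed entry.

```python
import re

# Pattern from the answer above. re.DOTALL lets "." match newlines (Ruby's /m).
# The trailing \s* is an extra, so no blank line remains after removal.
PATTERN = re.compile(r"<url>(?:(?!</url>).)*C=D;O=A.*?</url>\s*", re.DOTALL)

# Hypothetical two-entry sitemap fragment: only the second entry has the query string.
SAMPLE = """<urlset>
<url>
<loc>http://sitename.com/programming/php/</loc>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://sitename.com/programming/php/?C=D;O=A</loc>
<changefreq>weekly</changefreq>
<priority>0.64</priority>
</url>
</urlset>
"""


def strip_sorted_entries(xml_text):
    """Return xml_text with every <url> block containing C=D;O=A removed."""
    return PATTERN.sub("", xml_text)


cleaned = strip_sorted_entries(SAMPLE)
print(cleaned)
```

Because the lookahead refuses to step over a </url>, the first entry (which lacks the query string) is left alone rather than being merged into the match for the second one.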
It is not a good idea to use regex for XML. Depending on the language, you should use an XML reader to extract each <url>
node and then use a regex to match the content of that node.
One useful language for querying XML data, supported by many XML libraries, is XPath.
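A minimal sketch of that approach in Python, using the standard library's xml.etree.ElementTree (which supports a limited subset of XPath). It assumes the <url> elements are direct children of the root and that the file declares no XML namespace, as in the sample from the question; a real sitemap.xml usually declares one, and the tag lookups would then need to be namespace-qualified. The C=[A-Z];O=[A-Z] pattern is only a guess at what "similar patterns" means, and remove_sorted_urls is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: "similar patterns" means query strings shaped like C=<letter>;O=<letter>.
SORT_QUERY = re.compile(r"C=[A-Z];O=[A-Z]")


def remove_sorted_urls(xml_text):
    """Parse the sitemap, drop <url> nodes whose <loc> matches SORT_QUERY, re-serialize."""
    root = ET.fromstring(xml_text)
    # Copy the list first so removing children does not disturb the iteration.
    for url in list(root.findall("url")):
        loc = url.findtext("loc", default="")
        if SORT_QUERY.search(loc):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input, without a namespace declaration.
sitemap = """<urlset>
<url><loc>http://sitename.com/programming/php/</loc></url>
<url><loc>http://sitename.com/programming/php/?C=D;O=A</loc></url>
<url><loc>http://sitename.com/programming/php/?C=N;O=D</loc></url>
</urlset>"""

result = remove_sorted_urls(sitemap)
print(result)
```

Here the regex only ever sees the text of one <loc> node, so it never has to reason about where a tag opens or closes.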
If you absolutely have to use regex, this one:
<([a-z][a-z0-9]*)\b[^>]*>(.*?)(C=D;O=A){1}(.*?)</\1>
will match the <loc> line containing:
http://sitename.com/programming/php/?C=D;O=A
I would then traverse up to the parent tag and do whatever I wanted with it.
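To check what this pattern actually captures, here is a small Python sketch run against the sample entry from the question. The variable names are made up for illustration. Without a dot-matches-newline flag, the match cannot start at <url>, so it lands on the <loc> element instead.

```python
import re

# Pattern from the answer above; \1 requires the closing tag to repeat the opening tag's name.
TAG_WITH_QUERY = re.compile(r"<([a-z][a-z0-9]*)\b[^>]*>(.*?)(C=D;O=A){1}(.*?)</\1>")

# The sample entry from the question.
ENTRY = """<url>
<loc>http://sitename.com/programming/php/?C=D;O=A</loc>
<changefreq>weekly</changefreq>
<priority>0.64</priority>
</url>"""

match = TAG_WITH_QUERY.search(ENTRY)
tag_name = match.group(1)  # name of the tag that was matched
# Groups 2-4 are the text before, the query itself, and the text after it.
full_url = match.group(2) + match.group(3) + match.group(4)
print(tag_name)
print(full_url)
```

A regex match has no notion of a parent node, so getting from this <loc> match back up to the enclosing <url> still needs either an XML parser or a wider pattern.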