C# Regex optional match
I have some page content that contains multiple occurrences of the following line of HTML:
<li class="r"><h3><a href="/test-url.htm">test string</a></h3></li>
I'm using .NET Regex to find all the occurrences in the content and return the href of each anchor tag.
My problem is that sometimes the <li> has quotes wrapped around the class value (as shown above), but other times it doesn't and just has: class=r
I need the regex to match both, with and without quotes.
I've tried various methods but nothing seems to have worked yet. They all match when there is a quote, but not without a quote. Below is my current attempt:
Regex _Regex = new Regex(@"<li class=(?:"")g([^>])*>((?!<h3).)*<h3([^>])*><a\shref=""(?<URL>[^""]*)""([^>])*>((?!</li).)*", RegexOptions.IgnoreCase);
Any help is much appreciated,
Thanks.
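I think the format you want is
""?
not
?:
The question mark makes the preceding character optional. In a verbatim string, "" is a single literal quote, so ""? reaches the regex engine as "?, an optional quote. (?:"") is only a non-capturing group, and the quote inside it is still required.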
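A minimal sketch of the difference, assuming the simplified tags <li class="r"> and <li class=r> as input. The class name OptionalQuoteDemo is hypothetical, and RequiredQuote is a simplified stand-in for the question's (?:"") group, not the full original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalQuoteDemo
{
    // In a verbatim string "" is one literal quote, so the engine sees <li class="?r"?>.
    // Each "? means "zero or one quote character".
    public static readonly Regex OptionalQuote =
        new Regex(@"<li class=""?r""?>", RegexOptions.IgnoreCase);

    // Simplified version of the question's (?:""): a non-capturing group that still
    // requires exactly one quote on each side.
    public static readonly Regex RequiredQuote =
        new Regex(@"<li class=(?:"")r(?:"")>", RegexOptions.IgnoreCase);

    public static void Main()
    {
        string[] inputs = { "<li class=\"r\">", "<li class=r>" };
        foreach (string input in inputs)
        {
            Console.WriteLine(
                $"{input,-16} optional: {OptionalQuote.IsMatch(input)}  required: {RequiredQuote.IsMatch(input)}");
        }
    }
}
```

Only the optional-quote pattern matches the unquoted tag; the non-capturing group changes grouping, not whether the quote is required.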
The trick is to match and capture an optional first quote, so the group ends up containing either a quote or an empty string. Then you use a backreference at the end of the word to match the same thing again.
@"<li class=(""?)r\1[^>]*>"
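A sketch of how the backreference pattern could be extended to pull out every href, assuming the markup is laid out like the sample in the question (<li>, then <h3>, then <a href="...">, with optional whitespace between them). The names HrefExtractor and ExtractHrefs are hypothetical, and the sample content is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Group 1 captures either a quote or an empty string; \1 then requires the same thing
    // after the class name. The named group URL (as in the question) holds the href.
    // Assumption: the <li>, <h3> and <a> tags appear in this order, separated only by whitespace.
    private static readonly Regex ItemPattern = new Regex(
        @"<li class=(""?)r\1[^>]*>\s*<h3[^>]*>\s*<a\s+href=""(?<URL>[^""]*)""",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the href of every matching list item, in document order.
    public static List<string> ExtractHrefs(string content)
    {
        var hrefs = new List<string>();
        foreach (Match m in ItemPattern.Matches(content))
        {
            hrefs.Add(m.Groups["URL"].Value);
        }
        return hrefs;
    }

    public static void Main()
    {
        string content =
            "<li class=\"r\"><h3><a href=\"/test-url.htm\">test string</a></h3></li>\n" +
            "<li class=r><h3><a href=\"/other-url.htm\">other string</a></h3></li>";

        foreach (string href in ExtractHrefs(content))
        {
            Console.WriteLine(href);
        }
    }
}
```

Because \1 must repeat whatever group 1 captured, an opening quote with no closing quote right after the r (for example class="r>) is rejected, while the quoted and unquoted forms both match.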
On a side note, this appears three times in your regex, and it's wrong: ([^>])*. It matches what you want it to, but it only captures the last character. If you need to capture those segments, move the asterisk inside the group: ([^>]*). If you don't need to capture them, just get rid of the parentheses, as I did.
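A small sketch of the capture behaviour described above, using a made-up <h3 id="title"> tag as input; the class and field names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCaptureDemo
{
    // Quantifier outside the group, as in the question: the group is re-entered once per
    // character, so Group.Value keeps only the last character matched.
    public static readonly Regex PerCharGroup = new Regex(@"<h3([^>])*>");

    // Quantifier inside the group: a single capture holding the whole run.
    public static readonly Regex WholeRunGroup = new Regex(@"<h3([^>]*)>");

    public static void Main()
    {
        string tag = "<h3 id=\"title\">"; // hypothetical sample tag

        Group perChar = PerCharGroup.Match(tag).Groups[1];
        Group wholeRun = WholeRunGroup.Match(tag).Groups[1];

        Console.WriteLine(perChar.Value);          // only the last character
        Console.WriteLine(perChar.Captures.Count); // .NET still records one capture per character
        Console.WriteLine(wholeRun.Value);         // the full attribute text
    }
}
```

.NET does keep every intermediate capture in Group.Captures, but Group.Value (and Groups["name"].Value) only reports the last one, which is why the per-character form looks like it lost the text.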
Here is part of the regex. I think you know how to finish it:
<li class=["r]+?>
or
<li class=["]?r["]?>
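Both of them match the quoted and the unquoted form, though the first is looser and also accepts values such as class=rr.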
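A sketch comparing the two patterns above with the backreference approach from the earlier answer, trimmed here to end at > like the other two so the comparison is like for like. The test inputs and the class name ClassPatternComparison are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClassPatternComparison
{
    // Lazy run of quote and 'r' characters: accepts any mix of them before '>'.
    public static readonly Regex CharClassRun = new Regex(@"<li class=[""r]+?>");

    // Optional quote on each side, checked independently of each other.
    public static readonly Regex OptionalEachSide = new Regex(@"<li class=[""]?r[""]?>");

    // Backreference: the closing side must repeat what group 1 captured.
    public static readonly Regex Backreference = new Regex(@"<li class=(""?)r\1>");

    public static void Main()
    {
        string[] inputs =
        {
            "<li class=\"r\">", // quoted
            "<li class=r>",     // unquoted
            "<li class=\"r>",   // unbalanced quote
            "<li class=rr>",    // a different class value
        };

        foreach (string input in inputs)
        {
            Console.WriteLine(
                $"{input,-16} [\"r]+?: {CharClassRun.IsMatch(input),-5} " +
                $"[\"]?r[\"]?: {OptionalEachSide.IsMatch(input),-5} " +
                $"(\"?)r\\1: {Backreference.IsMatch(input)}");
        }
    }
}
```

All three accept the quoted and unquoted tags. The character-class run also accepts class=rr, both simple patterns accept the unbalanced class="r, and only the backreference version rejects that case.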