.NET regular expression to get parenthetical text at end of <p> tags
I have a simple pattern I am trying to match: any characters captured between parentheses at the end of an HTML paragraph. I run into trouble whenever there are additional parentheticals in that paragraph.
For example:
If the input string is "..... (321)</p>" I want to get the value (321).
However, if the paragraph has this text: "... (123) (321)</p>", my regex returns "(123) (321)" (everything between the first opening "(" and the last closing ")").
I am using the regex pattern "\s\(.+\)</p>".
How can I grab the correct value (using VB.NET)?
This is what I'm doing so far:
Dim reg As New Regex("\s\(.+\)</P>", RegexOptions.IgnoreCase)
Dim matchC As MatchCollection = reg.Matches(su.Question)
If matchC.Count > 0 Then
    Dim lastMatch As Match = matchC(matchC.Count - 1)
    Dim DesiredValue As String = lastMatch.Value
End If
Just change the expression to non-greedy and reverse the match order:
Dim reg As New Regex("\s\(.+?\)</P>", RegexOptions.IgnoreCase Or RegexOptions.RightToLeft)
Or exclude closing parentheses from the content, so the match cannot span more than one pair:
"\s\([^)]+\)</P>"
Or make it match only digits inside your parentheses:
"\s\(\d+\)</P>"
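Edit: for the non-greedy sample to work, you need to set the RightToLeft flag on the Regex object. Matching left to right, the match still starts at the first "(", and the lazy .+? simply grows until it reaches the ")" in front of </p>.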
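To see why the direction matters for the lazy pattern, here is a minimal sketch. The module name, the FirstMatch helper, and the sample string are made up for illustration; only the pattern and the RegexOptions flags come from the answer above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyDirectionDemo
    ' Hypothetical helper: returns the value of the first match found,
    ' or Nothing when the pattern does not match at all.
    Function FirstMatch(input As String, options As RegexOptions) As String
        Dim m As Match = Regex.Match(input, "\s\(.+?\)</P>", options)
        Return If(m.Success, m.Value, Nothing)
    End Function

    Sub Main()
        ' Sample input modeled on the question (an assumption, not real data).
        Dim html As String = "<p>Some text (123) (321)</p>"

        ' Left to right: the match starts at the first " (" and the lazy
        ' quantifier keeps growing until \)</P> fits.
        Console.WriteLine(FirstMatch(html, RegexOptions.IgnoreCase))
        ' prints " (123) (321)</p>"

        ' Right to left: scanning starts at </p>, so the lazy quantifier
        ' stops at the nearest "(" to its left.
        Console.WriteLine(FirstMatch(html, RegexOptions.IgnoreCase Or RegexOptions.RightToLeft))
        ' prints " (321)</p>"
    End Sub
End Module
```

Note that the match value still includes the leading whitespace and the closing tag; a capture group around \(.+?\) would isolate the parenthetical.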
Dim reg As New Regex("\s\(\d+\)</P>", RegexOptions.IgnoreCase)
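Your stumbling block was the insufficient specificity of the . (it matches any character except a newline, including parentheses) and the greediness of the + (it matches as much as possible).
Just be more specific (\d+) or less greedy (.+?). Note that the lazy form only helps here when combined with RegexOptions.RightToLeft, because a left-to-right match still starts at the first opening parenthesis.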
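As a rough comparison of "greedy" versus "more specific", the sketch below plugs different inner patterns into the same surrounding expression. The TrailingParenthetical function, its inner parameter, and the sample string are hypothetical names chosen for this example; the capture group is an addition so that the result excludes the whitespace and the tag.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TrailingParenDemo
    ' Hypothetical helper: returns the parenthetical that the pattern finds
    ' directly before </p>, or Nothing when there is no match.
    ' "inner" is the sub-pattern allowed between the parentheses.
    Function TrailingParenthetical(html As String, inner As String) As String
        Dim pattern As String = "\s(\(" & inner & "\))</p>"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        ' Group 1 holds only "(...)", without the whitespace or the tag.
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        ' Sample input modeled on the question (an assumption, not real data).
        Dim html As String = "<p>Some text (123) (321)</p>"

        Console.WriteLine(TrailingParenthetical(html, ".+"))     ' prints "(123) (321)"
        Console.WriteLine(TrailingParenthetical(html, "\d+"))    ' prints "(321)"
        Console.WriteLine(TrailingParenthetical(html, "[^)]+"))  ' prints "(321)"
    End Sub
End Module
```

The \d+ variant only works when the parenthetical contains digits; [^)]+ accepts any content that has no closing parenthesis in it.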
You can use a lookahead (?= ) to anchor the pattern. It tells the regex engine what must follow the match, without making that text part of the match. Here is an example which gets the ( ) data immediately before the closing p tag:
(?:\()([^)]+)(?:\))(?=</[pP]>)
(?:\() - Match but don't capture a (
([^)]+) - Capture all the data until a ) is hit. [^ ] is a negated character set
(?:\)) - Match but don't capture the )
(?=</[pP]>) - Lookahead: require, but don't consume, a following </p> or </P>
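A short sketch of how the lookahead pattern could be used from VB.NET follows. The module and function names and the sample string are assumptions for illustration; the pattern is the one given above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadDemo
    ' Hypothetical helper: finds the parenthetical that is immediately
    ' followed by </p> or </P>.
    Function ParentheticalBeforeClosingP(html As String) As Match
        Return Regex.Match(html, "(?:\()([^)]+)(?:\))(?=</[pP]>)")
    End Function

    Sub Main()
        ' Sample input modeled on the question (an assumption, not real data).
        Dim m As Match = ParentheticalBeforeClosingP("<p>Some text (123) (321)</p>")
        If m.Success Then
            ' The lookahead is not consumed, so the tag is not in the value.
            Console.WriteLine(m.Value)            ' prints "(321)"
            ' Group 1 is the text between the parentheses.
            Console.WriteLine(m.Groups(1).Value)  ' prints "321"
        End If
    End Sub
End Module
```

Because the tag sits inside the lookahead, m.Value is already the "(321)" the question asks for, and Groups(1) gives the content without the parentheses.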
HTH