Regex filter &quot; with <> tags included
I am having problems with some regex code. Can anyone help?
I have the following string of data:
    abcd &quot; something code &quot; nothing &quot;f <b> cannot find this section </b> &quot;
I want to find the sections between the &quot; quotes.
I can get it to work fine using the following regex:
    foreach (Match match in Regex.Matches(sourceLine, @"&quot;((\\&quot;)|[^&quot;(\\&quot;)])+&quot;"))
However, if the section between the quotes contains <> tags, the regex does not find that section. I am not sure how to include the <> tags in the regex.
Thanks for your time.
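    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public List<string> Parse(string input)
    {
        List<string> results = new List<string>();
        bool startSection = true;
        int startIndex = 0;
        // Match every &quot; that is not escaped as \&quot;. The lookbehind
        // checks the preceding character without consuming it, so m.Index
        // is the position of the entity itself.
        foreach (Match m in Regex.Matches(input, @"(?<!\\)&quot;"))
        {
            if (startSection)
            {
                // opening quote: the section starts right after the entity
                startSection = false;
                startIndex = m.Index + m.Length;
            }
            else
            {
                // closing quote: the section ends right before the entity
                startSection = true;
                results.Add(input.Substring(startIndex, m.Index - startIndex));
            }
        }
        return results;
    }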
A character class […] describes a set of allowed characters, and a negated character class [^…] describes a set of disallowed characters. So [^&quot;(\\&quot;)] means any single character except &, q, u, o, t, ;, (, \, and ). It does not mean "anything except the sequence &quot;".
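To see this, you can test single characters against the class from the question. The sketch below assumes, as the character list above implies, that the quotes in the data are the HTML entity &quot;; the class name CharClassDemo and its methods are hypothetical helpers for illustration. It shows that letters such as t and o are rejected (so a section containing "cannot" can never match), while < and > are accepted, which means the tags were never the real problem.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the question's code.
public static class CharClassDemo
{
    // The negated class from the question, anchored so it tests exactly one
    // character. It rejects each of & q u o t ; ( \ ) individually; it does
    // not treat &quot; as a unit.
    private static readonly Regex QuestionClass = new Regex(@"^[^&quot;(\\&quot;)]$");

    // True if the single character c is allowed by the question's class.
    public static bool Allows(char c) => QuestionClass.IsMatch(c.ToString());

    // True if every character of s is allowed by the class.
    public static bool AllowsAll(string s)
    {
        foreach (char c in s)
        {
            if (!Allows(c)) return false;
        }
        return true;
    }
}
```

For example, AllowsAll("<b>") is true, but AllowsAll("cannot") is false because of the o and the t.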
Try this instead:
    &quot;(.*?)&quot;
The ungreedy (lazy) quantifier *? matches as little as possible, as opposed to the greedy quantifier *, which matches as much as possible.
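A small sketch comparing the lazy and greedy versions on the sample data, assuming the quotes are the &quot; entity. The class QuotedSections and its members are hypothetical names for illustration. With the lazy quantifier each match stops at the nearest following &quot;, giving one result per quoted section; with the greedy quantifier a single match runs from the first &quot; to the last one on the line.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing the two patterns.
public static class QuotedSections
{
    // Lazy: each match ends at the nearest following &quot;.
    public const string Lazy = "&quot;(.*?)&quot;";

    // Greedy: one match runs to the last &quot; on the line.
    public const string Greedy = "&quot;(.*)&quot;";

    // Returns the text captured by group 1 for every match of the pattern.
    public static List<string> Extract(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();
}
```

On the sample string, the lazy pattern yields " something code " and "f <b> cannot find this section </b> ", while the greedy pattern yields one string that still contains the inner &quot; entities. Note that the lazy pattern does not treat an escaped \&quot; specially.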
You can use HttpUtility.HtmlDecode to convert the &quot; entities (and any other HTML entities) back to normal characters. Then extracting the text between the double quotes with a regex is simple.
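A minimal sketch of the decode-then-match approach. It swaps in System.Net.WebUtility.HtmlDecode, which performs the same kind of entity decoding and does not need a reference to System.Web; HttpUtility.HtmlDecode should work the same way where System.Web is available. The class DecodedSections is a hypothetical name for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: decode entities, then match plain double quotes.
public static class DecodedSections
{
    public static List<string> Extract(string encoded)
    {
        // &quot; becomes ", so the pattern can use an ordinary quote.
        string decoded = WebUtility.HtmlDecode(encoded);

        // Lazy match: each section ends at the nearest following quote.
        return Regex.Matches(decoded, "\"(.*?)\"")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }
}
```

One trade-off: decoding affects every entity, so if a section contains something like &lt; or &amp;, the extracted text will contain < or & instead of the original entity.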