Regular expression to remove <br> from <pre>
I am trying to remove the <br /> tags that appear between the <pre></pre> tags. My string looks like
string str = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";
string temp = "`##`";
while (Regex.IsMatch(str, @"\<pre\>(.*?)\<br\>(.*?)\</pre\>", RegexOptions.IgnoreCase))
{
    str = System.Text.RegularExpressions.Regex.Replace(str, @"\<pre\>(.*?)\<br\>(.*?)\</pre\>", "<pre>$1" + temp + "$2</pre>", RegexOptions.IgnoreCase);
}
str = str.Replace(temp, System.Environment.NewLine);
But this replaces all <br> tags between the first and the last <pre> in the whole text. Thus my final outcome is:
str = "Test<br/><pre>\r\nTest\r\n</pre>\r\nTest\r\n---\r\nTest\r\n<pre>\r\nTest\r\n</pre><br/>Test"
I expect my outcome to be
str = "Test<br/><pre>\r\nTest\r\n</pre><br/>Test<br/>---<br/>Test<br/><pre>\r\nTest\r\n</pre><br/>Test"
If you are parsing whole HTML pages, RegEx is not a good choice - see here for a good demonstration of why.
Use an HTML parser such as the HTML Agility Pack for this kind of work. It also works with fragments like the one you posted.
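As a rough sketch of the parser-based approach, assuming the HtmlAgilityPack NuGet package is referenced: the helper name ReplaceBrInsidePre is made up for illustration, and the exact serialization of the untouched <br/> tags in the output depends on the library's settings, so treat this as a starting point rather than a drop-in fix.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

class Program
{
    // Hypothetical helper: swaps every <br> element nested in a <pre> for a newline text node.
    static string ReplaceBrInsidePre(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var breaks = doc.DocumentNode.SelectNodes("//pre//br");
        if (breaks != null)
        {
            // Copy to a list so the tree can be modified safely while iterating.
            foreach (var br in breaks.ToList())
            {
                br.ParentNode.ReplaceChild(doc.CreateTextNode(Environment.NewLine), br);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        string str = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";
        Console.WriteLine(ReplaceBrInsidePre(str));
    }
}
```

Because the XPath selects elements rather than text, it does not care whether the source wrote <br>, <br/> or <br />.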
Don't use regex to do it.
"Be lazy, use CPAN and use HTML::Sanitizer." -Jeff Atwood, Parsing Html The Cthulhu Way
string input = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";
string pattern = @"<pre>(.*)<br/>(([^<][^/][^p][^r][^e][^>])*)</pre>";
while (Regex.IsMatch(input, pattern))
{
input = Regex.Replace(input, pattern, "<pre>$1\r\n$2</pre>");
}
This will probably work, but you should use the HTML Agility Pack instead. Note that this pattern only matches <br/>; it will not match <br>, <br />, etc.
OK, so I discovered the issue with my code. The problem was that Regex.IsMatch was considering just the first occurrence of <pre> and the last occurrence of </pre>. I wanted to handle each <pre> block individually, so I modified my code as follows:
foreach (Match regExp in Regex.Matches(str, @"\<pre\>(.*?)\<br\>(.*?)\</pre\>", RegexOptions.IgnoreCase))
{
matchFound = true;
str = str.Replace(regExp.Value, regExp.Value.Replace("<br>", temp));
}
and it worked well. Anyway, thanks to everyone for your replies.
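For anyone who wants the same per-block idea without the placeholder string or the loop, here is a minimal sketch using the Regex.Replace overload that takes a MatchEvaluator. The helper name ReplaceBrInsidePre is invented for illustration, and the pattern assumes <pre> tags without attributes and without nesting.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: rewrites each <pre>...</pre> block on its own,
    // so <br> tags outside the blocks are never touched.
    static string ReplaceBrInsidePre(string html)
    {
        return Regex.Replace(
            html,
            @"<pre>.*?</pre>", // lazy match: stops at the first closing </pre>
            block => Regex.Replace(
                block.Value,
                @"<br\s*/?>", // covers <br>, <br/> and <br />
                Environment.NewLine,
                RegexOptions.IgnoreCase),
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    static void Main()
    {
        string str = "Test<br/><pre><br/>Test<br/></pre><br/>Test<br/>---<br/>Test<br/><pre><br/>Test<br/></pre><br/>Test";
        Console.WriteLine(ReplaceBrInsidePre(str));
    }
}
```

The outer pattern never contains <br> itself, so a match cannot stretch from one <pre> block across plain text into the next one, which is what went wrong in the original loop. RegexOptions.Singleline lets the blocks span several lines.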