Replace all occurrences and regular expression

I'm trying to replace two consecutive line breaks with the HTML tag <p/>. So in a string such as:

\r\n\r\n\r\n

there are two consecutive occurrences of \r\n\r\n, so the result should be:

<p/><p/>

but with C# String.Replace, it only detects the first occurrence and I just get back:

<p/>\r\n

So I'm wondering if any regular expression gurus know how to detect that using a regular expression?

Edit:

I realize the question is a bit confusing, so let me rephrase it. The requirement is to replace any "\r\n" with the tag <p/> only if there is another "\r\n" immediately before it.

So with the string:

\r\n\r\n\r\n
  • The first \r\n does not have another \r\n before it, so nothing should be done.
  • The second \r\n does have another \r\n before it, so it qualifies for replacement.
  • The third \r\n also has another \r\n before it, so it qualifies for replacement too.

So the result should be:

<p/><p/>


Yes, you can do this with a regular expression:

string tidyString = Regex.Replace(originalString, @"(?<=\r\n)\r\n", "<p/>");

If performance is an issue then you might find that rebuilding the string manually is quicker than a regex, but the drawback will be the more complicated code. (I'd probably go for the regex in 99% of situations.)

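// Requires: using System; using System.Text;
var sb = new StringBuilder(originalString.Length);
int startIndex = 0, nextIndex;
while ((nextIndex = originalString.IndexOf("\r\n", startIndex, StringComparison.Ordinal)) >= 0)
{
    // A "\r\n" that starts exactly where the previous one ended is a consecutive break.
    if ((nextIndex == startIndex) && (startIndex > 0))
        sb.Append("<p/>");
    else
        // Otherwise copy the text up to and including this "\r\n" unchanged.
        sb.Append(originalString, startIndex, nextIndex - startIndex + 2);

    startIndex = nextIndex + 2;
}
// Copy whatever follows the last "\r\n".
sb.Append(originalString, startIndex, originalString.Length - startIndex);

string tidyString = sb.ToString();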
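
As a quick sanity check, here is a minimal sketch that wraps the regex and the manual loop in two methods and runs both on the same inputs. The class and method names (ParagraphTagDemo, WithRegex, WithBuilder) are made up for illustration and do not come from the answer. Both versions keep the first \r\n of a run, so "\r\n\r\n\r\n" becomes "\r\n<p/><p/>" rather than "<p/><p/>".

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class ParagraphTagDemo
{
    // Regex version: replace each "\r\n" that is immediately preceded by another "\r\n".
    public static string WithRegex(string s) =>
        Regex.Replace(s, @"(?<=\r\n)\r\n", "<p/>");

    // Manual version: the same rule, built with IndexOf and a StringBuilder.
    public static string WithBuilder(string s)
    {
        var sb = new StringBuilder(s.Length);
        int start = 0, next;
        while ((next = s.IndexOf("\r\n", start, StringComparison.Ordinal)) >= 0)
        {
            if (next == start && start > 0)
                sb.Append("<p/>");                      // consecutive break
            else
                sb.Append(s, start, next - start + 2);  // text plus its line break
            start = next + 2;
        }
        sb.Append(s, start, s.Length - start);          // trailing text
        return sb.ToString();
    }

    static void Main()
    {
        foreach (var input in new[] { "\r\n\r\n\r\n", "a\r\nb\r\n\r\nc", "no breaks" })
        {
            // Prints True when both implementations agree on this input.
            Console.WriteLine(WithRegex(input) == WithBuilder(input));
        }
    }
}
```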


You have said this in a comment:

What I want to do is, if I detect a "\r\n" right before a "\r\n", it should be replaced with the tag "p".

This can be rephrased:

What I want to do is, if I detect N consecutive occurrences of "\r\n" where N > 1, I want to replace these with N - 1 occurrences of the tag "p".

Don't believe me?

Let's say we have some fence posts:

|  |  |  |  |

If I say, "I want to detect wherever there is a fence post right before another fence post," what am I really saying? I'm saying I want to make N - 1 detections wherever there are N consecutive fence posts (again: when N > 1).

Look above; there are five fence posts, but only four occurrences of what you've described. The same holds true for detecting consecutive occurrences of "\r\n" after the first occurrence.
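
The fence-post count can be checked directly. The sketch below builds runs of N consecutive "\r\n" and counts how many of them have another "\r\n" immediately before them. It assumes that "detecting" is done with a lookbehind pattern, and the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class FencePostCount
{
    // Counts the "\r\n" sequences that have another "\r\n" immediately before them.
    public static int CountDetections(string s) =>
        Regex.Matches(s, @"(?<=\r\n)\r\n").Count;

    static void Main()
    {
        for (int n = 2; n <= 5; n++)
        {
            // A run of n consecutive line breaks.
            string run = string.Concat(Enumerable.Repeat("\r\n", n));
            Console.WriteLine($"{n} line breaks -> {CountDetections(run)} detections");
        }
    }
}
```

Each run of N line breaks gives N - 1 detections, just as five fence posts give four gaps.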


It sounds like, rather than replacing the string "\r\n\r\n" with "<p/>", you want to replace all occurrences of "\r\n" after the first occurrence with "<p/>", stripping the first occurrence out. (Your example does not contain two consecutive occurrences of "\r\n\r\n"; it contains three consecutive occurrences of only "\r\n".)

If this is exactly what you want, I'd recommend simply doing what I just described: detecting a "\r\n" and replacing subsequent "\r\n" occurrences in the string with "<p/>" (until reaching a block of text that is different).

Let me know if you need any help doing that. It's also possible I've misunderstood you, of course; but this is what it looks like to me you are trying to accomplish.

Hint: A very easy but inefficient way of doing this would involve calling string.Split with the string "\r\n" as the delimiter and using the resulting array to reconstruct the string.

Actually, I don't think the above hint will quite get you there, even as an inefficient solution. There are obviously other ways, though.
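
One such way, sketched here as an assumption rather than as this answer's own method, is to match each whole run of two or more "\r\n" and replace it with one tag fewer than the number of line breaks in the run. The names RunReplacer and ReplaceRuns are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RunReplacer
{
    // Replaces every run of N >= 2 consecutive "\r\n" with N - 1 "<p/>" tags.
    // A single "\r\n" on its own is left untouched.
    public static string ReplaceRuns(string s) =>
        Regex.Replace(s, @"(?:\r\n){2,}", m =>
        {
            int breaks = m.Length / 2;   // each "\r\n" is two characters
            return string.Concat(Enumerable.Repeat("<p/>", breaks - 1));
        });

    static void Main()
    {
        Console.WriteLine(ReplaceRuns("\r\n\r\n\r\n"));   // <p/><p/>
        Console.WriteLine(ReplaceRuns("Line1\r\nLine2\r\n\r\n\r\n<line5>"));
    }
}
```

Because the whole run is consumed, the first line break of each run is stripped out, so three consecutive line breaks yield exactly "<p/><p/>".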


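Try this. I had to use a second Replace call to cater for the third \r\n occurrence.

var input = "Line1\r\nLine2\r\n\r\n\r\n<line5>";
// First pass: collapse each pair of consecutive line breaks into one tag.
var result = Regex.Replace(input, "(\\r\\n){2}", "<p/>");
// Second pass: a tag followed by a leftover line break becomes two tags.
result = Regex.Replace(result, "<p/>\\r\\n", "<p/><p/>");

//result equals: "Line1\r\nLine2<p/><p/><line5>"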


The string "\r\n\r\n\r\n" doesn't really contain two consecutive occurrences of "\r\n\r\n". Yes, characters 1-4 and characters 3-6 both form the string "\r\n\r\n", but they overlap. String.Replace cannot take this into account: it scans the string until it finds something to replace, does the replacement, and then continues scanning from that point on. It doesn't first scan the whole string to identify every substring that should be replaced and then replace each of them.

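That scanning behaviour can be seen in a short sketch (the class and helper names are hypothetical). String.Replace consumes the first four characters and resumes after them, so it never sees the overlapping second occurrence. A zero-width lookahead consumes nothing, so it can find both starting positions.

```csharp
using System;
using System.Text.RegularExpressions;

static class OverlapDemo
{
    // Makes control characters visible so results can be printed on one line.
    public static string Show(string s) =>
        s.Replace("\r", "\\r").Replace("\n", "\\n");

    // Counts the positions where "\r\n\r\n" starts, including overlapping ones.
    public static int CountOverlapping(string s) =>
        Regex.Matches(s, @"(?=\r\n\r\n)").Count;

    static void Main()
    {
        string input = "\r\n\r\n\r\n";

        // String.Replace scans left to right and never revisits consumed text.
        Console.WriteLine(Show(input.Replace("\r\n\r\n", "<p/>")));   // <p/>\r\n

        // The lookahead finds the occurrences at offsets 0 and 2.
        Console.WriteLine(CountOverlapping(input));                   // 2
    }
}
```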