Question on Whitespace Filter Regex (it's simple, just a small addition needed)

2022-12-08 13:10 Q&A author:

I have a Regex-based whitespace filter on an ASP.NET MVC application, and it works perfectly, perhaps too perfectly. One of the things it filters out is the \r\n sequence. This effectively puts all of the source on one line, which I love because I don't have to deal with quirky CSS caused by whitespace, but in certain cases I need to keep the line breaks. One example is when I want to literally display text with line breaks in it, such as a note.

To do so, I would obviously wrap it in <pre></pre> tags, but because of the filter, the line breaks in the text between the tags also get scrubbed, which makes a note, for example, rather difficult to read.

Can anyone with Regex knowledge (mine is very poor...) help me modify the current Regex to ignore text between the <pre> tags?

Here's the current code:

public class WhitespaceFilter : MemoryStream {
    private string Source = string.Empty;
    private Stream Filter = null;

    public WhitespaceFilter(HttpResponseBase HttpResponseBase) {
        Filter = HttpResponseBase.Filter;
    }

    public override void Write(byte[] buffer, int offset, int count) {
        // Decode only the bytes handed to this call, not the whole buffer.
        Source = UTF8Encoding.UTF8.GetString(buffer, offset, count);

        Source = new Regex("\\t", RegexOptions.Compiled | RegexOptions.Multiline).Replace(Source, string.Empty);
        Source = new Regex(">\\r\\n<", RegexOptions.Compiled | RegexOptions.Multiline).Replace(Source, "><");
        Source = new Regex("\\r\\n", RegexOptions.Compiled | RegexOptions.Multiline).Replace(Source, string.Empty);

        while (new Regex("  ", RegexOptions.Compiled | RegexOptions.Multiline).IsMatch(Source)) {
            Source = new Regex("  ", RegexOptions.Compiled | RegexOptions.Multiline).Replace(Source, string.Empty);
        };

        Source = new Regex(">\\s<", RegexOptions.Compiled | RegexOptions.Multiline).Replace(Source, "><");
        Source = new Regex("<!--.*?-->", RegexOptions.Compiled | RegexOptions.Singleline).Replace(Source, string.Empty);

        // The re-encoded output is a new array, so it is written from index 0.
        byte[] output = UTF8Encoding.UTF8.GetBytes(Source);
        Filter.Write(output, 0, output.Length);
    }
}

Thanks in advance!

There are tools like htmlcompressor already out there to strip whitespace. And as exhuma said, if this is for web optimization, then gzip compression would help more than anything, provided you configure it on the web server.

As for your original question, there are a lot of different ways to do this. You could also attack the problem with something like XPath (if the HTML is valid XHTML) and then combine that with regex. But I figured I'd try my hand at writing a single regex to do it:

(<pre>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</pre>)|[\n\r]

It seems to work for me. Fortunately .NET has an extremely powerful regex engine including a very cool balanced matching feature. I can't explain it any better than Ryan Byington can. But the idea is to match the beginning and ending pre tags first and make sure everything inside is untouched. Then everything around those pre tags gets the rest of the regex applied, "[\n\r]".

To make this work you'd simply do this:

Source = new Regex("(<pre>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</pre>)|[\n\r]", 
  RegexOptions.Compiled | RegexOptions.Singleline).Replace(Source, "$1");

Note the $1 at the end. This is the part that grabs the results from inside the pre tags and returns them untouched.
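
For anyone who wants to try it in isolation, here is a minimal sketch as a console program. The class name, method name and sample markup are made up for illustration; only the pattern and the "$1" replacement come from above. It uses a single <pre> block on purpose: because the quantifiers are greedy, I'd expect a page with several <pre> blocks to be matched from the first <pre> to the last </pre>, so test that case before relying on it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreAwareNewlineDemo
{
    // Pattern from the answer: group 1 captures a whole <pre>...</pre> block,
    // the second alternative matches a single CR or LF anywhere else.
    private static readonly Regex NewlinesOutsidePre = new Regex(
        "(<pre>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</pre>)|[\n\r]",
        RegexOptions.Compiled);

    // Hypothetical helper name, not part of the original filter.
    public static string StripNewlines(string html)
    {
        // "$1" re-emits the <pre> block when group 1 took part in the match.
        // For a bare CR/LF, group 1 is empty, so the character is dropped.
        return NewlinesOutsidePre.Replace(html, "$1");
    }

    public static void Main()
    {
        string sample = "<div>\r\n<p>Note:</p>\r\n<pre>line one\r\nline two</pre>\r\n</div>";

        // The line break between "line one" and "line two" should survive,
        // while the ones between the other tags are removed.
        Console.WriteLine(StripNewlines(sample));
    }
}
```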

Then after that write another line to replace \s\s+ with a single space. I think that should work pretty well.
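
One thing to watch with that second step: \s also matches \r and \n, so a plain \s\s+ replacement would collapse the \r\n pairs you just preserved inside <pre>. A rough sketch of one way around it is to reuse the same "capture the <pre> block, or match the thing to remove" idea with a MatchEvaluator. Note that this swaps in a simpler lazy pattern (<pre>.*?</pre>) rather than the balanced one above, so it assumes <pre> blocks are not nested and have no attributes; the class and method names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreAwareWhitespaceDemo
{
    // Simplified, hypothetical pattern: either a lazily matched <pre>...</pre>
    // block (group 1) or a run of two or more whitespace characters.
    // Singleline lets "." cross the line breaks inside the <pre> block.
    private static readonly Regex RunsOutsidePre = new Regex(
        @"(<pre>.*?</pre>)|\s\s+",
        RegexOptions.Compiled | RegexOptions.Singleline);

    // Hypothetical helper name, not part of the original filter.
    public static string CollapseWhitespace(string html)
    {
        // Keep <pre> blocks verbatim; replace any other match with one space.
        return RunsOutsidePre.Replace(
            html,
            m => m.Groups[1].Success ? m.Value : " ");
    }

    public static void Main()
    {
        string sample = "<p>a   b</p>  <pre>x\r\n  y</pre>";

        // Runs of spaces outside <pre> shrink to one space; the content of
        // the <pre> block, including its line break and indent, is kept.
        Console.WriteLine(CollapseWhitespace(sample));
    }
}
```

The same evaluator could drop single line breaks as well by extending the second alternative, but I kept it to the \s\s+ case mentioned above.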
