
Regex pattern to replace \r\n and spaces between >< while keeping the spaces inside span tags

I want to replace the \r\n and all the whitespace between tags (e.g. ><), but exclude the spaces inside the span tag.

<html>\r\n  <body>\r\n    
<p>\r\n      
<input name=\"Directory\" style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\" />\r\n      <span style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\">\r\n  </span>\r\n    
</p>\r\n    
<p>\r\n      
<span style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\"> </span>\r\n      <input name=\"FileName\" style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\" />\r\n       <span style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\"></span>\r\n    </p>\r\n  </body>\r\n</html>

Edit: The above is just an example of the HTML string I am getting. I tried writing a regex pattern for it myself:

private static readonly Regex REGEX_FOR = new Regex(@"(?<!></span)>\\r\\n|[\s]*<");

New edit:

I also don't want to replace the \r\n between paragraph tags; I want to keep those line breaks between my paragraphs. I want my output to be this:

<html><body>  
<p>     
<input name=\"Directory\" style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\" />\r\n      <span style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\">\r\n  </span>\r\n    
</p>
\r\n    
<p>    
<span style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\"> </span><input name=\"FileName\" style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\" />
<span style=\"font-size:11;font-weight:normal;font-style:normal;color:#FF406080\"></span>
</p>
</body>
</html>


As has already been stated, for regex questions it's best to provide an example of the required output rather than a fairly vague description. That said, the expressions below should sort out what you need.

Search Expression: >(\r\n\s+)<

Replace Expression: ><

The \s token matches any whitespace character (including \r and \n), so you could safely drop the \r\n and match with \s alone. The expression above, however, requires a line break at the start of every match (assuming that is what is needed).
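A minimal sketch of this first step in Java (chosen to match the `replaceAll` answer below; the question itself uses C#'s `Regex`, where the same pattern should behave the same way). The class name `CollapseBetweenTags`, the method `collapse`, and the sample input are hypothetical. Note that `\s+` needs at least one whitespace character after the `\r\n`, so a line break followed directly by `<` is left alone.

```java
public class CollapseBetweenTags {

    // Step 1: '>' followed by CRLF and at least one more whitespace character,
    // then '<', is collapsed to "><". The space in "<span> </span>" is kept
    // because it is not preceded by "\r\n".
    static String collapse(String html) {
        return html.replaceAll(">(\\r\\n\\s+)<", "><");
    }

    public static void main(String[] args) {
        // Hypothetical sample input with CRLF line breaks and indentation.
        String input = "<body>\r\n  <p>\r\n    <span> </span>\r\n  </p>\r\n</body>";
        // Result is "<body><p><span> </span></p>" + CRLF + "</body>": the last
        // CRLF is followed directly by '<', so \s+ has nothing to match there.
        System.out.println(collapse(input));
    }
}
```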

Then add the spaces back into the empty span tags as needed:

Search Expression: (<span [^>]+>)(</span>)

Replace Expression: $1 $2
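The second step can be sketched the same way in Java (class and method names are hypothetical). `$1` and `$2` in the replacement refer to the two capture groups, so an empty span gets exactly one space between its tags, while a span that already contains a space is not matched and stays unchanged.

```java
public class RestoreSpanSpace {

    // Step 2: an opening "<span ...>" (at least one character after "<span ")
    // immediately followed by "</span>" gets a single space put back between them.
    static String restore(String html) {
        return html.replaceAll("(<span [^>]+>)(</span>)", "$1 $2");
    }

    public static void main(String[] args) {
        // Hypothetical sample: an empty span next to one that already has a space.
        String input = "<p><span style=\"color:red\"></span><span style=\"color:red\"> </span></p>";
        System.out.println(restore(input));
    }
}
```

Because `[^>]+` requires at least one character, a bare `<span></span>` without attributes is not matched by this pattern.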


Have a look at this online regex tester, where I entered your example.

Try this regex:

string.replaceAll("\\r\\n[ \\t]*"," ")

Notes:

  • This removes each newline together with any spaces or tabs that follow it. As long as there is no newline inside a span, the spaces there are not replaced.

  • I think it's safer to replace the whitespace with a single space rather than removing it entirely.

  • You could add some regex lookaround if needed,
    e.g. a negative lookahead meaning "same regex as before, not followed by </span>"
    (the possessive [ \t]*+ stops the engine from giving back a space to slip past the lookahead):
    string.replaceAll("\\r\\n[ \\t]*+(?!</span>)"," ")
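A small Java harness comparing the plain pattern with two lookahead variants (class name, method names, and the CRLF sample input are hypothetical). With a greedy `[ \t]*`, the engine can back off one space so that `(?!</span>)` sees ` </span>` and succeeds, so the newline before an indented `</span>` is still replaced; a possessive `[ \t]*+` (supported by `java.util.regex`) cannot give the spaces back, so that line break survives.

```java
public class NewlineToSpace {

    // Replaces every CRLF plus the spaces/tabs after it with one space.
    static String basic(String s) {
        return s.replaceAll("\\r\\n[ \\t]*", " ");
    }

    // Greedy [ \t]*: backtracking lets the lookahead pass on " </span>",
    // so a CRLF before an indented </span> is still replaced.
    static String lookaheadGreedy(String s) {
        return s.replaceAll("\\r\\n[ \\t]*(?!</span>)", " ");
    }

    // Possessive [ \t]*+: no backtracking, so a CRLF whose indentation is
    // followed by </span> is left untouched.
    static String lookaheadPossessive(String s) {
        return s.replaceAll("\\r\\n[ \\t]*+(?!</span>)", " ");
    }

    public static void main(String[] args) {
        String input = "<p>\r\n  <span>\r\n  </span>\r\n</p>";
        System.out.println(basic(input));               // <p> <span> </span> </p>
        System.out.println(lookaheadGreedy(input));     // <p> <span>  </span> </p>
        System.out.println(lookaheadPossessive(input)); // "<p> <span>" + CRLF + "  </span> </p>"
    }
}
```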
