How to remove extra returns and spaces in a string by regex?

I converted some HTML to plain text, but the result contains many extra line breaks and spaces. How can I remove them?


string new_string = Regex.Replace(orig_string, @"\s", ""); will remove all whitespace.

string new_string = Regex.Replace(orig_string, @"\s+", " "); will collapse each run of whitespace into a single space.
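
To make the difference between the two calls concrete, here is a minimal sketch. The class name WhitespaceDemo, the method names, and the sample text are made up for illustration, and the extra Trim() call is an assumption about what is usually wanted after collapsing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Removes every whitespace character (spaces, tabs, line breaks).
    public static string StripAll(string input) =>
        Regex.Replace(input, @"\s", "");

    // Collapses each run of whitespace into one space, then trims the ends
    // (the Trim() is an addition, not part of the original one-liner).
    public static string Collapse(string input) =>
        Regex.Replace(input, @"\s+", " ").Trim();

    public static void Main()
    {
        string text = "Hello \r\n\r\n   world\t!  ";
        Console.WriteLine(StripAll(text)); // Helloworld!
        Console.WriteLine(Collapse(text)); // Hello world !
    }
}
```

Note that the second form turns line breaks into spaces too, so it is only suitable when the line structure does not need to survive.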


I'm assuming that you want to

  • find two or more consecutive spaces and replace them with a single space, and
  • find two or more consecutive newlines and replace them with a single newline.

If that's correct, then you could use

resultString = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");

This keeps the original "type" of whitespace intact and also preserves Windows line endings correctly. If you also want to "condense" multiple tabs into one, use

resultString = Regex.Replace(subjectString, @"( |\t|\r?\n)\1+", "$1");

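To condense any run of spaces and newlines that contains at least one newline into a single newline, use

resultString = Regex.Replace(subjectString, @"[ \r\n]*\n[ \r\n]*", "\n");

The replacement here is the ordinary string "\n", not @"\n": .NET replacement patterns only interpret $ substitutions, so a verbatim @"\n" would insert a literal backslash followed by n.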
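
As a quick check of the tab-aware backreference pattern from this answer, the following sketch runs it on a small sample. The class name CondenseDemo, the method name, and the input are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CondenseDemo
{
    // Replaces two or more repeats of the SAME whitespace token
    // (space, tab, or line break) with a single instance of it.
    public static string CondenseRepeats(string input) =>
        Regex.Replace(input, @"( |\t|\r?\n)\1+", "$1");

    public static void Main()
    {
        string text = "a   b\r\n\r\n\r\nc\t\td";
        string result = CondenseRepeats(text);

        // Spaces, CRLF line breaks, and tabs are each reduced to one.
        Console.WriteLine(result == "a b\r\nc\td"); // True
    }
}
```

Because \1 must repeat exactly what the group captured, a mixed run such as a space followed by a newline followed by a space is left untouched by this pattern.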


I tried several approaches for this. Loops worked, but this one was the clearest and most reliable.

//define what you want to remove as char

char tb = (char)9;     //tab char ASCII code
char spc = (char)32;   //space char ASCII code
char nwln = (char)10;  //newline char ASCII code

//strings are immutable, so assign the result back
yourstring = yourstring.Replace(tb.ToString(), "");
yourstring = yourstring.Replace(spc.ToString(), "");
yourstring = yourstring.Replace(nwln.ToString(), "");

//by defining chars, result was better.


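You can use Trim() to remove spaces and line breaks at the start and end of the string. Whitespace is not significant in HTML, so you can drop it with the Trim() method of the System.String class. Note that Trim() does not touch whitespace in the middle of the string.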
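
A small sketch of what Trim() does and does not remove; the class name TrimDemo, the method name, and the sample text are made up for illustration.

```csharp
using System;

public static class TrimDemo
{
    // Strips leading and trailing whitespace only.
    public static string TrimEnds(string input) => input.Trim();

    public static void Main()
    {
        string text = "  \r\n Hello   world \r\n ";

        // The three spaces between the words are still there afterwards.
        Console.WriteLine("[" + TrimEnds(text) + "]"); // [Hello   world]
    }
}
```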
