
Regular expression: find spaces (tabs/space), but not newlines

How can I write a regular expression that tests for spaces or tabs, but not newlines?

I tried \s, but I found out that it matches newlines too.

I use C# (.NET) and WPF, but it shouldn't matter.


Use character classes: [ \t]


Try this character set:

[ \t]

This matches only a space or a tab.
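A minimal C# sketch of how the [ \t] class might be used. The class and method names (SpaceOrTabDemo, ContainsSpaceOrTab) are made up for illustration and do not come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceOrTabDemo
{
    // Matches a single space or a single tab; newlines are not in the class.
    private static readonly Regex SpaceOrTab = new Regex(@"[ \t]");

    // Hypothetical helper: true if the input contains a space or a tab anywhere.
    public static bool ContainsSpaceOrTab(string input) => SpaceOrTab.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(ContainsSpaceOrTab("a b"));   // True
        Console.WriteLine(ContainsSpaceOrTab("a\tb"));  // True
        Console.WriteLine(ContainsSpaceOrTab("a\nb"));  // False
    }
}
```

The pattern is written as a verbatim string (@"...") so the backslash reaches the regex engine unchanged.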


As Eiríkr Útlendi noted, the accepted solution only considers two white space characters: the horizontal tab (U+0009), and a breaking space (U+0020). It does not consider other white space characters such as non-breaking spaces (which happen to be in the text I am trying to deal with).

A more complete white space character listing is included on Wikipedia and also referenced in the linked Perl answer. A simple C# solution that accounts for these other characters can be built using character class subtraction:

[\s-[\r\n]]

Or, including Eiríkr Útlendi's solution, you get

[\s\u3000-[\r\n]]
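As a rough illustration of the subtraction syntax, the sketch below collapses each run of non-newline whitespace into a single space and leaves line breaks untouched. The names NonNewlineWhitespaceDemo and CollapseHorizontalWhitespace are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonNewlineWhitespaceDemo
{
    // \s minus carriage return and line feed, using .NET character class subtraction.
    private static readonly Regex NonNewlineWhitespace = new Regex(@"[\s-[\r\n]]");

    // One or more of the same characters in a row.
    private static readonly Regex NonNewlineWhitespaceRun = new Regex(@"[\s-[\r\n]]+");

    public static bool IsNonNewlineWhitespace(string s) => NonNewlineWhitespace.IsMatch(s);

    // Hypothetical helper: replace each run with one ordinary space.
    public static string CollapseHorizontalWhitespace(string input) =>
        NonNewlineWhitespaceRun.Replace(input, " ");

    public static void Main()
    {
        Console.WriteLine(IsNonNewlineWhitespace("\u00A0")); // True (non-breaking space)
        Console.WriteLine(IsNonNewlineWhitespace("\n"));     // False
        Console.WriteLine(CollapseHorizontalWhitespace("a \t\u00A0b\r\nc") == "a b\r\nc"); // True
    }
}
```

Only \r and \n are subtracted here, so other characters that \s covers, such as vertical tab or form feed, would still match. Add them to the subtracted set if that matters for your text.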


Note: For those dealing with CJK text (Chinese, Japanese, and Korean), the full-width "double-byte" space (U+3000, the ideographic space) was not matched by \s in any implementation I tried (Perl, .NET, PCRE, and Python). This can depend on the engine version and its Unicode settings, so test it in your own environment. If it is not matched, you'll need to either normalize your strings first (such as by replacing all \u3000 with \u0020), or use a character set that includes this code point in addition to whatever other white space you're targeting, such as [ \t\u3000].
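Both workarounds from the note can be sketched in C# roughly as follows. The helper names (NormalizeIdeographicSpace, ContainsSpaceTabOrIdeographic) are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdeographicSpaceDemo
{
    // Option 1: normalize U+3000 to an ordinary space before matching.
    public static string NormalizeIdeographicSpace(string input) =>
        input.Replace('\u3000', '\u0020');

    // Option 2: list U+3000 explicitly in the character class.
    private static readonly Regex SpaceTabOrIdeographic = new Regex(@"[ \t\u3000]");

    public static bool ContainsSpaceTabOrIdeographic(string input) =>
        SpaceTabOrIdeographic.IsMatch(input);

    public static void Main()
    {
        string text = "a\u3000b";
        Console.WriteLine(NormalizeIdeographicSpace(text) == "a b"); // True
        Console.WriteLine(ContainsSpaceTabOrIdeographic(text));      // True
        Console.WriteLine(ContainsSpaceTabOrIdeographic("a\nb"));    // False
    }
}
```

Listing the code point explicitly works the same way whether or not \s happens to cover it in a given engine.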

If you're using Perl or PCRE, you have the option of using the \h shorthand for horizontal whitespace, which appears to include the single-byte space, double-byte space, and tab, among others. See the Match whitespace but not newlines (Perl) question for more detail.

However, this \h shorthand has not been implemented for .NET and C#, as best I've been able to tell.
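One possible stand-in for \h in .NET is a class built from the tab plus the Unicode "space separator" category, [\t\p{Zs}]. This is an approximation I am suggesting, not an official equivalent: the exact set of characters can differ from PCRE's \h depending on the Unicode data each engine ships with. The name HorizontalWhitespaceDemo is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HorizontalWhitespaceDemo
{
    // Tab plus every character in Unicode category Zs (space separators).
    // An approximation of \h, not a guaranteed character-for-character match.
    private static readonly Regex HorizontalWhitespace = new Regex(@"[\t\p{Zs}]");

    public static bool IsHorizontalWhitespace(string s) => HorizontalWhitespace.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsHorizontalWhitespace(" "));      // True
        Console.WriteLine(IsHorizontalWhitespace("\t"));     // True
        Console.WriteLine(IsHorizontalWhitespace("\u00A0")); // True (non-breaking space)
        Console.WriteLine(IsHorizontalWhitespace("\u3000")); // True (ideographic space)
        Console.WriteLine(IsHorizontalWhitespace("\n"));     // False
    }
}
```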


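If you want to remove whitespace, the code below worked for me in C#. Note that \s also matches newlines, and that Regex.Replace returns a new string, so the result has to be assigned.

Line = Regex.Replace(Line, "\\s", "");

For tabs only:

Line = Regex.Replace(Line, "\\t", "");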
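To remove only spaces and tabs while leaving line breaks alone, something along these lines may work, combining Regex.Replace with the [ \t] class from the earlier answers. The names RemoveSpacesAndTabsDemo and RemoveSpacesAndTabs are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RemoveSpacesAndTabsDemo
{
    // Runs of spaces and/or tabs; newlines are deliberately excluded.
    private static readonly Regex SpacesAndTabs = new Regex(@"[ \t]+");

    // Regex.Replace returns a new string; the input itself is not modified.
    public static string RemoveSpacesAndTabs(string line) =>
        SpacesAndTabs.Replace(line, "");

    public static void Main()
    {
        string result = RemoveSpacesAndTabs("a b\tc\nd e");
        Console.WriteLine(result == "abc\nde"); // True: the newline survives
    }
}
```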