How to replace special characters using regular expressions?

By "special" I mean the symbolic characters that sometimes appear in text.

For example, in the text below, I want to remove the bubble at the start of each line.

Passport Details

Name as on passport
Relationship
Passport Number
Date of Issue
Expiry Date
Place of Issue

Question edited: Sorry, the bubble at the start of each line is no longer visible. After I submitted the question, Stack Overflow removed that special character.

Does anyone know how to replace those special characters? I don't want to replace characters like #, @ or !. Those are trivial and can be typed on a keyboard.

Sorry, I don't know how to put those special characters in my question, so I will try to explain. In a Word file, we put bullets before text. I want to replace the characters representing such bullets. I have some text files which contain characters that look like bubbles.

Finally, I found a solution. This regular expression works for me:

([^(A-Za-z0-9)+|\r|\n|\t|'|"|#|;|:|/|\|.|,| ])
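One caveat on the expression above: inside a character class, `(`, `)`, `+` and `|` are literal characters rather than grouping, repetition or alternation, so they end up on the allowed list too, and `\|` is an escaped pipe, not a backslash. Below is a minimal sketch of what the pattern appears to intend, written in Python for brevity. The name `strip_special` and the exact allowed set are assumptions for illustration, not part of the original post; the same character class should carry over to other regex engines.

```python
import re

# Assumed whitelist: letters, digits, CR/LF/tab, space and the punctuation
# listed in the pattern above. Anything else is treated as "special".
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9\r\n\t'\"#;:/., ]")

def strip_special(text: str) -> str:
    """Remove every character that is not on the whitelist."""
    return NOT_ALLOWED.sub("", text)

# The bullet (U+2022) is removed; the space after it is kept.
print(strip_special("\u2022 Passport Number"))
```

Characters such as `@` or `!` would need to be added to the class if they should be kept as well.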


(This was posted before the language had been specified.)

To replace non-ASCII characters with a space in Perl:

 $string =~ s/[^[:ascii:]]/ /g;

See http://codepad.org/KTMvQiOz. Here [^[:ascii:]] is a regex which matches any non-ASCII character.
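For comparison, here is a sketch of the same idea outside Perl. Python's `re` module has no `[:ascii:]` class, so this version spells out the ASCII range explicitly. The function name `replace_non_ascii` is an assumption for illustration.

```python
import re

# Any character outside the 7-bit ASCII range U+0000..U+007F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def replace_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a single space."""
    return NON_ASCII.sub(" ", text)

# The bullet (U+2022) becomes a space; ASCII text is untouched.
print(replace_non_ascii("\u2022 Date of Issue"))
```

Note that this also replaces legitimate non-ASCII letters (accented names, for instance), which may or may not be wanted.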


It would be possible to find all "special" characters with this regular expression and then just replace them with a space character:

/[<special_characters_here>]/

However, it is usually better to use whitelisting: list all allowed characters and replace everything else with a space character:

/[^<allowed_characters_here>]/
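The difference between the two approaches can be sketched as follows, in Python for brevity. The specific bullet glyphs, the allowed set and the function names are assumptions chosen for illustration only.

```python
import re

# Blacklist: name the unwanted characters explicitly (two bullet glyphs here).
BLACKLIST = re.compile("[\u2022\u25CF]")

# Whitelist: name what is allowed and negate the class.
WHITELIST = re.compile(r"[^A-Za-z0-9 #@!]")

def blacklist_clean(text: str) -> str:
    """Replace only the explicitly listed characters with a space."""
    return BLACKLIST.sub(" ", text)

def whitelist_clean(text: str) -> str:
    """Replace everything that is not explicitly allowed with a space."""
    return WHITELIST.sub(" ", text)

line = "\u2022 Expiry Date \u25E6 #1"
print(blacklist_clean(line))  # U+25E6 is not on the blacklist, so it survives
print(whitelist_clean(line))  # both bullet glyphs are replaced
```

The blacklist misses any glyph nobody thought to list, while the whitelist handles unknown characters by default, which is the usual argument for preferring it.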


I don't have enough time to flesh out a full example. But since you're using .NET you can match on any number of these character classes:

http://msdn.microsoft.com/en-us/library/20bw873z.aspx

Choose what you want to accept and replace anything that is not in that set.


Do you mean replacing the carriage return and new line characters?

If that's what you're after, this would do it:

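// Requires: using System.Text.RegularExpressions;
var source = "once\r\ntwice\r\nthrice";
var pattern = new Regex(@"\r\n"); // matches a CR LF line break
var result = pattern.Replace(source, ","); // replace each line break with a comma
Assert.AreEqual("once,twice,thrice", result);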