
How to find and remove control characters in a text file

I have a .txt file in which control characters appear inside email addresses, something like this: @ãgmail.com. There may be multiple instances of this, as well as multiple instances of other control characters. Is there a way I can first find them and then remove them?


Here's a trick I picked up from devdaily.com:

tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file

This command deletes any character that is not a tab, line feed, carriage return, or in the range of printable ASCII characters (space through ~).
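For the "find them first" part of the question, here is a minimal sketch that flags every line containing a byte the tr command above would delete (anything that is not tab, carriage return, or space through ~) and prints it with its line number. It assumes a POSIX printf and a GNU or BSD grep (the -a flag is not POSIX); the function name find_bad_chars is hypothetical.

```shell
# Hypothetical helper: print "lineno:content" for every line of file $1
# that contains a byte outside tab, CR, and space (0x20) through ~ (0x7E).
find_bad_chars() {
  # printf turns \t and \r into real tab and carriage-return bytes, so the
  # bracket expression matches any byte that is NOT tab, CR, or 0x20-0x7E.
  pattern=$(printf '[^\t\r -~]')
  # LC_ALL=C makes grep compare single bytes, so multi-byte UTF-8 letters
  # such as the ã in @ãgmail.com are flagged too; -a keeps grep in text
  # mode even if the file contains NUL bytes; -n adds line numbers.
  LC_ALL=C grep -n -a -- "$pattern" "$1"
}
```

Usage: `find_bad_chars file-with-binary-chars` lists the affected lines; once they look right, the tr command above removes the characters.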

On Windows you can get the tr command from the GNU Utilities for Win32 or Cygwin.


A piece of C# code, not very optimized when there are many different control characters to remove, but it gives you a starting point:

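using System.IO;
using System.Text;

// Read the input line by line and write a cleaned copy. Encoding.Default must
// match the code page the file was written in, or characters will be misread.
using (var sr = new StreamReader(@"c:\temp.data\big_file_with_unwanted_chars.txt", Encoding.Default))
using (var sw = new StreamWriter(@"c:\temp.data\big_file_without_any_evil_chars.txt", false, Encoding.Default))
{
  string al;

  // ReadLine returns null at end of file
  while ((al = sr.ReadLine()) != null)
  {
    // One Replace call per unwanted character; extend this list as needed
    al = al.Replace("ä", "");
    al = al.Replace("#", "");
    sw.WriteLine(al);
  }
} // the using blocks flush and close both files, even if an exception occurs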
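The Replace calls above only remove characters you already know about. To see which unwanted bytes a file actually contains, here is a minimal shell sketch (the function name list_bad_bytes is hypothetical; it assumes POSIX tr, od, sort and uniq). It deletes every byte that the tr -cd command earlier keeps, then counts what is left as octal byte values. Note that a UTF-8 character such as ã shows up as two bytes, 303 and 243.

```shell
# Hypothetical helper: count each distinct byte in file $1 that is not
# tab (\11), line feed (\12), carriage return (\15) or \40-\176.
list_bad_bytes() {
  # tr -d removes all allowed bytes, leaving only the unwanted ones;
  # od -An -v -b prints them as three-digit octal values (no offsets,
  # no "*" suppression of repeated lines); the rest puts one value per
  # line and counts how often each occurs.
  LC_ALL=C tr -d '\11\12\15\40-\176' < "$1" \
    | od -An -v -b \
    | tr -s ' ' '\n' \
    | grep -v '^$' \
    | sort | uniq -c
}
```

Each output line is a count followed by an octal byte value, which tells you which characters to target.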
