How to find and remove control characters in a text file
I have a .txt file that contains control characters inside email addresses, something like this: @ãgmail.com. There may be multiple instances of this and of other control characters. Is there a way I can first find them and then remove them?
Here's a trick I picked up from devdaily.com:
tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file
This command deletes any character that is not a tab (octal 11), line feed (12), carriage return (15), or in the range of printable ASCII characters (octal 40 through 176, that is, space through ~).
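As a quick check of what the filter keeps, here is a minimal sketch that wraps the same tr call in a shell function and feeds it one sample line. The function name strip_nonprintable and the sample address are made up for illustration, and LC_ALL=C is an added assumption so that tr works byte by byte regardless of the locale.

```shell
# Hypothetical helper: the same filter as above, reading stdin and writing stdout.
strip_nonprintable() {
    # LC_ALL=C (an addition, not part of the original command) forces byte-wise handling.
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Sample line with a stray 0xE3 byte ("ã" in Latin-1) after the @ sign.
printf 'user@\343gmail.com\n' | strip_nonprintable
# prints: user@gmail.com
```

Because tr deletes bytes rather than characters, a multi-byte UTF-8 character is removed entirely, one byte at a time.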
On Windows you can get the tr command from the GNU Utilities for Win32 or Cygwin.
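To find the offending characters before deleting them, one option is to list the lines that contain any byte outside the set the tr command keeps. The sketch below assumes a grep that supports the -a option (GNU and BSD grep do); the function name find_nonprintable is hypothetical.

```shell
# Hypothetical helper: print "line-number:line" for every line containing a byte
# other than tab, carriage return, or printable ASCII (space through ~).
find_nonprintable() {
    # LC_ALL=C makes the range ' -~' mean exactly bytes 0x20-0x7E.
    # -a treats the input as text even if it looks binary; -n adds line numbers.
    LC_ALL=C grep -a -n "[^$(printf '\t\r') -~]" "$@"
}

# Usage (file name taken from the tr example above):
# find_nonprintable file-with-binary-chars
```

Running it on the input file before the tr step and on the cleaned file afterwards is a simple way to confirm the cleanup: the second run should print nothing.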
Here is a piece of C# code. It is not optimized for a large number of distinct control characters, but it is a hint to get started:
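using System.IO;
using System.Text;

// The using blocks close both streams even if an exception is thrown.
using (StreamReader sr = new StreamReader(@"c:\temp.data\big_file_with_unwanted_chars.txt", Encoding.Default))
using (StreamWriter sw = new StreamWriter(@"c:\temp.data\big_file_without_any_evil_chars.txt", false, Encoding.Default))
{
    string al;
    // ReadLine returns null once the end of the file is reached.
    while ((al = sr.ReadLine()) != null)
    {
        // Strip each unwanted character; add one Replace call per character to remove.
        al = al.Replace("ä", "");
        al = al.Replace("#", "");
        sw.WriteLine(al);
    }
}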