Munging non-printable characters to dots using string.translate()
So I've done this before and it's a surprising ugly bit of code for such a seemingly simple task.
The goal is to translate any non-printable character into a . (dot). For my purposes "printable" does exclude the last few characters from string.printable
(new-lines, tabs, and so on). This is for printing things like the old MS-DOS debug "hex dump" format ... or anything similar to that (where additional whitespace will mangle the intended dump layout).
I know I can use string.translate()
and, to use that, I need a translation table. So I use string.maketrans()
for that. Here's the best I could come up with:
filter = string.maketrans(
string.translate(string.maketrans('',''),
string.maketrans('',''),string.printable[:-5]),
'.'*len(string.translate(string.maketrans('',''),
string.maketrans('',''),string.printable[:-5])))
... which is an unreadable mess (though it does work).
From there you can call use something like:
for each_line in sometext:
print string.translate(each_line, filter)
... and be happy. (So long as you don't look under the hood).
Now it is more readable if I break that horrid expression into separate statements:
ascii = string.maketrans开发者_StackOverflow中文版('','') # The whole ASCII character set
nonprintable = string.translate(ascii, ascii, string.printable[:-5]) # Optional delchars argument
filter = string.maketrans(nonprintable, '.' * len(nonprintable))
And it's tempting to do that just for legibility.
However, I keep thinking there has to be a more elegant way to express this!
Here's another approach using a list comprehension:
filter = ''.join([['.', chr(x)][chr(x) in string.printable[:-5]] for x in xrange(256)])
Broadest use of "ascii" here, but you get the idea
>>> import string
>>> ascii="".join(map(chr,range(256)))
>>> filter="".join(('.',x)[x in string.printable[:-5]] for x in ascii)
>>> ascii.translate(filter)
'................................ !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~.................................................................................................................................'
If I were golfing, probably use something like this:
filter='.'*32+"".join(map(chr,range(32,127)))+'.'*129
for actual code-golf, I imagine you'd avoid string.maketrans entirely
s=set(string.printable[:-5])
newstring = ''.join(x for x in oldstring if x in s else '.')
or
newstring=re.sub('[^'+string.printable[:-5]+']','',oldstring)
I don't find this solution ugly. It is certainly more efficient than any regex based solution. Here is a tiny bit shorter solution. But only works in python2.6:
nonprintable = string.maketrans('','').translate(None, string.printable[:-5])
filter = string.maketrans(nonprintable, '.' * len(nonprintable))
精彩评论