Python UTF-8: how to align printout
I have an array containing Japanese characters as well as "normal" ones. How do I align the printout of these?
#!/usr/bin/python
# coding=utf-8
a1=['する', 'します', 'trazan', 'した', 'しました']
a2=['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']
for i,j in zip(a1,a2):
    print i.ljust(12),':',j
print '-'*8
for i,j in zip(a1,a2):
    print i,len(i)
    print j,len(j)
Output:
する       : dipsy
します    : laa-laa
trazan       : banarne
した       : po
しました : tinky winky
--------
する 6
dipsy 5
します 9
laa-laa 7
trazan 6
banarne 7
した 6
po 2
しました 12
tinky winky 11
thanks, //Fredrik
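The lengths printed in the question are UTF-8 byte counts, not character counts: with "# coding=utf-8", the Python 2 literal 'する' is a byte string, and each hiragana character encodes to three bytes. Because ljust(12) pads to 12 bytes, しました (12 bytes) gets no padding while する (6 bytes) gets six spaces. A minimal Python 3 sketch, where str is Unicode and the bytes have to be requested explicitly with encode(), reproduces both counts:

```python
# In Python 3, str holds Unicode code points; .encode('utf-8') gives the bytes
# that the Python 2 byte-string literal would have contained.
words = ['する', 'します', 'trazan', 'した', 'しました']

for w in words:
    n_chars = len(w)                  # number of characters (code points)
    n_bytes = len(w.encode('utf-8'))  # number of UTF-8 bytes
    print(w, n_chars, n_bytes)
```

The byte column matches the question's output (6, 9, 6, 6, 12), while the character column is 2, 3, 6, 2, 4.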
Use the unicodedata.east_asian_width function to keep track of which characters are narrow and which are wide when computing the display length of the string.
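For reference, unicodedata.east_asian_width returns a short category code per character: 'W' (wide) and 'F' (fullwidth) normally occupy two terminal columns, while 'Na' (narrow), 'H' (halfwidth) and 'N' (neutral) occupy one. 'A' (ambiguous) depends on the terminal and font; the code in this answer counts it as one column, which is an assumption that may not hold in every East Asian locale. A small Python 3 sketch that prints the category for a few sample characters:

```python
import unicodedata

# ASCII letter, hiragana, fullwidth Latin A (U+FF21), halfwidth katakana A (U+FF71)
samples = ['a', 'す', 'Ａ', 'ｱ']

for ch in samples:
    # name() gives the official Unicode character name for context
    print(repr(ch), unicodedata.name(ch), unicodedata.east_asian_width(ch))
```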
#!/usr/bin/python
# coding=utf-8
import sys
import codecs
import unicodedata
out = codecs.getwriter('utf-8')(sys.stdout)
def width(string):
    # wide (W) and fullwidth (F) characters count as 2 columns, all others as 1
    return sum(1+(unicodedata.east_asian_width(c) in "WF")
               for c in string)
a1=[u'する', u'します', u'trazan', u'した', u'しました']
a2=[u'dipsy', u'laa-laa', u'banarne', u'po', u'tinky winky']
for i,j in zip(a1,a2):
    out.write('%s %s: %s\n' % (i, ' '*(12-width(i)), j))
Outputs:
する         : dipsy
します       : laa-laa
trazan       : banarne
した         : po
しました     : tinky winky
It doesn’t look right in some web browser fonts, but in a terminal window they line up properly.
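The code above is Python 2 (print statement, u'' literals, a codecs writer around sys.stdout). A sketch of the same width-based padding in Python 3, where str is already Unicode and print handles the encoding, might look like this; the helper name pad_to_width is hypothetical, introduced here for illustration:

```python
import unicodedata

def width(string):
    # Wide ('W') and fullwidth ('F') characters take two columns, everything else one.
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in string)

def pad_to_width(string, columns):
    # Hypothetical helper: append spaces until the string fills `columns` display
    # columns; max() avoids a negative count for strings that are already too wide.
    return string + ' ' * max(0, columns - width(string))

a1 = ['する', 'します', 'trazan', 'した', 'しました']
a2 = ['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']

for i, j in zip(a1, a2):
    print('%s : %s' % (pad_to_width(i, 12), j))
```

As with the original, the columns only line up in a terminal or font that really renders these characters at double width.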
Use unicode objects instead of byte strings:
#!/usr/bin/python
# coding=utf-8
a1=[u'する', u'します', u'trazan', u'した', u'しました']
a2=[u'dipsy', u'laa-laa', u'banarne', u'po', u'tinky winky']
for i,j in zip(a1,a2):
    print i.ljust(12),':',j
print '-'*8
for i,j in zip(a1,a2):
    print i,len(i)
    print j,len(j)
Unicode objects deal with characters directly, so len() and ljust() count characters instead of UTF-8 bytes.
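One caveat: ljust on a Unicode string pads to a number of characters, not terminal columns, so on a terminal that renders hiragana at double width the columns can still drift. A quick Python 3 check (Python 3's str behaves like Python 2's unicode here); the columns() helper is a hypothetical stand-in for the width functions in the other answers:

```python
import unicodedata

def columns(s):
    # Estimated display columns: 2 for wide/fullwidth characters, 1 otherwise.
    return sum(2 if unicodedata.east_asian_width(c) in "WF" else 1 for c in s)

for word in ['する', 'trazan', 'しました']:
    padded = word.ljust(12)
    # len() counts characters; columns() estimates the width on screen.
    print(repr(padded), len(padded), columns(padded))
```

Every padded string has 12 characters, but the estimated widths are 14, 12 and 16 columns respectively.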
You need to build the string manually and also compute the format length manually; there is no easy built-in way to do this.
The three functions below do this (they require the unicodedata module):
shortenStringCJK: shortens a string so that it fits a given display width in some output (as opposed to cutting it down to X characters).
import unicodedata

def shortenStringCJK(string, width, placeholder='..'):
    # get the display width, counting double-width characters twice
    string_len_cjk = stringLenCJK(str(string))
    # if the display width is too big
    if string_len_cjk > width:
        # set current length and output string
        cur_len = 0
        out_string = ''
        # loop through each character
        for char in str(string):
            # set the current length as if we add the character
            cur_len += 2 if unicodedata.east_asian_width(char) in "WF" else 1
            # add the char only if it still fits, leaving room for the placeholder
            if cur_len <= (width - len(placeholder)):
                out_string += char
        # return the shortened string with the placeholder appended
        return "{}{}".format(out_string, placeholder)
    else:
        return str(string)
stringLenCJK: gets the correct length (that is, the number of columns the string takes up on a terminal).
def stringLenCJK(string):
    # return string len including double count for double width characters
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in string)
formatLen: adjusts the format length to compensate for the extra width of double-width characters. Without it, the columns will be unbalanced.
def formatLen(string, length):
    # returns the length updated for a string with double-width characters:
    # take the display width (double-width characters counted twice),
    # subtract the plain character count, then subtract that difference
    # from the original length
    return length - (stringLenCJK(string) - len(string))
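As a worked example of the formatLen arithmetic: for 'しました', len() is 4 but stringLenCJK() is 8, so with a format length of 26 the field width becomes 26 - (8 - 4) = 22 characters. Four wide characters plus 18 spaces then fill 8 + 18 = 26 columns. A self-contained Python 3 check, repeating stringLenCJK and formatLen so it runs on its own:

```python
import unicodedata

def stringLenCJK(string):
    # display width: wide/fullwidth characters count twice
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in string)

def formatLen(string, length):
    # shrink the field width by the extra columns the wide characters use
    return length - (stringLenCJK(string) - len(string))

s = 'しました'
field = formatLen(s, 26)             # 26 - (8 - 4) = 22
cell = '{:<{w}}'.format(s, w=field)  # left-align s in a 22-character field
print(field, len(cell), stringLenCJK(cell))  # 22 characters, 26 columns
```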
To output a string, first predefine the format string:
format_str = "|{{:<{len}}}|"
format_len = 26
string_len = 26
and then output it as follows (where _string is the string to output):
print("Normal : {}".format(
        format_str.format(
            len=formatLen(shortenStringCJK(_string, width=string_len), format_len))
    ).format(
        shortenStringCJK(_string, width=string_len)
    )
)
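The nested .format() calls first build a format string such as "|{:<22}|" and then fill it with the shortened text. An alternative, sketched here in Python 3 with a hypothetical helper named cjk_cell, shortens and then pads with explicitly counted spaces, so no intermediate format string is needed. It assumes the placeholder consists of narrow characters (so len(placeholder) equals its width); stopping at the first character that does not fit gives the same result as shortenStringCJK, since the running width only grows:

```python
import unicodedata

def char_width(c):
    # 2 columns for wide/fullwidth characters, 1 otherwise
    return 2 if unicodedata.east_asian_width(c) in "WF" else 1

def cjk_cell(string, width, placeholder='..'):
    # Hypothetical helper: shorten `string` to at most `width` columns
    # (appending `placeholder` if it was cut), then pad with spaces to `width`.
    text = str(string)
    if sum(char_width(c) for c in text) > width:
        out, used = '', 0
        for c in text:
            if used + char_width(c) > width - len(placeholder):
                break  # the next character would not fit next to the placeholder
            out += c
            used += char_width(c)
        text = out + placeholder
    used = sum(char_width(c) for c in text)
    return text + ' ' * (width - used)

for s in ['trazan', 'しました', 'しましたしましたしました']:
    print('Normal : |{}|'.format(cjk_cell(s, 10)))
```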