Creating dictionary from a list of special characters

2023-03-19 23:18 Q&A author:

I'm working on this small script: it maps each element of a list (containing special characters) to its index to build a dictionary.

#!/usr/bin/env python
#-*- coding: latin-1 -*-

ln1 = '?0>9<8~7|65"4:3}2{1+_)'
ln2 = "(*&^%$£@!/`'\][=-#¢"

refStr = ln2+ln1

keyDict = {}
for i in range(0,len(refStr)):
    keyDict[refStr[i]] = i


print "-" * 32
print "Originl: ",refStr
print "KeyDict: ", keyDict

# added just to test a few special characters
tsChr = ['£','%','\\','¢']

for k in tsChr:
    if k in keyDict:
        print k, "\t", keyDict[k]
    else: print k, "\t", "not in the dic."

It prints the following:

Originl:  (*&^%$£@!/`'\][=-#¢?0>9<8~7|65"4:3}2{1+_)
KeyDict:  {'!': 9, '\xa3': 7, '\xa2': 20, '%': 4, '$': 5, "'": 12, '&': 2, ')': 42, '(': 0, '+': 40, '*': 1, '-': 17, '/': 10, '1': 39, '0': 22, '3': 35, '2': 37, '5': 31, '4': 33, '7': 28, '6': 30, '9': 24, '8': 26, ':': 34, '=': 16, '<': 25, '?': 21, '>': 23, '@': 8, '\xc2': 19, '#': 18, '"': 32, '[': 15, ']': 14, '\\': 13, '_': 41, '^': 3, '`': 11, '{': 38, '}': 36, '|': 29, '~': 27}


which is all good, except that the characters £, ¢ and \ are shown as \xa3, \xa2 and \\ respectively. Does anyone know why printing ln1/ln2 works fine but printing the dictionary does not? How can I fix this? Any help greatly appreciated. Cheers!!

Update 1

I've added two more special characters, # and ¢, and this is what I get after following @Duncan's suggestion:

! 9
? 7
? 20
% 4
$ 5
....
....
8 26
: 34
= 16
< 25
? 21
> 23
@ 8
? 19
....
....

Notice that the 7th, 19th and 20th elements are not printed correctly at all; the 21st element is the actual ? character. Cheers!!

Update 2

I've added this loop to my original post to test what I actually need:

tsChr = ['£','%','\\','¢']
for k in tsChr:
    if k in keyDict:
        print k, "\t", keyDict[k]
    else: print k, "\t", "not in the dic."

and this is the result:

£   not in the dic.
%   4
\   13
¢   not in the dic.

When I run the script, it reports that £ and ¢ are not in the dictionary, and that's my problem. Does anyone know how to fix this, or what I am doing wrong?

Eventually I'll be checking characters from a file (or a line of text) against the dictionary to see whether they exist, and the text may contain characters such as é or £. Cheers!!

When you print a dictionary or list that contains strings, Python displays the repr() of each string. If you print repr(ln2) you'll see that nothing has changed: your dictionary keys are just the encoded bytes of '£' and the other non-ASCII characters.

If you do:

for k in keyDict:
    print k, keyDict[k]

then the characters will display as you expect.
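A minimal sketch of the repr()/str() difference, assuming Python 3 (the question uses Python 2, where the keys were byte strings; the name key_dict is illustrative). Printing a container shows the repr() of each key, while printing a key on its own uses str(). Raw bytes, which is what the Python 2 dictionary actually held, still display as escapes:

```python
# Python 3 sketch: containers display repr() of their elements,
# while print() on a single string uses str().
key_dict = {"£": 7, "\\": 13}

print(key_dict)        # {'£': 7, '\\': 13}  (repr escapes the backslash)
print(repr("\\"))      # '\\'
print("\\")            # \   (str shows the character itself)

# Iterating and printing each key shows the characters as expected.
for key in key_dict:
    print(key, key_dict[key])

# A 1-byte bytes key (like the Python 2 str keys) still shows as an escape.
print({b"\xa3": 7})    # {b'\xa3': 7}
```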

In my humble opinion it would be useful to learn about Unicode in general and its use in Python.

If you are not interested in why things got so messy that you have to deal with '\xa3' instead of a plain £, then Duncan's answer above is perfect and tells you everything you need to know.

Update (regarding your Update #2)

Please make sure your file is saved with latin-1 encoding rather than UTF-8, as it is now, and your test will pass (or change #-*- coding: latin-1 -*- to #-*- coding: utf-8 -*- and use unicode literals, as in the P.S. below).
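Why the saved encoding matters can be sketched in Python 3, assuming str.encode() stands in for how the editor writes the characters to disk (the list name rows is illustrative). In latin-1, £ and ¢ are one byte each, so a byte string saved that way has exactly one byte per character; in UTF-8 they take two:

```python
# Python 3 sketch: the same characters stored under two encodings.
rows = []
for ch in ["£", "¢", "%"]:
    latin1 = ch.encode("latin-1")   # one byte per character for these
    utf8 = ch.encode("utf-8")       # non-ASCII characters need 2+ bytes
    rows.append((ch, latin1, utf8))

for ch, latin1, utf8 in rows:
    print(ch, latin1, len(latin1), utf8, len(utf8))
```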

You can understand this easily by reading the material at my link above:

Your file is saved as UTF-8, which means the character £ takes 2 bytes; but since you tell the Python interpreter the encoding is latin-1, it treats each of the 2 UTF-8 bytes of £ as a separate character, and each one becomes its own key.

In fact, I count 19 characters in ln2, but len(ln2) returns 21.
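A Python 3 sketch of that mismatch, assuming that decoding the UTF-8 bytes as latin-1 approximates Python 2 treating every byte of a str as one character (the names raw and misread are illustrative):

```python
ln2 = "(*&^%$£@!/`'\\][=-#¢"   # "\\" is a single backslash character
raw = ln2.encode("utf-8")      # the bytes a UTF-8 editor writes to disk

print(len(ln2))   # 19 characters
print(len(raw))   # 21 bytes: £ and ¢ take two bytes each

# Reading the bytes back as latin-1 maps each byte to one character,
# so £ and ¢ each turn into two characters ('Â£' and 'Â¢').
misread = raw.decode("latin-1")
print(len(misread))                     # 21
print(misread == raw.decode("utf-8"))   # False
```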

When you test '£' in keyDict, you are looking for a 2-byte string, while each of its 2 bytes got its own key in the dictionary; that's why it isn't found.

You can also check len(keyDict) and find that it is larger than you expect.
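The failure can be reproduced in Python 3 by building the dictionary from bytes, assuming 1-byte bytes slices stand in for Python 2's single-byte str keys (byte_dict and ref_bytes are illustrative names). The indices match the KeyDict output in the question:

```python
ln1 = '?0>9<8~7|65"4:3}2{1+_)'
ln2 = "(*&^%$£@!/`'\\][=-#¢"
ref_bytes = (ln2 + ln1).encode("utf-8")

# Mimic Python 2: one key per *byte*, each key a 1-byte bytes object.
byte_dict = {}
for i in range(len(ref_bytes)):
    byte_dict[ref_bytes[i:i + 1]] = i

print(len(ln2 + ln1))   # 41 characters
print(len(byte_dict))   # 42 keys: b'\xc2' is shared by £ and ¢
print("£".encode("utf-8") in byte_dict)   # False: the 2-byte key was never stored
print(byte_dict[b"\xa3"], byte_dict[b"\xc2"], byte_dict[b"\xa2"])  # 7 19 20
```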

I think this explains everything. Not the whole story can be told in a single web page, but in my humble opinion the link above is a nice starting point, mixing some history with some coding examples.

Cheers

P.S.: I'm using this code, saved as UTF-8, and it works flawlessly:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

ln1 = u'?0>9<8~7|65"4:3}2{1+_)'
ln2 = u"(*&^%$£@!/`'\][=-#¢"

refStr = u"%s%s" % (ln2, ln1)

keyDict = {}
for idx, chr_ in enumerate(refStr):
    print chr_,
    keyDict[chr_] = idx

print u"-" * 32
print u"Originl: ", refStr
print u"KeyDict: ", keyDict

tsChr = [u'£', u'%', u'\\', u'¢']
for k in tsChr:
    if k in keyDict:
        print k, "\t", keyDict[k]
    else:
        print k, repr(k), "\t", "not in the dic."
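For comparison, a sketch of the same approach in Python 3 (an assumption; the thread targets Python 2). There str is already Unicode, so no u'' prefix is needed and each character, not each byte, becomes one key. The sample text and the names key_dict and unknown are hypothetical; when reading a real file, passing the encoding explicitly, e.g. open(path, encoding="utf-8"), gives you decoded characters to test against the dictionary:

```python
ln1 = '?0>9<8~7|65"4:3}2{1+_)'
ln2 = "(*&^%$£@!/`'\\][=-#¢"
ref_str = ln2 + ln1

# One key per character; a repeated character would keep its last index,
# just like the original loop.
key_dict = {ch: idx for idx, ch in enumerate(ref_str)}

for k in ["£", "%", "\\", "¢"]:
    if k in key_dict:
        print(k, "\t", key_dict[k])
    else:
        print(k, repr(k), "\t", "not in the dic.")

# Hypothetical line of text: collect the characters missing from key_dict.
sample = "costs £5, café"
unknown = sorted({ch for ch in sample if ch not in key_dict})
print(unknown)
```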
