Python - to check if a char is in dictionary and if not to deal with it

2022-12-19 21:58

I am transliterating text from a source language (input file) to a target language (target file), so in my code I look up each character's equivalent in a dictionary. Some characters in the source text, such as the comma (,) and other special symbols, have no equivalent mapping. How do I check whether a character is in the dictionary, and how do I make sure the special symbols that have no mapping are still written to the target file? Thank you :).

My recommendation, given that rules is a mapping of the characters to their transliterated equivalents:

def transliterate(source_text, rules):
    results = []
    for char in source_text:
        # fall back to the character itself when it has no mapping
        results.append(rules.get(char, char))
    return ''.join(results)    # turns the list back into a string

A dict's get method returns the value for a key, or a default value if the key does not exist. Normally the default is None, but here we pass the character itself as the default (the second argument), so a character with no mapping is returned unchanged.

A more compact way to write this, using a generator expression, would be:

''.join(rules.get(char, char) for char in source_text)
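To see the fallback in action, here is a small self-contained sketch. The rules mapping below is made up for illustration (a few Cyrillic letters mapped to Latin) and is not taken from the question; the comma and the space have no entry, so they pass through unchanged.

```python
# Hypothetical mapping, for illustration only.
rules = {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t"}

def transliterate(source_text, rules):
    # Characters without an entry in rules fall back to themselves.
    return "".join(rules.get(char, char) for char in source_text)

print(transliterate("привет, привет", rules))
```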

If you use the translate method of Unicode strings, as I recommended in my answer to another question of yours, all of this is done for you automatically: each Unicode character c whose code point (ord(c)) is not in the transliteration dictionary is simply passed unchanged from input to output, just as you want. Why reinvent the wheel?
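In Python 3 the corresponding method is str.translate, which expects a table keyed by code points; str.maketrans can build such a table from a dict with single-character keys. A minimal sketch, with a made-up mapping rather than the asker's real one:

```python
# Hypothetical mapping, for illustration only.
rules = {"α": "a", "β": "b", "γ": "g"}

# str.maketrans converts the single-character keys to code points (ord(c)).
table = str.maketrans(rules)

# Characters missing from the table (comma, space, "!") are left unchanged.
print("αβγ, γβα!".translate(table))
```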

I think you want something like this:

tokenMapping = {"&&": "and"}

# source_tokens stands for the tokens read from the source file
for token in source_tokens:
    translatedToken = tokenMapping[token] if token in tokenMapping else "transliteration unknown"

If there's a translation in the dictionary (e.g. "&&" -> "and"), it will use that. Otherwise the result is "transliteration unknown".

Hope that helped.

EDIT: As LeafStorm suggested, a dictionary's get method can be used to simplify the above code. The line in the loop would become:

    translatedToken = tokenMapping.get(token, "transliteration unknown")

dictx = {}
for itm in my_source:
    dictx[itm] = dictx.get(itm, 0) + 1

I didn't completely understand the details of your question, but here's the simplest example I could think of that illustrates the pattern I think you are after.

I believe the get method is what you want. It lets you retrieve the value for a key, but if the key is not there it returns a default value that you supply instead of raising an exception. In other words: "I want dictx[itm] (the value assigned to the key itm), but if itm is not in the dictionary, give me 0 instead."

This snippet loops through your source document (my_source) and counts the frequency of the items in it, incrementing the value for each key already in the dictionary. When it reaches an item for which no key exists, no exception is thrown: get returns the default 0, and the assignment adds the key with a count of 1.
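Here is a small runnable version of the counting loop, with a made-up my_source list standing in for the source document. It also shows that get on its own leaves the dictionary untouched; the key only appears because of the assignment.

```python
my_source = ["a", "b", "a", ",", "a"]  # hypothetical sample data

dictx = {}
print(dictx.get("a", 0))   # key is missing, so the default is returned
print("a" in dictx)        # get() did not add the key

for itm in my_source:
    # Missing keys start from the default 0 and are then incremented.
    dictx[itm] = dictx.get(itm, 0) + 1

print(dictx)
```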

This seems pretty straightforward. If your dictionary is char to char, then you would do something like

outstr = ''
for ch in instr:
    if ch in mydict:
        outstr += mydict[ch]
    else:
        outstr += ch

Here, instr is your input string and mydict contains your mapping of chars to chars.

If you want to check parts of words, I would recommend using two dictionaries: one that contains the characters that are contained in any word, and one that contains the words. You could use it like this:

outstr = ''
word = ''
for ch in instr:
    if ch in chardict:
        word += ch
    else:
        if len(word):
            if word in worddict:
                outstr += worddict[word]
            else:
                outstr += word
            word = ''
        outstr += ch
if len(word):    # flush the last word if the input ends mid-word
    if word in worddict:
        outstr += worddict[word]
    else:
        outstr += word

chardict might contain all of the alphabet for instance. Of course, you might want to do some parts a little bit differently (like use something other than chardict to check if a char is to be considered part of a valid word - perhaps something with a binary search), but hopefully you get the idea.
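As one alternative to scanning character by character, a regular expression can pick out the words and a replacement function can look each one up. This is a different technique from the loop above, sketched under the assumption that a "word" is a run of \w characters; worddict and the sample text are made up for illustration.

```python
import re

# Hypothetical word mapping, for illustration only.
worddict = {"hello": "bonjour", "world": "monde"}

def replace_words(instr, worddict):
    # Each run of word characters is looked up; unknown words are kept as-is.
    # Text between words (spaces, punctuation) is never matched, so it is untouched.
    return re.sub(r"\w+", lambda m: worddict.get(m.group(0), m.group(0)), instr)

print(replace_words("hello, big world!", worddict))
```

Whether \w is the right definition of a word character depends on the languages involved, so the pattern may need adjusting.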
