
Get the locale from a string

I have a problem getting the locale out of strings like:

menu_title_en_US
menu_title_en

The locales in these strings would be "en_US" and "en", respectively. The strings I have to deal with contain only alphanumeric characters and underscores, like variable names in Python.

I have tried the following regex so far:

re.compile(r'_(?P<base_code>[a-z]{2,5})(_(?P<ext_code>[a-z]{2,5})){0,1}$')

It works fine for strings like "menu_en" and "menu_en_US", but for strings like "menu_title_en" or "menu_title_en_US" it does not extract en or en_US as expected.
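A minimal sketch that shows what goes wrong, assuming the pattern is applied with re.search (the question does not say which re function is used). Because {2,5} also accepts five-letter words, the leftmost match starts at "_title_en", so the word "title" is captured as the base code. The name ORIGINAL is just a label for this example.

```python
import re

# The pattern from the question, with the stray characters removed.
ORIGINAL = re.compile(r'_(?P<base_code>[a-z]{2,5})(_(?P<ext_code>[a-z]{2,5})){0,1}$')

# search() returns the leftmost match: it succeeds at "_title_en", so the
# five-letter word "title" becomes the base code and "en" the extension.
m = ORIGINAL.search("menu_title_en")
print(m.group("base_code"), m.group("ext_code"))  # title en
```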

Does anyone have a quick idea how to solve this problem?


If you know the locale is always en, en_us, or en_US (stated in a comment), then you don't need a regex at all:

locale = the_string[-6:]           # e.g. "_en_US" or "tle_en"
if not locale.startswith('_en_'):
    locale = locale[3:]            # no region part: keep only "_en"
locale = locale[1:]                # drop the leading underscore

or

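tail = the_string[-3:]             # underscore plus the last two characters
for code in '_en', '_en_us', '_en_US':
    if code.endswith(tail):
        locale = code[1:]          # strip the leading underscore
        break
else:
    locale = None                  # no locale found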

You could add more checks if the data might contain something that looks like a locale but isn't; these snippets only check for an underscore followed by two characters.
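One possible stricter check is sketched below, assuming you can list every locale your data uses up front: compare the whole suffix, underscore included, against that list instead of only the last three characters. KNOWN_LOCALES and get_locale are hypothetical names chosen for this example.

```python
# Hypothetical set of locale codes the data is expected to use; they are tried
# longest first so a longer code wins over a shorter one that is also a suffix.
KNOWN_LOCALES = ("en_US", "en_us", "en")


def get_locale(name):
    """Return the known locale that name ends with, or None if there is none."""
    for code in sorted(KNOWN_LOCALES, key=len, reverse=True):
        # Require the separating underscore so "menu_queen" is not read as "en".
        if name.endswith("_" + code):
            return code
    return None


for name in ["menu_title_en", "menu_title_en_US", "menu_queen", "menu_title_de"]:
    print(name, get_locale(name))
```

This prints en, en_US, None and None for the four names. Anything not in the list is rejected, so new locales have to be added to KNOWN_LOCALES explicitly.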

However, the regex can also be fixed and simplified a bit:

re.compile(r'_(?P<base_code>[a-z]{2})(_(?P<ext_code>[a-zA-Z]{2}))?$')

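? is the same as {0,1}, and since the codes are always two characters you want {2}, not {2,5}. With {2,5}, searching "menu_title_en" matches at "_title_en", taking "title" as the base code and "en" as the extension, which is the wrong result you are seeing. You also want to accept either lower or upper case for the second code.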

It will still have false positives, though.
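A short sketch of how the simplified pattern might be used, written here with the upper-case range as A-Z. It assumes re.search, because re.match only tries at the start of the string while the locale sits at the end. The helper parse_locale is a hypothetical name. The last two calls show the false positives: any two trailing lowercase letters after an underscore look like a locale.

```python
import re

# The simplified pattern: exactly two lowercase letters, optionally followed by
# an underscore and two letters of either case, anchored at the end.
LOCALE_RE = re.compile(r'_(?P<base_code>[a-z]{2})(_(?P<ext_code>[a-zA-Z]{2}))?$')


def parse_locale(name):
    """Return (base_code, ext_code) for the trailing locale, or None."""
    m = LOCALE_RE.search(name)
    if m is None:
        return None
    return m.group("base_code"), m.group("ext_code")


# The strings from the question now parse as expected.
for name in ["menu_en", "menu_en_US", "menu_title_en", "menu_title_en_US"]:
    print(name, parse_locale(name))

# False positives: these names end in two letters that are not meant as locales.
print(parse_locale("page_id"))    # ('id', None)
print(parse_locale("button_ok"))  # ('ok', None)
```

The loop prints ('en', None) for the names without a region and ('en', 'US') for the others.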
