Get the locale from a string
I have a problem getting the locale out of a string like:
menu_title_en_US
menu_title_en
The locale in the first string would be "en_US". The strings I have to deal with contain only alphanumeric characters and underscores, like variable names in Python.
I have tried the following regex so far:
re.compile(r'_(?P<base_code>[a-z]{2,5})(_(?P<ext_code>[a-z]{2,5})){0,1}$')
which works fine for strings like "menu_en" and "menu_en_US", but for strings like "menu_title_en" or "menu_title_en_US" it does not work as expected (extracting en or en_US).
Maybe someone has a quick idea how to solve this problem.
If you know the locale is always en, en_us, or en_US (stated in a comment), then you don't need a regex at all:
locale = the_string[-6:]
if not locale.startswith('_en_'):
    locale = locale[3:]
locale = locale[1:]
or
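locale = None
# Check the longer suffixes first so '_en_US' is not cut down to 'US'.
for code in '_en_us', '_en_US', '_en':
    if the_string.endswith(code):
        locale = code[1:]  # drop the leading underscore
        break
else:
    pass  # no locale found; locale stays None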
You could add more checks if the data could contain something that looked like a locale but wasn't -- these just check for the underscore plus two characters after.
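One way to add such a check, as a minimal sketch: compare the trailing parts of the string against an explicit set of accepted locales. The names KNOWN_LOCALES and split_locale are made up for illustration, and the contents of the set are an assumption based on the locales mentioned in the question.

```python
# Hypothetical whitelist; extend it with whatever locales your data uses.
KNOWN_LOCALES = {"en", "en_us", "en_US"}


def split_locale(name):
    """Return (base_name, locale), or (name, None) if no known locale suffix."""
    parts = name.split("_")
    # Try the two-part suffix first so "en_US" is preferred over a one-part match.
    for size in (2, 1):
        if len(parts) > size:
            candidate = "_".join(parts[-size:])
            if candidate in KNOWN_LOCALES:
                return "_".join(parts[:-size]), candidate
    return name, None


print(split_locale("menu_title_en_US"))  # ('menu_title', 'en_US')
print(split_locale("menu_title_en"))     # ('menu_title', 'en')
print(split_locale("menu_title"))        # ('menu_title', None)
```

Because only listed locales are accepted, a name such as "menu_is_on" is left alone instead of being read as a locale.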
However, the regex can be fixed / simplified a bit, too:
re.compile(r'_(?P<base_code>[a-z]{2})(_(?P<ext_code>[a-zA-Z]{2}))?$')
? is the same as {0,1}, and since the codes are always two characters you want {2}, not {2,5}. You want to accept either lower or upper case for the second code.
It still will have false positives, though.
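A short sketch of how the simplified pattern might be used, with the second character class spelled [a-zA-Z]. The helper name extract_locale is hypothetical, and the last call shows one of the false positives mentioned above.

```python
import re

# Two-letter base code, optionally followed by a two-letter extension,
# anchored at the end of the string.
LOCALE_RE = re.compile(r'_(?P<base_code>[a-z]{2})(_(?P<ext_code>[a-zA-Z]{2}))?$')


def extract_locale(name):
    """Return the locale suffix of name, or None if the pattern does not match."""
    match = LOCALE_RE.search(name)
    if match is None:
        return None
    base, ext = match.group('base_code'), match.group('ext_code')
    return base if ext is None else base + '_' + ext


print(extract_locale('menu_title_en'))     # en
print(extract_locale('menu_title_en_US'))  # en_US
print(extract_locale('menu_title'))        # None
print(extract_locale('menu_is_on'))        # is_on (a false positive)
```

Limiting both codes to exactly two characters is what stops "title" from being picked up as the base code, which a wider range such as {2,5} allows.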