Regex match for a non-English language in Python
I'm trying to capture and match Russian-language characters in a Python script. Since Russian characters don't fall in the [a-zA-Z] range, what regex should I use to match them? I can't use (.*) because it would match everything.
linkpat = re.compile(r'name=[a-zA-Z]+;size=[0-9]+')
Use the Unicode flag:
re.compile(r'name=\w+;size=\d+', re.U)
Note that this matches any letter in any language (plus digits and the underscore), not just Russian.
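If only Russian letters should match, one option is an explicit Cyrillic character class instead of \w. The following is a minimal sketch, not something taken from the answers above: it assumes the names contain only letters of the basic Russian alphabet, and the sample strings are made up for illustration.

```python
import re

# Explicit Cyrillic class: а-я and А-Я cover the basic Russian alphabet.
# ё/Ё lie outside those ranges in Unicode, so they are listed separately.
linkpat = re.compile(r'name=[а-яА-ЯёЁ]+;size=[0-9]+')

# Hypothetical sample inputs, for illustration only.
russian = 'name=Привет;size=42'
latin = 'name=hello;size=42'

print(bool(linkpat.search(russian)))  # Russian letters in the name
print(bool(linkpat.search(latin)))    # Latin letters in the name
```

Unlike the \w version, this pattern rejects Latin names, but it would also reject digits, underscores, and Cyrillic letters used by other languages unless they are added to the class.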
You can try \w with the correct locale set.
Use character classes, which are locale-dependent.