regular expression question (python)

I want to read a Word HTML file and grab any words that contain the letters of a name, but not print them if the words are longer than the name.

# compiling the regular expression:
keyword = re.compile(r"^[(rR)|(yY)|(aA)|(nN)]{5}$/")

if keyword.search (line):
    print line,

I am grabbing the words with this, but I don't seem to be limiting the size properly.


It seems you are looking for keyword.match() instead of keyword.search(). You should read the part of the Python documentation that discusses the difference between match and search.
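To make the difference concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 snippets in this thread). The variable names and the sample string are made up for illustration. It also shows that once the pattern starts with ^, and re.MULTILINE is not set, search() and match() both have to start at position 0.

```python
import re

# Unanchored pattern: four consecutive letters drawn from r, y, a, n (either case).
unanchored = re.compile(r"[rRyYaAnN]{4}")
# Same character set, anchored at both ends like the pattern in the question.
anchored = re.compile(r"^[rRyYaAnN]{4}$")

text = "hello Ryan"  # illustrative sample, not from the original post

# search() scans the whole string; match() only tries at position 0.
found_by_search = unanchored.search(text)
found_by_match = unanchored.match(text)

# With ^ in the pattern, search() can only succeed at position 0 as well.
anchored_search = anchored.search("Ryan")
anchored_match = anchored.match("Ryan")
anchored_on_text = anchored.search(text)

print(found_by_search.group(), found_by_match)
print(anchored_search.group(), anchored_match.group(), anchored_on_text)
```

So for the anchored pattern in the question, switching from search() to match() alone would probably not change the result; the character class and the trailing '/' matter more.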

Also, your regular expression seems completely off: [ and ] delimit a set of characters, so you can't put groups inside them or apply logic between the groups. As written, your expression will also match the characters (, ) and |. You may try the following:

keyword = re.compile(r"^[rRyYaAnN]{5}$")


Your RE "^[(rR)|(yY)|(aA)|(nN)]{5}$/" will never match any string, I think, because of the '/' character after '$'. Without the MULTILINE flag, '$' only matches at the very end of the string (or just before a final newline), so a '/' can never follow it.
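A quick way to check this claim is to run the pattern with and without the trailing '/' over a few strings, including ones that actually contain a '/'. This is a sketch in Python 3; it uses the simplified character class [rRyYaAnN] rather than the original one, and the sample strings are invented for the test.

```python
import re

# Same shape as the pattern in the question, with and without the trailing '/'.
with_slash = re.compile(r"^[rRyYaAnN]{5}$/")
without_slash = re.compile(r"^[rRyYaAnN]{5}$")

# Illustrative inputs: plain, with a trailing newline (as when reading lines
# from a file), and two variants that really contain a '/'.
samples = ["NNNNN", "NNNNN\n", "NNNNN/", "NNNNN\n/"]

slash_hits = [s for s in samples if with_slash.search(s)]
plain_hits = [s for s in samples if without_slash.search(s)]

print(slash_hits)   # nothing matches once '/' has to follow '$'
print(plain_hits)   # '$' also matches just before a final newline
```

Note that the version without '/' still accepts a line ending in a newline, which is what lines read from a file usually look like.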

See the results of the RE without this '/':

import re

pat = re.compile("^[(rR)|(yY)|(aA)|(nN)]{5}$")

for ch in ('arrrN','Aar)N','()|Ny','NNNNN',
           'marrrN','12Aar)NUUU','NNNNN!'):
    print ch.ljust(15),pat.search(ch)

result

arrrN           <_sre.SRE_Match object at 0x011C8EC8>
Aar)N           <_sre.SRE_Match object at 0x011C8EC8>
()|Ny           <_sre.SRE_Match object at 0x011C8EC8>
NNNNN           <_sre.SRE_Match object at 0x011C8EC8>
marrrN          None
12Aar)NUUU      None
NNNNN!          None

My advice: think of [.....] in a RE as representing ONE character at ONE position. Every character between the brackets is one of the options for that single character.

Moreover, as Adrien Plisson said, many special characters lose their special meaning between brackets [......]. Hence '(', ')' and '|' don't define a group or an OR; they are just literal characters, options alongside the letters 'aArRyYnN'.

"^[rRyYaAnN]{1,5}$" will match only strings such as 'r', 'ar', 'YNa', 'YYnA', 'Nanny'

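If you want to match the same words anywhere in a text, you will need "[rRyYaAnN]{1,5}". Be aware that without anchors this also matches runs of these letters inside longer words; adding \b word boundaries on both sides restricts it to whole words.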
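As a rough sketch of the whole-word variant (Python 3), the helper below builds the character class from a name and limits the word length to the length of that name, which is what the question seems to ask for. The function name words_from_letters, its parameters and the sample sentence are hypothetical, not something from the original posts.

```python
import re

def words_from_letters(text, letters, max_len=None):
    """Return whole words made only of `letters`, at most `max_len` long.

    Hypothetical helper: `max_len` defaults to the length of `letters`.
    """
    if max_len is None:
        max_len = len(letters)
    # re.escape keeps characters such as ']' or '^' literal inside the class.
    char_class = "[" + re.escape(letters) + "]"
    # \b on both sides stops the pattern from matching inside longer words.
    pattern = re.compile(r"\b%s{1,%d}\b" % (char_class, max_len), re.IGNORECASE)
    return pattern.findall(text)

sample = "Ryan met Nanny and Anna; a canary ran by."  # invented sample text

short_words = words_from_letters(sample, "ryan")             # up to 4 letters
longer_words = words_from_letters(sample, "ryan", max_len=5)  # up to 5 letters

# Without word boundaries, part of a longer word is picked up.
unbounded = re.findall(r"[rRyYaAnN]{1,5}", "canary")

print(short_words)
print(longer_words)
print(unbounded)
```

Keep in mind that \b treats digits and underscores as word characters too, so a token like "ryan_1" would not count as a whole-word match here.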
