
regex to extract part of filename

I want to extract part of a filename that is contained in an XML string.

Sample

<assets>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf5.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf4.JPG"  valign="top"/>
</assets>

I want to match and retrieve the 560PEgnR portion from all entries, regardless of the filename.

So far I have:

/assets/(.*)/*"

But it doesn't do what I want.
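To see why this pattern fails, here is a minimal sketch (assuming Python 3 and the standard re module) run against one line of the sample. Because .* is greedy, it first runs to the end of the line and then backtracks only as far as the last quote; /* is allowed to match zero slashes, so nothing forces the group to stop at the first /.

```python
import re

# One <media> line from the question's sample.
line = '<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"/>'

# The original attempt: greedy .*, then /* (zero or more slashes), then a quote.
match = re.search(r'/assets/(.*)/*"', line)
captured = match.group(1)

print(captured)  # 560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top
```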

Any help appreciated

Thanks


Alternatively...

/assets/([^/]+)/
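Where the + sits relative to the capturing group matters. A minimal check, assuming Python 3 and the standard re module: with ([^/])+ the group matches a single character and is itself repeated, so only its last repetition is reported; with ([^/]+) one group captures the whole run of non-slash characters.

```python
import re

path = "/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"

# Quantifier outside the group: the group is repeated, and re keeps
# only the text matched by its final repetition.
outside = re.search(r"/assets/([^/])+/", path).group(1)

# Quantifier inside the group: a single group captures the folder name.
inside = re.search(r"/assets/([^/]+)/", path).group(1)

print(outside)  # R
print(inside)   # 560PEgnR
```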


You should try with:

/assets/(.*?)/.*

.* is greedy, but adding ? makes it lazy, so it stops at the first /.
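A small Python 3 sketch (standard re module) comparing the greedy and lazy forms on one line of the sample. The line itself ends in "/>, so a greedy (.*)/ backtracks only to that last slash, while (.*?)/ stops at the first slash after the folder name.

```python
import re

line = '<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"/>'

# Greedy: .* takes as much as possible, then gives back just enough to reach the LAST '/'.
greedy = re.search(r"/assets/(.*)/", line).group(1)

# Lazy: .*? takes as little as possible, so the group ends at the FIRST '/'.
lazy = re.search(r"/assets/(.*?)/.*", line).group(1)

print(greedy)  # 560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"
print(lazy)    # 560PEgnR
```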


There are several alternatives. Your mistake is that the .* part also matches '/', so either make it non-greedy (as hsz proposed above) or exclude '/' from the capturing group, like this:

/assets/([^/]*).*
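Applied to the whole sample with re.findall (a sketch assuming Python 3), the negated-class version returns one folder name per entry. Since [^/]* can never cross a slash, greedy versus lazy no longer matters, and because . does not match a newline by default, the trailing .* only consumes the rest of the current line.

```python
import re

xml_text = """<assets>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf5.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf4.JPG"  valign="top"/>
</assets>"""

# With one capturing group, findall returns the group's text for every match.
ids = re.findall(r"/assets/([^/]*).*", xml_text)

print(ids)  # ['560PEgnR', '560PEgnR', '560PEgnR']
```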


A non-regex approach

>>> string = """
... <assets>
... <media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"/>
... <media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf5.JPG"  valign="top"/>
... <media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf4.JPG"  valign="top"/>
... </assets>
... """

>>> for line in string.split("\n"):
...     if "/assets/" in line:
...         # take the text after "/assets/" and keep everything up to the next "/"
...         print(line.split("/assets/")[-1].split("/")[0])
...
560PEgnR
560PEgnR
560PEgnR
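A variation on the same split-based idea, sketched in Python 3: splitting the whole string on "/assets/" instead of going line by line means it no longer relies on each <media> element sitting on its own line.

```python
xml_text = """<assets>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf5.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf4.JPG"  valign="top"/>
</assets>"""

# Every chunk after the first begins right after a "/assets/" occurrence,
# so the folder name is whatever precedes the chunk's first "/".
ids = [chunk.split("/", 1)[0] for chunk in xml_text.split("/assets/")[1:]]

print(ids)  # ['560PEgnR', '560PEgnR', '560PEgnR']
```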


Properly parsing the XML and avoiding the unnecessary use of regex:

from lxml import etree

xml = """
<assets>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf5.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf4.JPG"  valign="top"/>
</assets>
"""

xmltree = etree.fromstring(xml)

for media in xmltree.iterfind(".//media"):
    path = media.get('img')
    # img looks like "/assets/<folder>/<file>", so the second-to-last segment is the folder
    print(path.split('/')[-2])

Gives:

560PEgnR
560PEgnR
560PEgnR
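If lxml is not installed, the standard library's xml.etree.ElementTree provides the same fromstring and iterfind calls. This sketch (assuming Python 3) also swaps the manual split for pathlib.PurePosixPath, whose .parent.name gives the directory that contains the file.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

xml_text = """<assets>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf7.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf5.JPG"  valign="top"/>
<media width="100%" height="100%" img="/assets/560PEgnR/kVvNKfOX7w9tf4.JPG"  valign="top"/>
</assets>"""

root = ET.fromstring(xml_text)

ids = []
for media in root.iterfind(".//media"):
    # "/assets/560PEgnR/kVvNKfOX7w9tf7.JPG" -> parent is "/assets/560PEgnR" -> name is "560PEgnR"
    ids.append(PurePosixPath(media.get("img")).parent.name)

print(ids)  # ['560PEgnR', '560PEgnR', '560PEgnR']
```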