
python, list everything between two tags

I'm looking for the shortest, neatest way to code the following.

Say I have a string containing: 'the f<ox jumpe>d over the l<azy> dog <and the >fence'

Using < as the opening tag and > as the closing tag, I would like to save everything in between into a list.

If saved into list1, list1 would equal ['ox jumpe', 'azy', 'and the ']

Does anyone know a nice, neat, SHORT way to do this?

Thanks!


Regular expressions should do the trick here:

import re

text = 'the f<ox jumpe>d over the l<azy> dog <and the >fence'
# Raw string avoids invalid-escape warnings; list1 avoids shadowing the built-in list
list1 = re.findall(r'.*?\<(.*?)\>.*?', text)

print(list1)

Edit:

You can read more about regex in the documentation for Python's re module.

In short, the regex above works as follows:

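.*? - non-greedy match of any characters up to the next <

\< - matches the < char

(.*?) - non-greedy match of any characters up to the next >; these characters are captured and returned

\> - matches the > char

.*? - trailing non-greedy match; since it is lazy, it matches the empty string here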
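Because re.findall already scans the whole string for every non-overlapping match, the leading and trailing .*? are not strictly needed. A minimal sketch of the shorter pattern, plus a variant using the negated character class [^<>]* (an alternative choice, not part of the answer above) that refuses to match across a stray < or >:

```python
import re

text = 'the f<ox jumpe>d over the l<azy> dog <and the >fence'

# With exactly one capturing group, findall returns only the group's text.
short = re.findall(r'<(.*?)>', text)

# [^<>]* cannot cross another '<' or '>', so unbalanced input behaves differently.
strict = re.findall(r'<([^<>]*)>', text)

print(short)   # ['ox jumpe', 'azy', 'and the ']
print(strict)  # ['ox jumpe', 'azy', 'and the ']
```

The two patterns only differ on input like '<hi<there>': the lazy pattern returns ['hi<there'], while the character-class pattern returns ['there'].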


Assuming every "<" and every ">" marks the start or end of a tag (e.g. you can't have <hi<there>):

>>> x = "<a><bb><ccc>"
>>> starts = (i for i, c in enumerate(x) if c == "<")
>>> ends = (i for i, c in enumerate(x) if c == ">")
>>> ans = [x[i+1:j] for i, j in zip(starts, ends)]
>>> ans
['a', 'bb', 'ccc']

If it is a large XML file, use itertools.izip in Python 2 to save memory (in Python 3, zip is already lazy). Even then, x[i+1:j] would need to be changed, since you wouldn't want to hold the whole file in memory as a string.
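For input too large to hold as one string, the same idea can be applied while reading. A minimal sketch, assuming the input is any text file-like object (io.StringIO stands in for an open file here), that the tags are single characters, and, as in the answer above, that tags are not nested. The names iter_between and chunk_size are hypothetical, chosen for illustration:

```python
import io

def iter_between(stream, open_tag='<', close_tag='>', chunk_size=4096):
    """Yield the text between each open_tag/close_tag pair, reading the
    stream chunk_size characters at a time (hypothetical helper)."""
    inside = False
    buf = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # State (inside, buf) carries over between chunks, so a tag may
        # straddle a chunk boundary.
        for c in chunk:
            if c == open_tag:
                # Assumes no nesting: a new '<' discards any open capture.
                inside = True
                buf = []
            elif c == close_tag and inside:
                inside = False
                yield ''.join(buf)
            elif inside:
                buf.append(c)

stream = io.StringIO('the f<ox jumpe>d over the l<azy> dog <and the >fence')
print(list(iter_between(stream, chunk_size=5)))
# ['ox jumpe', 'azy', 'and the ']
```

Because it is a generator, only the current chunk and the current tag's contents are held in memory; with a real file, pass the object returned by open(...) instead of the StringIO.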
