Python: list everything between two tags
I'm looking for the shortest, neatest way to code the following.
Say I have a string containing: 'the f<ox jumpe>d over the l<azy> dog <and the >fence'
Using < as the opening tag and > as the closing tag, I would like to save everything in between into a list.
If saved into list1, list1 would equal ['ox jumpe', 'azy', 'and the ']
Does anyone know of a nice, neat, SHORT way to do this?
Thanks!
Regular expressions should do the trick here:
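import re
text = 'the f<ox jumpe>d over the l<azy> dog <and the >fence'
# Raw string so the backslashes reach the regex engine unchanged;
# the name "matches" avoids shadowing the built-in list type.
matches = re.findall(r'.*?\<(.*?)\>.*?', text)
print(matches)  # ['ox jumpe', 'azy', 'and the ']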
Edit:
You can read more about regex here
In short, the regex above breaks down as follows:
.*? - non-greedy match of any characters up to the next wanted character
\< - matches the < character
(.*?) - non-greedy match of any characters up to the next wanted character; the parentheses capture them, and findall returns the captured text
\> - matches the > character
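As a minimal sketch of the same idea, the leading and trailing .*? can be dropped, because findall scans the whole string for matches on its own. The helper name between_tags and its parameters are hypothetical, made up here for illustration; note that . does not match newlines unless re.DOTALL is passed.

```python
import re

def between_tags(text, open_tag='<', close_tag='>'):
    """Return every substring found between open_tag and close_tag."""
    # re.escape keeps the delimiters literal even if they are regex metacharacters.
    pattern = re.escape(open_tag) + '(.*?)' + re.escape(close_tag)
    return re.findall(pattern, text)

text = 'the f<ox jumpe>d over the l<azy> dog <and the >fence'
print(between_tags(text))  # ['ox jumpe', 'azy', 'and the ']
```

Escaping the delimiters means the same helper should also work for pairs such as [ and ].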
Assuming every "<" and every ">" marks the start or end of a tag (e.g. you can't have <hi<there>), you can pair up their positions:
>>> x="<a><bb><ccc>"
>>> starts=(i for i,c in enumerate(x) if c=="<")
>>> ends=(i for i,c in enumerate(x) if c==">")
>>> ans=[x[i+1:j] for i,j in zip(starts,ends)]
>>> ans
['a', 'bb', 'ccc']
On Python 2, use itertools.izip instead of zip for a large XML file to save memory (on Python 3, zip is already lazy). The x[i+1:j] slice would also need to change, since you wouldn't want to hold the whole file as a string.
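One way to avoid holding the whole input as a string is sketched below. This is a different technique from the zip pairing above: a small character-by-character scan over chunks of text. The name iter_between is hypothetical, and the sketch assumes single-character delimiters.

```python
def iter_between(chunks, open_tag='<', close_tag='>'):
    """Yield the text between single-character delimiters, chunk by chunk."""
    current = None  # None means we are currently outside a tag
    for chunk in chunks:
        for ch in chunk:
            if ch == open_tag:
                current = []  # start collecting a new tag body
            elif ch == close_tag and current is not None:
                yield ''.join(current)
                current = None
            elif current is not None:
                current.append(ch)

# A tag body may span a chunk boundary, as 'bb' does here.
print(list(iter_between(['<a><b', 'b><ccc>'])))  # ['a', 'bb', 'ccc']
```

For a file opened in text mode as f, the chunks could come from something like iter(lambda: f.read(4096), ''), which keeps only one chunk and the current tag body in memory at a time.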