Regex Lookaheads
I need to capture the content of the top-level <pubDate> element, but in the document a <pubDate> can appear either within an <item> element or directly within the <channel> element, and <item> is itself a child of <channel>. Here is an example:
<channel>
...
<pubDate>10/2/2010</pubDate>
...
<item>
...
<pubDate>13/2/2029</pubDate>
...
</item>
...
</channel>
I need to capture 10/2/2010.
The <item> is no problem: I can capture it along with its <pubDate>.
Regular expressions are not a good tool for languages such as XML, which need a context-free grammar to parse. Try using the XML DOM to do the job.
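As a rough sketch of the DOM approach, the idea is to look only at the direct children of <channel>, so a <pubDate> nested inside <item> is never considered. The function names getChannelPubDate and getChannelPubDateFromText are made up for illustration, and the sketch assumes <channel> is either the root element (as in the question's sample) or a direct child of the root (as in a full RSS feed).

```javascript
// Return the text of the <pubDate> that is a direct child of <channel>,
// ignoring any <pubDate> nested inside <item>. Returns null if none is found.
// `doc` is assumed to be an XML Document (for example from DOMParser).
function getChannelPubDate(doc) {
  const root = doc.documentElement;
  // Assumption: <channel> is the root itself or a direct child of the root.
  const channel = root.tagName === "channel"
    ? root
    : Array.from(root.children).find((el) => el.tagName === "channel");
  if (!channel) return null;
  // Only direct children are inspected, so <item>'s <pubDate> is skipped.
  const pubDate = Array.from(channel.children).find((el) => el.tagName === "pubDate");
  return pubDate ? pubDate.textContent.trim() : null;
}

// Browser-only helper: DOMParser is a browser API and is not defined in plain Node.js.
function getChannelPubDateFromText(xmlText) {
  const doc = new DOMParser().parseFromString(xmlText, "application/xml");
  return getChannelPubDate(doc);
}
```

In a browser, calling getChannelPubDateFromText with the feed text should give the channel-level date. Because only direct children are checked, the order of <item> and <pubDate> inside <channel> does not matter.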
I don't know JavaScript, so I can't help you with the DOM. I agree 100% that it's a bad idea to try and parse XML with regex. There might be a quick, very dirty, and very brittle workaround, though:
If indentation is consistent throughout the file, and <channel> elements are always at the same level of indentation, you could use that fact as a guide for the regex. In your example /^ {2}<pubDate>([^<]*)<\/pubDate>/m (= exactly two spaces after start-of-line) might just work. Note that the closing tag has to be spelled pubDate, since the match is case-sensitive unless you add the i flag.
Use this at your own risk. Here be dragons etc.
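To make the indentation trick concrete, here is a small sketch in JavaScript. The sample text and the function name channelPubDateByIndent are made up for illustration, and the sketch assumes direct children of <channel> are indented by exactly two spaces while children of <item> are indented by four.

```javascript
// Hypothetical sample: children of <channel> are indented by two spaces,
// children of <item> by four.
const sampleXml = [
  "<channel>",
  "  <item>",
  "    <pubDate>13/2/2029</pubDate>",
  "  </item>",
  "  <pubDate>10/2/2010</pubDate>",
  "</channel>",
].join("\n");

// Match a <pubDate> that starts exactly two spaces after the beginning of a line.
// The m flag makes ^ match at every line start, not only at the start of the text.
function channelPubDateByIndent(text) {
  const match = /^ {2}<pubDate>([^<]*)<\/pubDate>/m.exec(text);
  return match ? match[1] : null;
}

console.log(channelPubDateByIndent(sampleXml));
```

The four-space <pubDate> inside <item> is skipped because, after the first two spaces, the regex expects < and finds another space. Any change in indentation (tabs, a different depth, a minified feed) breaks the match, which is exactly why this is brittle.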
Check out jQuery and see if this helps reading/parsing the XML: http://think2loud.com/reading-xml-with-jquery/
KM