Regular Expressions to get text between tags
I am writing an application that gets the title of an HTML page, some text under the body tag, and an image, something like Facebook's share feature. Can I get a regular expression that does that? Thanks for your assistance.
A regexp like <title>(.*?)</title>
will get you the content of title.
The .*? part matches any characters in a non-greedy way, so the match stops at the first title end tag (in case there is another title end tag in the page).
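To see the difference between the non-greedy and greedy forms, here is a minimal sketch in Python. The question does not name a language, so Python and the sample page string are assumptions made for illustration.

```python
import re

# Hypothetical sample page: a second </title> appears inside a comment.
html = (
    "<html><head><title>My Page</title></head>"
    "<body><!-- </title> --><p>Hello</p></body></html>"
)

flags = re.IGNORECASE | re.DOTALL  # DOTALL lets . match newlines too

# Non-greedy: stops at the first </title>.
lazy = re.search(r"<title>(.*?)</title>", html, flags)
# Greedy: runs on to the last </title> in the page.
greedy = re.search(r"<title>(.*)</title>", html, flags)

lazy_title = lazy.group(1)
greedy_title = greedy.group(1)

print(repr(lazy_title))
print(repr(greedy_title))
```

The greedy version swallows everything up to the stray end tag in the comment, which is why the ? after .* matters.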
You should probably use an HTML parser instead of a regular expression. See Simple HTML DOM, for example.
A regular expression for your task will be very hard to maintain and will break easily whenever the pages in question change, not to mention that it cannot account for HTML comments.
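As a rough illustration of the parser approach, here is a sketch that pulls out the title, the body text, and the first image. It uses Python's standard html.parser module rather than Simple HTML DOM (which is a PHP library). The class name PagePreview and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class PagePreview(HTMLParser):
    """Collects the title, visible body text, and the first image URL."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.body_text = []
        self.image = None
        self._stack = []  # open tags we care about, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "body", "script", "style"):
            self._stack.append(tag)
        elif tag == "img" and self.image is None:
            self.image = dict(attrs).get("src")

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if not self._stack:
            return
        current = self._stack[-1]
        if current == "title":
            self.title += data
        elif current == "body" and data.strip():
            # Text inside script/style is skipped: the stack top differs.
            self.body_text.append(data.strip())


# Hypothetical page; the commented-out title is never reported as data.
sample = """<html><head><title>My Page</title></head>
<body><!-- <title>Not the title</title> -->
<p>First paragraph.</p><img src="logo.png" alt="Logo">
<script>var x = 1;</script></body></html>"""

preview = PagePreview()
preview.feed(sample)
preview.close()
summary = " ".join(preview.body_text)

print(preview.title, "|", summary, "|", preview.image)
```

Because the parser reports comments through a separate callback, the commented-out title is ignored without any extra work, which is the case a regular expression struggles with.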
I just coined this expression which gets the text inside tags (the node value), without the actual tags themselves.
(?<=\"\>)(.*?)(?=\<\/)
You can see it in action with PHP here: http://codepad.viper-7.com/AUTcv3
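For readers who want to try the expression outside PHP, here is a small Python sketch. The pattern is the same one with the unnecessary backslashes dropped, and the sample strings are invented for illustration. Note that the lookbehind requires a double quote right before the closing >, so only tags ending in a quoted attribute are picked up.

```python
import re

# Same expression as above, written without the redundant escapes.
pattern = re.compile(r'(?<=">)(.*?)(?=</)')

# Hypothetical markup: two tags end in a quoted attribute, one does not.
flat = (
    '<a href="/home">Home</a>'
    '<p>Plain paragraph</p>'
    '<span class="x">Label</span>'
)
plain = pattern.findall(flat)

# Nested markup: the match runs up to the first "</" it meets.
nested = pattern.findall('<div class="a"><b>bold</b></div>')

print(plain)
print(nested)
```

The paragraph without attributes is skipped, and with nested tags the inner opening tag ends up in the result, so the expression is best kept for simple, predictable markup.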