get text between html tags

Possible duplicate: RegEx matching HTML tags and extracting text

I need to get the text between HTML tags such as <p></p>, or any other tag. My pattern is this:

Pattern pText = Pattern.compile(">([^>|^<]*?)<");

Does anyone know a better pattern? This one is not very useful. I need it to index the content of a web page.

Thanks


SO is about to descend on you. But let me be the first to say, don't use regular expressions to parse HTML. Here is a list of Java HTML Parsers. Look around until you see an API that suits your fancy and use that instead.


It looks like you are trying to use the | operator inside a negated character class, which neither works nor is needed. Inside [...] the | and the second ^ are just literal characters. Simply list the characters that you don't want to match:

Pattern pText = Pattern.compile(">([^<>]*?)<");
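For illustration, here is a minimal sketch of how that corrected pattern could be applied with a Matcher. The class name TagTextExtractor and the method extract are made up for this example, and skipping whitespace-only matches is an assumption about what you want to index. It only copes with simple markup: comments, scripts, or a > inside an attribute value will confuse it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class TagTextExtractor {
    // A '>' followed by any run of characters that are neither '<' nor '>',
    // up to the next '<'. Group 1 is the text between the two tags.
    private static final Pattern P_TEXT = Pattern.compile(">([^<>]*?)<");

    static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = P_TEXT.matcher(html);
        while (m.find()) {
            String text = m.group(1).trim();
            // Assumption: empty or whitespace-only runs are not worth indexing.
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Hello, world]
        System.out.println(extract("<p>Hello <b>world</b></p>"));
    }
}
```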


Don't use regular expressions when parsing HTML.

Use XPath instead (if your HTML is well formed). You can reference text nodes using the text() function very easily.
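A rough sketch of the XPath approach using only the JDK's built-in javax.xml classes. The class name XPathTextExtractor, the method textNodes, and the sample markup are invented for this example, and it assumes the input is well-formed XML (XHTML); ordinary tag-soup HTML will make the parser throw.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class XPathTextExtractor {
    // Evaluates an XPath expression that selects text nodes and
    // returns their trimmed, non-empty values in document order.
    static List<String> textNodes(String xhtml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getNodeValue().trim();
                if (!text.isEmpty()) {
                    result.add(text);
                }
            }
            return result;
        } catch (Exception e) {
            // Assumption: malformed input is treated as a caller error.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Hello <b>world</b></p><p>Second</p></body></html>";
        // text() selects only the direct text children of each <p>.
        System.out.println(textNodes(page, "//p/text()")); // prints [Hello, Second]
    }
}
```

Using //text() instead of //p/text() would select every text node in the document, including the one inside <b>.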
