Java Regex to get the text from HTML anchor (<a>...</a>) tags
I'm trying to get the text within a certain tag. So if I have:
<a href="http://something.com">Found</a>
I want to be able to retrieve the text Found.
I'm trying to do it using regex. I am able to do it if the <a href="http://something.com"> part stays the same, but it doesn't.
So far I have this:
Pattern titleFinder = Pattern.compile( ".*[a-zA-Z0-9 ]* ([a-zA-Z0-9 ]*)</a>.*" );
I think the last two parts, ([a-zA-Z0-9 ]*)</a>.* , are OK, but I don't know what to do for the first part.
As they said, don't use regex to parse HTML. If you are aware of the shortcomings, you might get away with it, though. Try
Pattern titleFinder = Pattern.compile("<a[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher regexMatcher = titleFinder.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group(1)
}
This will iterate over all matches in a string. It won't handle nested <a> tags, and it ignores all the attributes inside the tag.
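For completeness, here is a minimal self-contained sketch of the same idea. The class name AnchorTextExtractor and the method extractAnchorTexts are made up for illustration, and I added a \b after the "a" (my own tweak, not part of the pattern above) so that tags like <abbr> are not picked up by accident. The same caveats apply: no nested anchors, and attributes are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class AnchorTextExtractor {

    // "<a", a word boundary, any attributes, ">", then a lazy capture up to the first "</a>".
    // DOTALL lets the captured text span line breaks; CASE_INSENSITIVE also accepts <A ...>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the text between each opening and closing anchor tag, in document order.
    static List<String> extractAnchorTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://something.com\">Found</a> and "
                + "<A class=\"x\" href=\"/other\">Also found</A>";
        System.out.println(extractAnchorTexts(html)); // prints [Found, Also found]
    }
}
```

Because the href is matched by [^>]*, it no longer matters what the URL is, which was the sticking point in the question.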
str = str.replaceAll("(?i)</?a\\b[^>]*>", ""); // removes <a ...> and </a>, keeps the text between them
Here is an online ideone demo.
Here is a similar topic: How to remove the tags only from a text?
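A rough sketch of the tag-stripping approach, using a pattern that also tolerates attributes and mixed case (the class name AnchorTagStripper and the method stripAnchorTags are invented for this example). Note that this removes the tags and leaves everything else in the string, so it only gives you the link text by itself when the input is just the anchor.

```java
// Hypothetical helper class, named for illustration only.
class AnchorTagStripper {

    // Deletes opening and closing anchor tags, leaving all other text untouched.
    // (?i) makes the match case-insensitive; \b keeps tags such as <abbr> intact.
    static String stripAnchorTags(String html) {
        return html.replaceAll("(?i)</?a\\b[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "Click <a href=\"http://something.com\">Found</a> now";
        System.out.println(stripAnchorTags(html)); // prints Click Found now
    }
}
```

String.replaceAll returns a new string rather than modifying the original, so the result has to be assigned or used.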