Stripping link tags out of a Java string

I've been trying to do this for a couple of hours without getting it right, so I figured I'd post it here. Here's my problem.

Given a string in Java:

"this is <a href='something'>one \nlink</a> some text <a href='fubar'>two \nlink</a> extra text"

Now I want to strip the link tags out of this string using regular expressions, so the resulting string should look like:

"this is one \nlink some text two \nlink extra text"

I've tried all kinds of things with Java regular expressions (capturing groups, greedy quantifiers, you name it) and still can't get it to work quite right. If there's only one link tag in the string, I can get it to work easily. However, my string can have multiple URLs embedded in it, which is what's preventing my expression from working. Here's what I have so far: (?s).*(<a.*>(.*)</a>).*

Note that the string inside the link can be of variable length, which is why I have the .* in the expression.

If somebody can give me a regular expression that'll work, I'll be extremely grateful. Short of looping through each character and removing the links, I can't find a solution.


Sometimes it's easier to do it in 2 steps:

String s = "this is <a href='something'>one \nlink</a> some text <a href='fubar'>two \nlink</a> extra text";
s = s.replaceAll("<a[^>]*>", "").replaceAll("</a>", "");
// Result: "this is one \nlink some text two \nlink extra text"
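
If the same cleanup runs many times, the two replacements could be folded into one precompiled pattern. The following is only a sketch: the class name LinkStripper and the method stripLinks are made up for illustration, and the \b and CASE_INSENSITIVE additions are assumptions about what the input might contain, not something the question requires.

```java
import java.util.regex.Pattern;

final class LinkStripper {
    // Matches an opening <a ...> tag or a closing </a> tag.
    // \b stops the pattern from matching other tags such as <abbr>.
    // [^>]* is a negated class, so it also matches newlines inside a tag.
    private static final Pattern ANCHOR_TAG =
            Pattern.compile("</?a\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Removes the link tags but keeps the text between them.
    static String stripLinks(String html) {
        return ANCHOR_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "this is <a href='something'>one \nlink</a> some text "
                 + "<a href='fubar'>two \nlink</a> extra text";
        System.out.println(stripLinks(s));
    }
}
```

Because only the tags themselves are matched, the newline inside the link text never has to be crossed by the regex, so no extra flags are needed for it.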


Here's the way I usually match tags:

<a .*?>|</a>

and replace with an empty string.

Alternatively, instead of removing the tag, you might comment it out. The match pattern would be the same, but the replacement would be:

<!--\0-->

or, in flavors such as Java that use $ for group references in the replacement string,

<!--$0-->
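
In Java specifically, a rough sketch of the commenting-out variant might look like this. The class and method names are hypothetical, and the pattern uses [^>]* rather than .*? so that a tag containing a line break would still match.

```java
final class LinkCommenter {
    // Wraps every opening or closing link tag in an HTML comment.
    // In a Java replacement string, $0 stands for the entire match.
    static String commentOutLinks(String html) {
        return html.replaceAll("<a\\s[^>]*>|</a>", "<!--$0-->");
    }

    public static void main(String[] args) {
        System.out.println(commentOutLinks("x <a href='y'>z</a>"));
    }
}
```

Note that a backslash in a Java replacement string only escapes the next character, so the \0 form would insert a literal 0 rather than the matched tag.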

If you want to have a reference to the anchor text, use this match pattern:

<a .*?>(.*?)</a>

and in the replacement refer to group index 1 instead of 0.

Note: Sometimes you have to use programming-language-specific flags to let the regex match across lines. In Java, the flag that makes . match line terminators is Pattern.DOTALL (Pattern.MULTILINE only changes where ^ and $ match). Here's a Java example:

Pattern aPattern = Pattern.compile(regexString, Pattern.DOTALL);
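
For the string in the question, the anchor text itself contains a newline, so the capturing pattern only matches when . is allowed to cross line terminators, which in Java is what Pattern.DOTALL does. A minimal sketch, with made-up names (AnchorText, unwrapLinks, linkTexts):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorText {
    // DOTALL lets '.' match '\n', so link text spanning lines is captured.
    // The reluctant (.*?) stops at the first </a>, keeping links separate.
    private static final Pattern LINK =
            Pattern.compile("<a\\s[^>]*>(.*?)</a>", Pattern.DOTALL);

    // Replaces each whole link with just its anchor text (group 1).
    static String unwrapLinks(String html) {
        return LINK.matcher(html).replaceAll("$1");
    }

    // Collects the anchor text of every link, in order of appearance.
    static List<String> linkTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            texts.add(m.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String s = "this is <a href='something'>one \nlink</a> some text "
                 + "<a href='fubar'>two \nlink</a> extra text";
        System.out.println(unwrapLinks(s));
        System.out.println(linkTexts(s));
    }
}
```

The same effect can be had without the flag by putting (?s) at the start of the pattern, as the expression in the question already does.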


Off the top of my head:

"<a [^>]*>|</a>"