Regular expression for removing HTML tags from a string

2023-02-12 02:45

I am looking for a regular expression to remove all HTML tags from a string in JSP.

Example 1

sampleString = "test string <i>in italics</i> continues";

Example 2

sampleString = "test string <i>in italics";

Example 3

sampleString = "test string <i";

The HTML tag might be complete, partial (no closing tag), or malformed (the opening tag itself is missing its closing angle bracket, as in the third example).

Thanks in advance

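Case 3 cannot be handled reliably with either a regex or a parser. A dangling "<i" might just as well be legitimate content (for example, the start of a comparison such as "a <i"), so there is no way to be sure it is a tag. So forget it.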

As for the concrete question, which covers cases 1 and 2, just use an HTML parser. My favourite is Jsoup.

String text = Jsoup.parse(html).text();

That's it. By the way, it also has an HTML cleaner, if that is what you're actually after.

Since you're using JSP, you could also just use JSTL <c:out> or fn:escapeXml() to prevent user-controlled HTML input from being inlined into your HTML (which could otherwise open XSS holes).

<c:out value="${bean.property}" />
<input type="text" name="foo" value="${fn:escapeXml(param.foo)}" />

HTML tags will then not be interpreted, but just displayed as plain text.
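
To illustrate what that escaping does, here is a minimal plain-Java sketch. It is not JSTL's actual implementation: the class and method names (HtmlEscaper, escape) are hypothetical, and the numeric entities chosen for the quote characters are an assumption.

```java
// Hypothetical helper: replaces markup-significant characters with entities
// so the browser displays them as text instead of interpreting them as HTML.
class HtmlEscaper {
    static String escape(String input) {
        if (input == null) {
            return ""; // assumption: treat null as empty output
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&#034;"); break;
                case '\'': out.append("&#039;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: test string &lt;i&gt;in italics&lt;/i&gt; continues
        System.out.println(escape("test string <i>in italics</i> continues"));
    }
}
```

The tags are not removed here; they are only made harmless, which is usually what you want when echoing user input back into a page.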

<\/?font(\s\w+(\=\".*\")?)*\>

I used this little gem about a week ago to strip a variety of 12-year-old HTML tags, and it worked pretty great. Just replace 'font' with whatever tag you're looking for, or with \w* to get rid of all of them.

Edit: I removed the '?' from the end of the regex after realizing that it could remove non-tag data from a file. Basically, this will consistently match cases 1 and 2. If you use it for case 3 (with the '?' appended to the end of the regex), take care to check that what gets removed really is a tag.
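
To make the pattern concrete, here is a minimal Java sketch that applies it, with \w* in place of font, to the three sample strings from the question. The class and method names (TagStripper, stripTags) are made up for illustration; this is just one way to wire the regex in, not a recommendation over a parser.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from this answer.
class TagStripper {
    // Same pattern as above, with 'font' replaced by \w* so any tag name matches.
    // Regex as written: <\/?\w*(\s\w+(\=\".*\")?)*\>
    private static final Pattern TAG =
            Pattern.compile("<\\/?\\w*(\\s\\w+(\\=\\\".*\\\")?)*\\>");

    static String stripTags(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Case 1: complete tag pair, both tags are removed.
        System.out.println(stripTags("test string <i>in italics</i> continues"));
        // Case 2: opening tag only, it is removed.
        System.out.println(stripTags("test string <i>in italics"));
        // Case 3: no closing '>', so nothing matches and the string is unchanged.
        System.out.println(stripTags("test string <i"));
    }
}
```

One caveat: the .* in the attribute part is greedy, so on input with several quoted attributes on the same line it can match from the first quote to the last and swallow the text in between. Test it on representative data first.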
