Get rid of content between <!-- and // -->
I have a text file produced by converting HTML into plain text. I need to remove what looks like an XHTML comment, such as the following:
<!--
if (!document.phpAds_used)
document.phpAds_used = ',';
phpAds_random = new String
(Math.random()); phpAds_random =
phpAds_random.substring(2,11);
document.write ("<" + "script
language='JavaScript'
type='text/javascript' src='");
document.write
("http://www.writers.net/Openads/adjs.php?n="
+ phpAds_random); document.write ("&what=zone:5&target=_blank");
document.write ("&exclude=" +
document.phpAds_used); if
(document.referrer) document.write
("&referer=" +
escape(document.referrer));
document.write ("'><" + "/script>");
// -->
How can I get rid of everything between <!-- and //--> using Java?
A simple solution would be to use the String.replaceAll() method.
For example, something like the following code should work:
String x = "wow <!-- // --> zip, here's <!-- comment here //--> another one";
x = x.replaceAll("(?s)<!--.*?//\\s*-->", ""); // (?s) lets .*? span line breaks
System.out.println(x); // prints "wow  zip, here's  another one" (the spaces around each comment remain)
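The example string keeps each comment on a single line, but the comment in the question spans many lines. In Java regular expressions, . does not match line terminators unless DOTALL is enabled, so it is worth checking the pattern against multi-line input. A minimal self-contained sketch, assuming the text is already in a String; the class name StripCommentsDemo is only for illustration and the input is a shortened copy of the question's comment:

```java
public class StripCommentsDemo {

    // (?s) enables DOTALL so that .*? can run across line breaks;
    // .*? is reluctant, so each match ends at the first "//", optional whitespace, "-->"
    static final String COMMENT_REGEX = "(?s)<!--.*?//\\s*-->";

    static String strip(String text) {
        return text.replaceAll(COMMENT_REGEX, "");
    }

    public static void main(String[] args) {
        // Shortened version of the multi-line comment from the question
        String text = "before\n"
                + "<!--\n"
                + "if (!document.phpAds_used)\n"
                + "document.phpAds_used = ',';\n"
                + "// -->\n"
                + "after";

        // The comment is removed; the line breaks around it stay
        System.out.println(strip(text));

        // prints true: without (?s) nothing is removed, because "." stops at "\n"
        System.out.println(text.replaceAll("<!--.*?//\\s*-->", "").equals(text));
    }
}
```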
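The \\s* matches zero or more whitespace characters, since your example had a space before --> but your description did not. The .*? makes this a reluctant (non-greedy) match, so it stops at the first //--> instead of the last one. Note that . does not match line terminators by default; for a comment that spans several lines, as in the question, the pattern needs the DOTALL flag, written inline as (?s) or passed as Pattern.DOTALL.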
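To see why the reluctant .*? matters, compare it with the greedy .* on a string that contains two comments. A small sketch; the class and method names are hypothetical:

```java
public class GreedyVsReluctant {

    // Greedy: .* consumes as much as possible, so one match spans from the
    // first "<!--" to the LAST "//-->", swallowing the text in between
    static String greedy(String s) {
        return s.replaceAll("<!--.*//\\s*-->", "");
    }

    // Reluctant: .*? stops at the first "//-->" after each "<!--"
    static String reluctant(String s) {
        return s.replaceAll("<!--.*?//\\s*-->", "");
    }

    public static void main(String[] args) {
        String x = "wow <!-- one //--> zip <!-- two //--> end";
        System.out.println(greedy(x));    // prints "wow  end"
        System.out.println(reluctant(x)); // prints "wow  zip  end"
    }
}
```

Both patterns here omit DOTALL, so they only match comments that sit on a single line.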
If you are running this over and over, compile the Pattern (java.util.regex.Pattern) once and create a new matcher for each block you process:
private static final Pattern COMMENT = Pattern.compile("<!--.*?//\\s*-->", Pattern.DOTALL);
String cleaned = COMMENT.matcher(x).replaceAll("");
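Since the input in the question is a text file, here is a sketch that reads the whole file, removes every comment with one precompiled Pattern, and writes the result out. It assumes UTF-8 text and a file small enough to hold in memory; the class name StripCommentsFromFile and the default file names input.txt and output.txt are placeholders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class StripCommentsFromFile {

    // Compiled once and reused; DOTALL lets .*? cross line breaks
    private static final Pattern COMMENT =
            Pattern.compile("<!--.*?//\\s*-->", Pattern.DOTALL);

    static String strip(String text) {
        // A new Matcher is created per call; the Pattern itself is safe to share
        return COMMENT.matcher(text).replaceAll("");
    }

    static void stripFile(Path in, Path out) throws IOException {
        // Read the whole file so that comments spanning several lines are seen intact
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        Files.write(out, strip(text).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names; pass real paths as arguments instead
        Path in = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path out = Paths.get(args.length > 1 ? args[1] : "output.txt");
        stripFile(in, out);
    }
}
```

Reading the file in one piece is the important design choice: processing it line by line would never see a complete multi-line comment, so the pattern could not match.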