Removing HTML tags using a for loop in Java [duplicate]
Possible Duplicate:
Removing HTML from a Java String
I am having a problem removing HTML tags from a text file in Java. I know it would be easy to use something like
str=str.toString().replaceAll("\\<.*?>","");
However, I want to know if I could split the string, go through it, and replace everything starting from < to > with "".
I tried
String [] str= "<tag>with some string </tag>";
String s="";
for (i=0; i < str.length; i++)
{
if (str[i].toString()=="<")
{
str[i]="";
}
else if (str[i].toString()==">")
{
s=s+str[i+1];
}
}
When I try printing the new string s, it prints nothing but white space. Thanks for the help.
You need a flag variable that records whether you are inside a tag, plus a third case for when you are not inside a tag, so that the rest of the content gets added to the string. For example:
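// A String, not a String[]: a string literal cannot be assigned to an array.
String str = "<tag>with some string </tag>";
StringBuilder s = new StringBuilder();
boolean inTag = false;
for (int i = 0; i < str.length(); i++)
{
    char c = str.charAt(i);
    // Compare chars with ==; comparing String objects with == checks identity, not content.
    if (c == '<')
    {
        inTag = true;   // entering a tag: stop copying
    }
    else if (c == '>')
    {
        inTag = false;  // leaving a tag: resume copying
    }
    else if (!inTag)
    {
        s.append(c);    // ordinary text outside any tag
    }
}
// s.toString() now holds "with some string "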
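For anyone who wants to run the flag approach end to end, here is a minimal self-contained sketch. The class name TagStripper and the method name stripTags are made up for this example, and it assumes every < in the input opens a tag and every > closes one, so it is not a real HTML parser.

```java
final class TagStripper {
    // Hypothetical helper: copies every character that lies outside <...>.
    static String stripTags(String input) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                inTag = true;       // start skipping
            } else if (c == '>') {
                inTag = false;      // stop skipping
            } else if (!inTag) {
                out.append(c);      // keep text between tags
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<tag>with some string </tag>"));
    }
}
```

Wrapping the loop in a method keeps the flag local, so the same code can be called once per line when reading a text file.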
The code you supplied has a few errors. In any case, you can do it with String#split:
String[] strArr = str.split("\\<.*?>");
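This splits the string at every tag, so the resulting array holds only the pieces of text between the tags. Note that str must be a String here, and the first element is an empty string when the input starts with a tag.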
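Since split returns an array rather than a single string, the pieces still have to be joined back together. A rough sketch of that, assuming Java 8 or later for String.join (the class name SplitTagStripper and the method name stripTags are invented for the example):

```java
final class SplitTagStripper {
    // Hypothetical helper: split on tags, then glue the text pieces together.
    static String stripTags(String input) {
        // "<.*?>" matches a '<', then as few characters as possible, then a '>'.
        String[] parts = input.split("<.*?>");
        // Empty pieces (for example before a leading tag) add nothing to the result.
        return String.join("", parts);
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<tag>with some string </tag>"));
    }
}
```

By default . does not match line terminators, so a tag that spans several lines would not be removed by this pattern.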
To remove the HTML tags from a text file, look at this topic, which was previously discussed in this forum.