Removing HTML tags using a for loop in Java [duplicate]

This question already has answers here: Closed 11 years ago.

Possible Duplicate:

Removing HTML from a Java String

I am having a problem removing HTML tags from a text file in Java. I know it would be easy to use something like

str=str.toString().replaceAll("\\<.*?>","");

However, I want to know if I could split the string, go through it, and replace everything starting from < to > with "".

I tried

String [] str= "<tag>with some string </tag>";
String  s="";
    for (i=0; i < str.length; i++)
    {
        if (str[i].toString()=="<")
        {
            str[i]="";
        }
        else if (str[i].toString()==">")
        {
            s=s+str[i+1];
        }
    }

When I try printing the new string s, it prints nothing but white space. Thanks for the help.


You need a flag variable that records whether you are currently inside a tag, plus a third branch for characters outside a tag, so that the rest of the content gets appended to the string. For example:

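String str = "<tag>with some string </tag>";
String s = "";
boolean inTag = false;
// Walk the string one character at a time; compare chars with ==, not Strings.
for (int i = 0; i < str.length(); i++)
{
    char c = str.charAt(i);
    if (c == '<')
    {
        inTag = true;   // entering a tag: start skipping
    }
    else if (c == '>')
    {
        inTag = false;  // leaving a tag: resume copying
    }else{
        if (!inTag)
            s = s + c;  // keep only characters outside of tags
    }
}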


The code you supplied has a few errors. Anyway, you can do it with String#split:

String[] strArr = str.split("\\<.*?>");

This splits the string at every tag, so the resulting array holds only the text between the tags.
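
To get a single string back, the pieces can be joined again. A minimal sketch, assuming a hypothetical class TagStripper and helper method stripWithSplit (neither name comes from the posts above), and assuming every tag sits on one line, since . in the pattern does not match line breaks by default:

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Same pattern as in the split call above; '<' needs no escaping in a Java regex.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // Hypothetical helper: split at every tag, then glue the text pieces back together.
    public static String stripWithSplit(String html) {
        String[] parts = TAG.split(html);
        return String.join("", parts);
    }

    public static void main(String[] args) {
        String input = "<tag>with some string </tag>";
        // Brackets make the trailing space visible.
        System.out.println("[" + stripWithSplit(input) + "]"); // prints "[with some string ]"
    }
}
```

For this input the split array starts with an empty string (the text before the first tag), which disappears when the pieces are joined.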


To remove the HTML tags from a text file, just look into this topic, which was previously discussed in this forum.
