Remove the object tag from my HTML
I'm trying to remove the object tag from a text file:
<object classid="clsid:F08DF954-8592-11D1-B16A-00C0F0283628" id="Slider1" width="100" height="50">
<param name="BorderStyle" value="1" />
<param name="MousePointer" value="0" />
<param name="Enabled" value="1" />
<param name="Min" value="0" />
<param name="Max" value="10" />
</object>
My regex so far is:
html = Regex.Replace(html, @"]>(?:.?)?", "", RegexOptions.IgnoreCase);
The inner param tags are not removed.
You should be able to specify the <object> tag as part of your expression and match everything up to the </object> tag.
html = Regex.Replace(html, @"<object.*?</object>", "", RegexOptions.Singleline);
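For what it's worth, here is a minimal sketch of that one-liner wrapped in a helper. The names `ObjectTagStripper` and `StripObjects` are made up for illustration. The `\b` word boundary and `RegexOptions.IgnoreCase` are my additions, not part of the expression above. It assumes the object tags are not nested inside each other.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ObjectTagStripper
{
    // Removes each <object ...>...</object> block, inner <param> tags included.
    //  - .*? is lazy, so each match ends at the first </object> that follows.
    //  - Singleline lets '.' match line breaks, so multi-line blocks are covered.
    //  - IgnoreCase also catches <OBJECT>; \b keeps tags such as <objectx> from matching.
    public static string StripObjects(string html)
    {
        return Regex.Replace(
            html,
            @"<object\b.*?</object>",
            "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Calling `ObjectTagStripper.StripObjects(html)` returns a new string, so assign the result back to your variable. Because the match is lazy, text sitting between two separate object blocks is left alone.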
If I understand what you're asking, this will do it:
$line =~ s/<object.*?>.*?<\/object>//is;
That's Perl, so the potential quirks:
- the ? after * makes the match non-greedy, i.e. it stops at the first possible end of the pattern rather than the last
- /i makes the match case insensitive
- /s treats the whole text as a single line, i.e. it lets . match newlines so the pattern can span line breaks
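This regex might work for you, but note that it is greedy: it removes everything from the first <object to the last </object>, including any markup between two separate object blocks. It also needs RegexOptions.Singleline so that . can match across line breaks: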
<object.+</object>
That said, I would recommend using HtmlAgilityPack instead of a regex.
It parses the HTML into a DOM, so you can work with it much as you would with XmlDocument:
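// Requires the HtmlAgilityPack package (using HtmlAgilityPack;)
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// "//object" selects object elements anywhere in the document.
// SelectNodes returns null when nothing matches, so check before iterating.
HtmlNodeCollection objects = doc.DocumentNode.SelectNodes("//object");
if (objects != null) {
    foreach (HtmlNode obj in objects) {
        obj.ParentNode.RemoveChild(obj); // removes the node along with its <param> children
    }
}
doc.Save("file.htm");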