How can I do this regex in C#?
I have a string that can contain some HTML tags. I'd like to remove some of them (along with their content), but not all tags.
Specifically, I'd like to remove <img /> and <div>...</div>.
So for example, if I have the string hello <div>bye bye</div> marco, I'd like to get hello marco.
How can I do this in C#?
I think you are aware of the general opinion about parsing HTML with regex. I would recommend using an HTML parser such as HTML Agility Pack.
Here's a sample:
using System;
using System.IO;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("hello <div>bye bye</div> marco <img src=\"http://example.com\"/> test");
        var tagsToRemove = new[] { "div", "img" };
        // Walk backwards so that removing a node does not shift the ones still to be visited.
        for (int i = doc.DocumentNode.ChildNodes.Count - 1; i >= 0; i--)
        {
            var child = doc.DocumentNode.ChildNodes[i];
            if (child.NodeType == HtmlNodeType.Element && tagsToRemove.Contains(child.Name, StringComparer.OrdinalIgnoreCase))
            {
                doc.DocumentNode.RemoveChild(child);
            }
        }
        var sb = new StringBuilder();
        using (var writer = new StringWriter(sb))
        {
            doc.Save(writer);
        }
        Console.WriteLine(sb); // prints "hello  marco  test" (the spaces around the removed tags remain)
    }
}
It is not a good idea to use regex for XML. Depending on the language, you should use an XML library.
In this case the regex is pretty simple, though:
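// requires: using System.Text.RegularExpressions;
string s = "hello <div>bye bye</div> marco <img />";
// Matches a <div> with no attributes or nested tags, or an attribute-less <img />.
Regex rgx = new Regex("(<div>[^<]*</div>)|(<img */>)");
s = rgx.Replace(s, "");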
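The pattern above only matches tags without attributes and a <div> whose content contains no other tags. If the input may carry attributes (as in <img src="..."/>) or inline markup inside the div, a slightly more tolerant pattern could look like the sketch below. TagStripper and Strip are hypothetical names chosen for illustration, and the sketch assumes that divs are never nested and that no attribute value contains a ">" character. For anything less predictable, a real HTML parser remains the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
static class TagStripper
{
    // <div ...>...</div> matched lazily (assumes no nested divs), or any <img ...> tag.
    private static readonly Regex Pattern = new Regex(
        @"<div\b[^>]*>.*?</div>|<img\b[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Strip(string html)
    {
        // Replace every match with the empty string; surrounding text is left untouched.
        return Pattern.Replace(html, "");
    }
}

class Program
{
    static void Main()
    {
        string s = "hello <div class=\"x\">bye <b>bye</b></div> marco <img src=\"http://example.com\"/> test";
        Console.WriteLine(TagStripper.Strip(s)); // prints "hello  marco  test"
    }
}
```

As with the parser-based answer, the whitespace on either side of a removed tag is kept, so the result contains double spaces; collapse them afterwards if that matters.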