How to strip out one common attribute from every form element on the page?
I have a string variable that contains an HTML page's response. It contains hundreds of tags, including the following three HTML tags:
<tag1 prefix1314030136543="2">
<tag2 prefix131403013654="1" anotherAttribute="432">
<tag3 prefix13140301376543="4">
I need to be able to strip out any attribute that starts with "prefix" along with its value, regardless of tag name. In the end, I'd like to have:
<tag1>
<tag2 anotherAttribute="432">
<tag3>
I am using C#. I'm assuming RegEx is the solution, but I'm horrible with RegEx and hope someone can help me out here.
Look at Html Agility Pack.
Using regex:
(?<=<[^<>]*)\sprefix\w+="[^"]*"(?=[^<>]*>)
var result = Regex.Replace(s,
    @"(?<=<[^<>]*)\sprefix\w+=""[^""]*""(?=[^<>]*>)", string.Empty);
Regex is not the solution: HTML is not a regular language, so it shouldn't be parsed with regular expressions. I've heard good things about HTML Agility Pack for parsing and working with HTML. Check it out.
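// Requires: using System.Linq;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(/* your html here */);
foreach (var item in doc.DocumentNode.Descendants()) {
    // ToArray() takes a snapshot so attributes can be removed while iterating.
    foreach (var attr in item.Attributes.Where(x => x.Name.StartsWith("prefix")).ToArray()) {
        item.Attributes.Remove(attr);
    }
}
// doc.DocumentNode.OuterHtml now holds the cleaned markup.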
// prefix is a string holding the start of the attribute name, e.g. "prefix"
html = Regex.Replace(html, @"(?<=<\w+[^>]*)\s" + Regex.Escape(prefix) + @"\w+\s?=\s?""[^""]*""(?=[^>]*>)", "");
The lookbehind and lookahead make sure the match sits inside a tag, between the opening < and the closing >; the part in between matches the attribute itself, prefix#####="?????".
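To make the lookbehind/lookahead idea concrete, here is a minimal sketch wrapped in a small console program. The class name AttributeStripper and the method StripPrefixed are made up for illustration and are not part of any library. It assumes attribute values are always double-quoted and never contain < or >; real-world HTML can break those assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeStripper
{
    // Hypothetical helper: removes every double-quoted attribute whose name
    // starts with attributePrefix, together with its leading whitespace.
    public static string StripPrefixed(string html, string attributePrefix)
    {
        string pattern =
            @"(?<=<\w+[^<>]*)"                                 // lookbehind: inside a tag, after its name
            + @"\s+" + Regex.Escape(attributePrefix) + @"\w*"  // whitespace, then the attribute name
            + @"\s*=\s*""[^""]*"""                             // equals sign and a double-quoted value
            + @"(?=[^<>]*>)";                                  // lookahead: the tag is closed further on
        return Regex.Replace(html, pattern, string.Empty);
    }

    public static void Main()
    {
        // The three sample tags from the question.
        string html = "<tag1 prefix1314030136543=\"2\">\n"
                    + "<tag2 prefix131403013654=\"1\" anotherAttribute=\"432\">\n"
                    + "<tag3 prefix13140301376543=\"4\">";
        Console.WriteLine(StripPrefixed(html, "prefix"));
    }
}
```

Regex.Escape is only there in case the prefix ever contains characters that are special in a pattern. For anything beyond simple, well-formed markup, an HTML parser is likely the safer choice.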
Here's the heavy-handed method of doing it.
string str = "<tag1 prefix131403013654=\"2\">";
const string marker = "prefix131403013654=\"";
int point = str.IndexOf(marker);
while (point != -1) // at least one occurrence remains
{
    // The value ends at the next double quote after the opening one.
    int secondQuote = str.IndexOf('"', point + marker.Length);
    if (secondQuote == -1) break; // unterminated value; stop
    if (point > 0 && str[point - 1] == ' ')
    {
        // Remove the leading space together with the attribute and its value.
        str = str.Remove(point - 1, secondQuote - point + 2);
        point = str.IndexOf(marker, point - 1);
    }
    else
    {
        point = str.IndexOf(marker, point + 1); // not an attribute; skip it
    }
}
Edited for better code, and again after testing so that the removed span includes the closing quote. It works. You could wrap this in a loop that goes through a list of all the attribute names you want removed.
If you don't know the full prefix's name you can change it up like so:
string str = "<tag1 prefix131403013654=\"2\">";
int point = str.IndexOf("prefix");
while (point != -1) // at least one candidate remains
{
    // Opening and closing quotes of the attribute value.
    int firstQuote = str.IndexOf('"', point);
    int secondQuote = firstQuote == -1 ? -1 : str.IndexOf('"', firstQuote + 1);
    if (secondQuote == -1) break; // no complete quoted value left
    if (point > 0 && str[point - 1] == ' ') // checking if it's actually an attribute
    {
        // Remove the leading space together with the attribute and its value.
        str = str.Remove(point - 1, secondQuote - point + 2);
        point = str.IndexOf("prefix", point - 1);
    }
    else
    {
        point = str.IndexOf("prefix", point + 1); // "prefix" inside other text; skip it
    }
}
// Like I said, a very heavy way of doing it.
That will catch all of them that start with prefix.