How can I remove an outer <p>...</p> from a string?
I want to query a string (HTML) from a database and display it on a web page. The problem is that the data has a
<p> around the text (ending with </p>).
I want to strip this outer tag in my view model or controller action that returns this data. What is the best way of doing this in C#?
Might be overkill for your needs, but if you want to parse the HTML you can use the HtmlAgilityPack - certainly a cleaner solution in general than most suggested here, although it might not be as performant:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<p> around the text (ending with </p>");
string result = doc.DocumentNode.FirstChild.InnerHtml;
If you're absolutely sure the string will always have that tag, you can use String.Substring like myString.Substring(3, myString.Length-7)
or so.
A more robust method would be to either manually code the appropriate tests or use a regular expression, or ultimately, use an HTML parser as suggested by BrokenGlass's answer.
UPDATE: Using regexes you could do:
String filteredString = Regex.Match(myString, "^<p>(.*)</p>").Groups[1].Value;
You could add \s* after the initial ^ to also skip leading whitespace. You can also check the Success property of the result of Match to see whether the string matched the <p>...</p>
pattern at all.
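To make the "check the result of Match" idea concrete, here is a minimal sketch. The class and method names (OuterParagraph, Strip) are made up for illustration, and it assumes the wrapper is a bare lowercase <p> with no attributes:

```csharp
using System;
using System.Text.RegularExpressions;

public static class OuterParagraph
{
    // Hypothetical helper: returns the inner content when the whole string is a
    // single <p>...</p> wrapper, otherwise returns the input unchanged.
    public static string Strip(string html)
    {
        // \s* tolerates surrounding whitespace; Singleline lets '.' match newlines.
        Match m = Regex.Match(html, @"^\s*<p>(.*)</p>\s*$", RegexOptions.Singleline);

        // Groups[1] is the text captured between the tags; fall back when there is no match.
        return m.Success ? m.Groups[1].Value : html;
    }
}
```

Note that (.*) is greedy, so this only strips the first <p> and the last </p>. If the string holds several sibling paragraphs, the inner </p><p> pairs stay in the result, which is one more reason to prefer a parser for anything less predictable.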
If the data is always surrounded by <p>...</p>:
string withoutParas = withParas.Substring(3, withParas.Length - 7);
Try using the string method Remove(), passing it the IndexOf() of <p> with length 3 and the LastIndexOf() of </p> with length 4.
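Something along these lines should work as a rough sketch of the Remove() approach. The names ParagraphTrimmer and RemoveOuterP are invented for the example, and it assumes the tags carry no attributes:

```csharp
using System;

public static class ParagraphTrimmer
{
    // Hypothetical helper: removes the first "<p>" and the last "</p>" if both are present.
    public static string RemoveOuterP(string s)
    {
        const string openTag = "<p>";
        const string closeTag = "</p>";

        int start = s.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        int end = s.LastIndexOf(closeTag, StringComparison.OrdinalIgnoreCase);

        // Leave the string alone when either tag is missing or they are out of order.
        if (start < 0 || end < start + openTag.Length)
        {
            return s;
        }

        // Remove the closing tag first so the index of the opening tag stays valid.
        return s.Remove(end, closeTag.Length).Remove(start, openTag.Length);
    }
}
```

Unlike the Substring version, this does not require the tags to sit at the very start and end of the string, so any text outside the tags is kept as-is.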
If you are absolutely guaranteed that your string will always fit the pattern of <p>...</p>
, then the other solutions using data.Substring(3, data.Length - 7)
are sufficient. If, however, there's any chance that it could look at all different, then you really need to use an HTML parser. The consensus is that the HTML Agility Pack is the way to go.
s = s.Replace("<p>", String.Empty).Replace("</p>", String.Empty);