
Replace & with &amp; in C#

OK, I feel really stupid asking this. I see plenty of other questions that resemble mine, but none of them seem to answer it.

I am creating an XML file for a program that is very picky about syntax. Sadly, I am building the XML file from scratch, meaning I write each line individually (lots of file.WriteLine(String) calls).

I know this is ugly, but it's the only way I can get the logic to work out.

ANYWAY. I have a few strings that are coming through with '&' in them.

if (value.Contains("&"))
{
    value.Replace("&", "&amp;");
}

This does not seem to work. The value.Contains() call sees the '&', but the Replace does not work. I am using C# with .NET 2.0 SP2 and VS 2005.

Please help me out here... It's been a long week.


If you really want to go that route, you have to assign the result of Replace (the method returns a new string because strings are immutable) back to the variable:

value = value.Replace("&", "&amp;");
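A minimal sketch of the difference between discarding and keeping the result of Replace. The class and method names are hypothetical; only System.String is used:

```csharp
using System;

public static class ImmutableReplace
{
    // Mirrors the question's code: Replace returns a NEW string, and that
    // return value is thrown away, so the original value comes back unchanged.
    public static string Broken(string value)
    {
        if (value.Contains("&"))
        {
            value.Replace("&", "&amp;");
        }
        return value;
    }

    // The fix: assign the string returned by Replace back to the variable.
    public static string Fixed(string value)
    {
        value = value.Replace("&", "&amp;");
        return value;
    }
}
```

The Contains check is not needed for correctness: Replace simply returns an equal string when there is nothing to replace.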

I would suggest rethinking the way you're writing your XML, though. If you switch to using the XmlTextWriter, it will handle all of the encoding for you (not only the ampersand, but all of the other characters that need to be encoded as well):

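// explicit type instead of var so this also compiles with C# 2.0 / VS 2005
using (XmlTextWriter writer = new XmlTextWriter(@"C:\MyXmlFile.xml", null))
{
    writer.WriteStartElement("someString");
    // WriteString escapes &, < and > in element text
    writer.WriteString("This is < a > string & everything will get encoded");
    writer.WriteEndElement();
}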

Should produce:

<someString>This is &lt; a &gt; string &amp; everything will get encoded</someString>
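A sketch of the same idea that writes to a StringWriter instead of a file (an assumption made only so the escaped output is easy to inspect), using XmlWriter.Create with XmlWriterSettings, which is also available in .NET 2.0. The XmlWriterEscape class and WriteElement method are hypothetical names:

```csharp
using System.IO;
using System.Xml;

public static class XmlWriterEscape
{
    // Writes <name>text</name> and returns it as a string.
    public static string WriteElement(string name, string text)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // no <?xml ...?> line in this demo
        StringWriter sw = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteStartElement(name);
            writer.WriteString(text); // the writer escapes &, < and > here
            writer.WriteEndElement();
        } // disposing the writer flushes everything into sw
        return sw.ToString();
    }
}
```

The point of the design is that escaping happens in exactly one place (the writer), so no string passed to WriteString needs to be pre-processed.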


You should really use something like LINQ to XML (XDocument etc.) to solve it. I'm 100% sure you can do it without all your WriteLine calls ;) Show us your logic?

Otherwise you could use this, which is bulletproof (as opposed to .Replace("&")):

var value = "hej&hej<some>";
value = new System.Xml.Linq.XText(value).ToString(); //hej&amp;hej&lt;some&gt;

This will also take care of < which you also HAVE TO escape :)
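A minimal LINQ to XML sketch, assuming .NET 3.5 or later (System.Xml.Linq is not available on the .NET 2.0 setup from the question). The class and method names are hypothetical; XElement escapes its text content when serialized, and XText.ToString() returns the escaped text on its own:

```csharp
using System.Xml.Linq;

public static class LinqToXmlEscape
{
    // Serializes <option>text</option> on one line, with the text escaped.
    public static string Element(string text)
    {
        XElement e = new XElement("option", text);
        return e.ToString(SaveOptions.DisableFormatting);
    }

    // Returns only the escaped text, as in the XText example above.
    public static string TextOnly(string text)
    {
        return new XText(text).ToString();
    }
}
```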

Update: I have looked at the code for XText.ToString(), and internally it creates an XmlWriter + StringWriter and uses XNode.WriteTo. This may be overkill for a given application, so if many strings need to be converted, XText.WriteTo would be better. An alternative that should be fast and reliable is System.Web.HttpUtility.HtmlEncode.

Update 2: I found System.Security.SecurityElement.Escape(xml), which may be the fastest option and ensures maximum compatibility (it has been supported since .NET 1.0 and does not require a reference to System.Web).
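A short sketch of SecurityElement.Escape, which replaces the five XML special characters (<, >, ", ' and &) with their predefined entities. The wrapper class name is hypothetical. Note that it escapes every '&', so text that already contains "&amp;" would become "&amp;amp;":

```csharp
using System.Security;

public static class SecurityElementEscape
{
    // < > " ' & become &lt; &gt; &quot; &apos; &amp;
    public static string Escape(string text)
    {
        return SecurityElement.Escape(text);
    }
}
```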


You can also use the HttpUtility.HtmlEncode method in the System.Web namespace instead of doing the replacement yourself. Here you go: http://msdn.microsoft.com/en-us/library/73z22y6h.aspx
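A sketch that swaps in System.Net.WebUtility.HtmlEncode (an assumption: it needs .NET 4.0 or later, but no reference to System.Web). It is an HTML encoder, not an XML one, so it writes the apostrophe as the numeric reference &#39;, which XML also accepts. The wrapper class name is hypothetical:

```csharp
using System.Net;

public static class HtmlEncodeDemo
{
    // Escapes &, <, >, " and ' (the apostrophe as &#39;).
    public static string Encode(string text)
    {
        return WebUtility.HtmlEncode(text);
    }
}
```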


You can use a Regex to replace the "&" character only in node values:

Input data example (string):

<select>
 <option id="11">Gigamaster&Minimaster</option>
 <option id="12">Black & White</option>
 <option id="13">Other</option>
</select>

Replace with Regex

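 // Note: the greedy .* means at most one '&' (the last one) is replaced per
 // line of input, so this relies on one element per line.
 Regex rgx = new Regex(">(?<prefix>.*)&(?<sufix>.*)<");
 data = rgx.Replace(data, ">${prefix}&amp;${sufix}<");

 XmlDocument xmlDoc = new XmlDocument();
 xmlDoc.LoadXml(data);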

result data

<select>
 <option id="11">Gigamaster&amp;MiniMaster</option>
 <option id="12">Black &amp; White</option>
 <option id="13">Other</option>
</select>
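A hedged variation of the same idea that escapes every bare '&' in each text run, not just one per line, and leaves existing entities such as "&amp;" alone. It uses an anonymous delegate so it also compiles with C# 2.0. The class, method and field names are hypothetical, and it assumes '>' and '<' only appear as tag delimiters:

```csharp
using System.Text.RegularExpressions;

public static class NodeTextEscape
{
    // A run of text that starts right after '>' and ends right before '<'.
    static readonly Regex TextRun = new Regex("(?<=>)[^<]+(?=<)");

    // A '&' that does not already start an entity or character reference.
    static readonly Regex BareAmpersand =
        new Regex("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    public static string EscapeAmpersandsInText(string data)
    {
        // For each text run, escape all bare ampersands inside it.
        return TextRun.Replace(data, delegate(Match m)
        {
            return BareAmpersand.Replace(m.Value, "&amp;");
        });
    }
}
```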


I'm obviously very late to this, but this version avoids double-encoding an existing &amp;:

System.Text.RegularExpressions.Regex.Replace(input, "&(?!amp;)", "&amp;");

Hope this helps somebody!
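A sketch comparing that pattern with a hypothetical extended lookahead. The "&(?!amp;)" pattern skips "&amp;" but still re-escapes other entities (for example "&lt;" becomes "&amp;lt;"); the extended version also skips the other predefined entities and numeric character references. Class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class AmpersandLookahead
{
    // The pattern from the answer above.
    public static string AmpOnly(string input)
    {
        return Regex.Replace(input, "&(?!amp;)", "&amp;");
    }

    // Hypothetical extension: leave any predefined entity or numeric reference alone.
    public static string AnyReference(string input)
    {
        return Regex.Replace(input,
            "&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)", "&amp;");
    }
}
```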


You can try:

value = value.Replace("&", "&amp;");


Strings are immutable. You need to write:

value = value.Replace("&", "&amp;");

Note that if you do this and your string contains "&amp;", it's going to get changed to "&amp;amp;".


I've created the following function to encode & and ' without messing up already encoded &amp;, &apos; or &quot;:

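    // requires: using System.Text.RegularExpressions;
    public static string encodeSelectXMLCharacters(string xmlString)
    {
        // Match a bare '&' (one that does not start a known entity or numeric reference) or a "'".
        string returnValue = Regex.Replace(xmlString, "&(?!quot;|apos;|amp;|lt;|gt;|#[0-9]+;|#x[0-9a-fA-F]+;)|'",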
            delegate(Match m)
            {
                string encodedValue;
                switch (m.Value)
                {
                    case "&":
                        encodedValue = "&amp;";
                        break;
                    case "'":
                        encodedValue = "&apos;";
                        break;
                    default:
                        encodedValue = m.Value;
                        break;
                }

                return encodedValue;
            });
        return returnValue;
    }


Not sure if this is useful to anyone... I was fighting this for a while. Here is a glorious regex you can use to fix all your links, JavaScript and content. I had to deal with a ton of legacy content that nobody wanted to correct.

Add this to the Render override in your master page or control, or rework it to run a string through it. Please don't flame me for putting this in the wrong place:

// remove the & from href="blaw?a=b&b=c" and replace with &amp; 
//in urls - this corrects any unencoded & not just those in URL's
// this match will also ignore any matches it finds within <script> blocks AND
// it will also ignore the matches where the link includes a javascript command like
// <a href="javascript:alert{'& & &'}">blaw</a>
html = Regex.Replace(html, "&(?!(?<=(?<outerquote>[\"'])javascript:(?>(?!\\k<outerquote>|[>]).)*)\\k<outerquote>?)(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\\d+);)(?!(?>(?:(?!<script|\\/script>).)*)\\/script>)", "&amp;", RegexOptions.Singleline | RegexOptions.IgnoreCase);

It's a broad stroke for a rendered page, but it can be adapted to many uses without blowing up your page.


What about

Value = Server.HtmlEncode(Value);


I am quite sure it will work if you wrap your value in a CDATA section, so the result is something like

<ampersandData><![CDATA[value with ampersands like &hellip;]]></ampersandData>

Hope it helps.
Michael
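A sketch of the CDATA approach using XmlWriter.WriteCData, so the CDATA markup is produced by the writer rather than by hand. Text inside a CDATA section is not entity-escaped; the one sequence it cannot contain is "]]>". The class and method names are hypothetical, and the output is written to a StringWriter only for inspection:

```csharp
using System.IO;
using System.Xml;

public static class CDataDemo
{
    // Produces <elementName><![CDATA[value]]></elementName>.
    public static string WrapInCData(string elementName, string value)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        StringWriter sw = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteStartElement(elementName);
            writer.WriteCData(value); // '&' is kept as-is inside CDATA
            writer.WriteEndElement();
        }
        return sw.ToString();
    }
}
```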


Very late here, but I want to share my solution, which handles the cases where you have both & (invalid XML) and &amp; (valid XML) in the document, in addition to other XML character entities.

This solution is only meant for cases where you cannot control generation of the XML, usually because it comes from some external source. If you control the XML generation, please use XmlTextWriter as suggested by @Justin Niessner.

It is also quite fast and handles all the different xml character entities/references

Predefined character entities:

&quot;
&amp;
&apos;
&lt;
&gt;

Numeric character references:

&#nnnn;
&#xhhhh;

Code

    // requires: using System; using System.Text;
    //           using System.Text.RegularExpressions; using System.Xml;
    public static string CleanXml(string text)
    {
        int length = text.Length;
        StringBuilder stringBuilder = new StringBuilder(length);

        for (int i = 0; i < length; ++i)
        {
            if (text[i] == '&')
            {
                // Look at most 12 characters ahead (enough for e.g. "&#x10FFFF;").
                var remaining = length - i;
                var subStrLength = Math.Min(remaining, 12);
                var subStr = text.Substring(i, subStrLength);
                var firstIndexOfSemiColon = subStr.IndexOf(';');
                if (firstIndexOfSemiColon > -1)
                    subStr = subStr.Substring(0, firstIndexOfSemiColon + 1);
                // Keep '&' only if it starts a predefined entity or numeric reference.
                if (Regex.IsMatch(subStr, "^&(?:quot|apos|amp|lt|gt|#[0-9]+|#x[0-9a-fA-F]+);$"))
                    stringBuilder.Append("&");
                else
                    stringBuilder.Append("&amp;");
            }
            else if (XmlConvert.IsXmlChar(text[i]))
            {
                stringBuilder.Append(text[i]);
            }
            else if (i + 1 < length && XmlConvert.IsXmlSurrogatePair(text[i + 1], text[i]))
            {
                // Valid surrogate pair: copy both halves and skip the low surrogate.
                stringBuilder.Append(text[i]);
                stringBuilder.Append(text[i + 1]);
                ++i;
            }
            // Any other character is not allowed in XML and is dropped.
        }

        return stringBuilder.ToString();
    }
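A small companion sketch for checking that a cleaned string actually parses, using XmlDocument.LoadXml and catching XmlException. The XmlCheck class and IsWellFormed method are hypothetical names:

```csharp
using System.Xml;

public static class XmlCheck
{
    // Returns true if the string parses as a well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

A bare '&' followed by a space makes LoadXml throw, while the same text with "&amp;" loads fine.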
