What is the correct encoding for querystrings?
I am trying to send a request to a URL like "http://mysite.dk/tværs?test=æ" from an ASP.NET application, and I am having trouble getting the querystring to encode correctly. Or maybe the querystring is encoded correctly, and the service I am connecting to just doesn't understand it.
I have tried to send the request with different browsers and logging how they encode the request with Wireshark, and I get these results:
Firefox: http://mysite.dk/tv%C3%A6rs?test=%E6
IE8: http://mysite.dk/tv%C3%A6rs?test=\xe6
Curl: http://mysite.dk/tv\xe6rs?test=\xe6
Firefox, IE and Curl all receive the correct results from the service. Note that they encode the Danish special character 'æ' differently in the path, but all of them send it as the single Latin-1 byte (escaped or raw) in the querystring.
When I send the request from my asp.net application using HttpWebRequest, the URL gets encoded this way:
http://mysite.dk/tv%C3%A6rs?test=%C3%A6
It encodes the querystring the same way as the path part of the url. The remote service does not understand this encoding, so I don't get a correct answer.
For the record, 'æ' (U+00E6) is %E6 in ISO-LATIN-1, and %C3%A6 in UTF-8.
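To make the difference concrete, here is a minimal sketch that percent-encodes the bytes of 'æ' under both encodings. `PercentDemo` and `PercentEncodeAllBytes` are hypothetical names used only for illustration, and the helper escapes every byte, so it is not a general-purpose URL encoder.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PercentDemo
{
    // Illustration only: escapes EVERY byte of the encoded text as %XX,
    // including ASCII bytes that a real URL encoder would leave alone.
    public static string PercentEncodeAllBytes(string text, Encoding encoding)
    {
        return string.Concat(encoding.GetBytes(text).Select(b => "%" + b.ToString("X2")));
    }
}
```

Calling `PercentDemo.PercentEncodeAllBytes("\u00E6", Encoding.GetEncoding("ISO-8859-1"))` gives `%E6`, while passing `Encoding.UTF8` gives `%C3%A6`.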
I could change the remote service to accept the UTF-8 encoded querystring, but then the service would stop working in browsers and I am not really interested in that. Is there a way to specify to .NET that it shouldn't encode querystrings with UTF-8?
I am creating the webrequest like this:
var req = WebRequest.Create("http://mysite.dk/tværs?test=æ") as HttpWebRequest;
But the problem seems to originate from System.Uri which is apparently used inside WebRequest.Create:
var uri = new Uri("http://mysite.dk/tværs?test=æ");
// now uri.AbsoluteUri == "http://mysite.dk/tv%C3%A6rs?test=%C3%A6"
It looks like you're applying UrlEncode over the entire URL. This isn't correct: paths and query strings are encoded differently, as you've seen. What is doing the encoding of the URI? WebRequest?
You could manually build the various parts using a UriBuilder, or manually encode using UrlPathEncode for the path and UrlEncode for the query string names and values.
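A minimal sketch of the manual approach, assuming the service wants the path in UTF-8 and the query string in ISO-8859-1 (as Firefox sent it). `MixedEncodingUrl.Build` is a hypothetical helper, not part of the framework. Whether `System.Uri` or `HttpWebRequest` then leaves the `%E6` escape untouched is not verified here and should be checked with a network trace.

```csharp
using System;
using System.Text;
using System.Web; // HttpUtility; needs a reference to System.Web on .NET Framework

public static class MixedEncodingUrl
{
    // Hypothetical helper: path segment escaped as UTF-8,
    // query name and value escaped as ISO-8859-1.
    public static string Build(string host, string pathSegment, string name, string value)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        // UrlPathEncode escapes non-ASCII characters using UTF-8.
        string path = HttpUtility.UrlPathEncode(pathSegment);
        // This UrlEncode overload escapes using the encoding passed in.
        string query = HttpUtility.UrlEncode(name, latin1) + "=" + HttpUtility.UrlEncode(value, latin1);
        return "http://" + host + "/" + path + "?" + query;
    }
}
```

For `MixedEncodingUrl.Build("mysite.dk", "tværs", "test", "æ")` this should produce the same escapes Firefox sent (`tv%C3%A6rs?test=%E6`), although the case of the hex digits may differ.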
Edit:
If the problem lies in the path rather than the query string, you could try turning on IRI support via web.config:
<configuration>
<uri>
<iriParsing enabled="true" />
</uri>
</configuration>
That should then leave international characters alone in the path.
Have you tried UrlEncode?
http://msdn.microsoft.com/en-us/library/zttxte6w.aspx
I ended up changing my remote webservice to expect the querystring to be UTF-8 encoded. It solves my immediate problem: the webservice can now be correctly called by both PHP and the .NET framework.
However, the behavior is now strange in browsers. Copy-pasting a URL like "http://mysite.dk/tv%C3%A6rs?test=%C3%A6" into the browser and then pressing return works; it even corrects the encoded characters and displays the location as "http://mysite.dk/tværs?test=æ". If I then reload the page (F5), it still works. But if I click on the location bar and press return again, the querystring becomes encoded with Latin-1 and the request fails.
For anyone interested here is an old Firefox bugreport about the problem: https://bugzilla.mozilla.org/show_bug.cgi?id=284474 (thanks to @dtb)
So, it seems there is no good solution.
Thanks to everyone who helped though!