
What's the difference between Request.Url.Query and Request.QueryString?

I have been tracking down a bug on a Url Rewriting application. The bug showed up as an encoding problem on some diacritic characters in the querystring.

Basically, the problem was that a request for /search.aspx?search=heřmánek was getting rewritten with a querystring of "search=he%c5%99m%c3%a1nek".

The correct value (using some different, working code) was a rewrite of the querystring as "search=he%u0159m%u00e1nek"

Note the difference between the two strings. However, if you post either one, URL decoding produces the same string. It's not until you use the context.Rewrite function that the encoding breaks. The broken string returns 'heÅmánek' (using Request.QueryString["Search"]), and the working string returns 'heřmánek'. This change happens after the call to the rewrite function.
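A minimal sketch of the "same string" claim, assuming a .NET version where System.Web.HttpUtility is available (C# 9 top-level statements). It only shows that the two spellings decode to the same value with UTF-8; it does not reproduce what the rewrite does to them.

```csharp
using System;
using System.Text;
using System.Web;

// The two encodings of "heřmánek" from the question.
string utf8Form = "he%c5%99m%c3%a1nek";     // standard: UTF-8 bytes as %xx escapes
string percentUForm = "he%u0159m%u00e1nek"; // non-standard: UTF-16 code units as %uXXXX

// HttpUtility.UrlDecode understands both spellings.
var fromUtf8 = HttpUtility.UrlDecode(utf8Form, Encoding.UTF8);
var fromPercentU = HttpUtility.UrlDecode(percentUForm, Encoding.UTF8);

Console.WriteLine(fromUtf8 == fromPercentU); // True
```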

I traced this down to one set of code using Request.QueryString (working) and the other using Request.Url.Query (request.Url returns a Uri instance).

While I have worked out the bug there is a hole in my understanding here, so if anyone knows the difference, I'm ready for the lesson.


Your question really sparked my interest, so I've done some reading for the past hour or so. I'm not absolutely positive I've found the answer, but I'll throw it out there to see what you think.

From what I've read so far, Request.QueryString is actually "a parsed version of the QUERY_STRING variable in the ServerVariables collection" [reference], whereas Request.Url is (as you stated) the raw URL encapsulated in a Uri object. According to this article, the Uri class's constructor "...parses the [url string], puts it in canonical format, and makes any required escape encodings."

Therefore, it appears that Request.QueryString uses a different function to parse the "QUERY_STRING" variable from the ServerVariables collection. That would explain the difference you see between the two. Why the custom parsing function and the Uri object's parsing function use different encoding methods is entirely beyond me, though. Maybe somebody more versed in the aspnet_isapi DLL could answer that question.
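Here's a small sketch of that distinction, assuming System.Web.HttpUtility is available. Uri canonicalizes the URL and exposes the escaped query text, while HttpUtility.ParseQueryString (used here only as a stand-in for the parsed Request.QueryString collection, not the same code path) gives you decoded values.

```csharp
using System;
using System.Web;

// Uri escapes the non-ASCII characters while canonicalizing, and Query
// returns that escaped text, including the leading '?'.
var uri = new Uri("http://www.example.com/search.aspx?search=heřmánek");
string rawQuery = uri.Query;

// A parsed name/value collection holds decoded values instead.
var parsed = HttpUtility.ParseQueryString(rawQuery);
var value = parsed["search"];

Console.WriteLine(rawQuery); // escaped, e.g. ?search=he%C5%99m%C3%A1nek
Console.WriteLine(value);    // heřmánek
```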

Anyway, hopefully my post makes sense. On a side note, I'd like to add another reference which also provided for some very thorough and interesting reading: http://download.microsoft.com/download/6/c/a/6ca715c5-2095-4eec-a56f-a5ee904a1387/Ch-12_HTTP_Request_Context.pdf


What you indicated as the "broken" encoded string is actually the correct encoding according to standards. The one that you indicated as "correct" encoding is using a non-standard extension to the specifications to allow a format of %uXXXX (I believe it's supposed to indicate UTF-16 encoding).

In any case, the "broken" encoded string is ok. You can use the following code to test that:

```csharp
using System;
using System.Web;

Uri uri = new Uri("http://www.example.com/test.aspx?search=heřmánek");
Console.WriteLine(uri.Query);
Console.WriteLine(HttpUtility.UrlDecode(uri.Query));
```

Works fine. However... on a hunch, I tried UrlDecode with a Latin-1 codepage specified, instead of the default UTF-8:

```csharp
// Needs "using System.Text;" for Encoding.
Console.WriteLine(HttpUtility.UrlDecode(uri.Query,
           Encoding.GetEncoding("iso-8859-1")));
```

... and I got the bad value you specified, 'heÅmánek'. In other words, it looks like the call to HttpContext.RewritePath() somehow changes the urlencoding/decoding to use the Latin-1 codepage, rather than UTF-8, which is the default encoding used by the UrlEncode/Decode methods.

This looks like a bug if you ask me. You can look at the RewritePath() code in Reflector and see that it is definitely playing with the querystring: passing it around to all kinds of virtual path functions, and out to some unmanaged IIS code.

I wonder if somewhere along the way, the Uri at the core of the Request object gets manipulated with the wrong codepage? That would explain why Request.QueryString (which is simply the raw values from the HTTP headers) would be correct, while the Uri, using the wrong encoding for the diacritics, would be incorrect.


I have done a bit of research over the past day or so and I think I have some information on this.

When you use Request.QueryString or HttpUtility.UrlDecode (or UrlEncode), it uses the Encoding specified in the <globalization> element (specifically its requestEncoding attribute) of web.config (or of the .config hierarchy, if you haven't specified one), not Encoding.Default, which is the default encoding for your server.

When the encoding is set to UTF-8, a single non-ASCII character can be encoded as two or more %xx hex values (ř and á each take two). It will also be decoded that way when given the whole value.

If you are UrlDecoding with a different Encoding than the url was encoded with, you will get a different result.
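A quick sketch of both points, assuming System.Web.HttpUtility is available and using ISO-8859-1 as the example of "a different Encoding": the non-ASCII letters in heřmánek become two %xx escapes each in UTF-8, and decoding those escapes with another encoding does not give the original back.

```csharp
using System;
using System.Text;
using System.Web;

string original = "heřmánek";

// ř (U+0159) and á (U+00E1) each take two bytes in UTF-8.
Console.WriteLine(Encoding.UTF8.GetByteCount("ř")); // 2
var encoded = HttpUtility.UrlEncode(original, Encoding.UTF8);
Console.WriteLine(encoded); // he%c5%99m%c3%a1nek

// Same encoding on both sides: the value round-trips.
var sameEncoding = HttpUtility.UrlDecode(encoded, Encoding.UTF8);
// Different encoding: ISO-8859-1 turns each byte into its own character.
var otherEncoding = HttpUtility.UrlDecode(encoded, Encoding.GetEncoding("iso-8859-1"));

Console.WriteLine(sameEncoding == original);  // True
Console.WriteLine(otherEncoding == original); // False
```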

Since HttpUtility.UrlEncode and UrlDecode can take an encoding parameter, it's tempting to try to encode using an ANSI codepage, but UTF-8 is the right way to go if you have the browser support (apparently old browsers don't support UTF-8). You just need to make sure that the <globalization> element is properly set, and both sides will work fine.
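To illustrate why an ANSI codepage is a trap, here's a sketch with ISO-8859-1 standing in for "an ANSI codepage" (an assumption for the example; any single-byte codepage without ř behaves similarly). The exact substitute character for ř depends on the encoder's fallback, so the sketch only checks whether the value survives the round trip.

```csharp
using System;
using System.Text;
using System.Web;

string original = "heřmánek";
// ISO-8859-1 has a byte for á but none for ř.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// ř cannot be represented, so the encoder substitutes another character
// and the original value is lost on the way back.
var ansiRoundTrip = HttpUtility.UrlDecode(HttpUtility.UrlEncode(original, latin1), latin1);

// UTF-8 can represent every character, so the round trip is lossless.
var utf8RoundTrip = HttpUtility.UrlDecode(HttpUtility.UrlEncode(original, Encoding.UTF8), Encoding.UTF8);

Console.WriteLine(ansiRoundTrip == original); // False
Console.WriteLine(utf8RoundTrip == original); // True
```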

UTF-8 seems to be the default encoding (from .NET Reflector, System.Web.HttpRequest):

```csharp
internal Encoding QueryStringEncoding
{
    get
    {
        Encoding contentEncoding = this.ContentEncoding;
        if (!contentEncoding.Equals(Encoding.Unicode))
        {
            return contentEncoding;
        }
        return Encoding.UTF8;
    }
}
```
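Restated as a standalone function, the rule is: use the request's content encoding unless it is UTF-16 (Encoding.Unicode), in which case fall back to UTF-8. PickQueryStringEncoding is a hypothetical name for this sketch; the framework getter itself is internal.

```csharp
using System;
using System.Text;

// Hypothetical restatement of the decompiled QueryStringEncoding getter.
Encoding PickQueryStringEncoding(Encoding contentEncoding) =>
    contentEncoding.Equals(Encoding.Unicode) ? Encoding.UTF8 : contentEncoding;

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
Console.WriteLine(PickQueryStringEncoding(Encoding.UTF8).WebName);    // utf-8
Console.WriteLine(PickQueryStringEncoding(Encoding.Unicode).WebName); // utf-8
Console.WriteLine(PickQueryStringEncoding(latin1).WebName);           // iso-8859-1
```

So a Latin-1 content encoding flows straight through to the query string, while UTF-16 is the only case that gets swapped out.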

Following this.ContentEncoding leads you to this (also in HttpRequest):

```csharp
public Encoding ContentEncoding
{
    get
    {
        if (!this._flags[0x20] || (this._encoding == null))
        {
            this._encoding = this.GetEncodingFromHeaders();
            if (this._encoding == null)
            {
                GlobalizationSection globalization = RuntimeConfig.GetLKGConfig(this._context).Globalization;
                this._encoding = globalization.RequestEncoding;
            }
            this._flags.Set(0x20);
        }
        return this._encoding;
    }
    set
    {
        this._encoding = value;
        this._flags.Set(0x20);
    }
}
```
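The getter tries the request headers first and only then falls back to the configured requestEncoding (caching the result via _flags). Here is a hypothetical sketch of that lookup order. ResolveRequestEncoding is my own name, reading a charset from the Content-Type header is an assumption about what GetEncodingFromHeaders looks at, and the configured encoding is passed in as a parameter rather than read from web.config.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper modeled on the ContentEncoding lookup order:
// 1) a charset on the Content-Type header, else
// 2) the configured requestEncoding (a parameter here).
Encoding ResolveRequestEncoding(string? contentType, Encoding configuredRequestEncoding)
{
    if (!string.IsNullOrEmpty(contentType)
        && MediaTypeHeaderValue.TryParse(contentType, out var mediaType)
        && !string.IsNullOrEmpty(mediaType.CharSet))
    {
        try
        {
            return Encoding.GetEncoding(mediaType.CharSet);
        }
        catch (ArgumentException)
        {
            // Unknown charset name: fall back to the configured encoding.
        }
    }
    return configuredRequestEncoding;
}

Console.WriteLine(ResolveRequestEncoding("application/x-www-form-urlencoded; charset=iso-8859-1", Encoding.UTF8).WebName); // iso-8859-1
Console.WriteLine(ResolveRequestEncoding(null, Encoding.UTF8).WebName); // utf-8
```

The point for this bug is the ordering: a charset sent by the client can override whatever requestEncoding says in web.config.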

To answer your specific question on the difference between Request.Url.Query and Request.QueryString, here is how HttpRequest builds its Url property:

```csharp
public Uri Url
{
    get
    {
        if ((this._url == null) && (this._wr != null))
        {
            string queryStringText = this.QueryStringText;
            if (!string.IsNullOrEmpty(queryStringText))
            {
                queryStringText = "?" + HttpEncoder.CollapsePercentUFromStringInternal(queryStringText, this.QueryStringEncoding);
            }
            if (AppSettings.UseHostHeaderForRequestUrl)
            {
                string knownRequestHeader = this._wr.GetKnownRequestHeader(0x1c);
                try
                {
                    if (!string.IsNullOrEmpty(knownRequestHeader))
                    {
                        this._url = new Uri(this._wr.GetProtocol() + "://" + knownRequestHeader + this.Path + queryStringText);
                    }
                }
                catch (UriFormatException)
                {
                }
            }
            if (this._url == null)
            {
                string serverName = this._wr.GetServerName();
                if ((serverName.IndexOf(':') >= 0) && (serverName[0] != '['))
                {
                    serverName = "[" + serverName + "]";
                }
                this._url = new Uri(this._wr.GetProtocol() + "://" + serverName + ":" + this._wr.GetLocalPortAsString() + this.Path + queryStringText);
            }
        }
        return this._url;
    }
}
```

You can see it uses the HttpEncoder class (CollapsePercentUFromStringInternal, which, judging by its name, collapses %uXXXX escapes) to process the query string, but it uses the same QueryStringEncoding value.
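The body of CollapsePercentUFromStringInternal isn't shown above, so here is a hypothetical sketch of what its name suggests. CollapsePercentU is my own helper, not the framework's implementation: it rewrites each %uXXXX escape as %xx escapes of that character's bytes in the given encoding. With UTF-8, it maps the %u querystring from the question onto the standard one, which at least fits the idea that the two strings are the same value spelled differently.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: replace every %uXXXX escape with standard %xx escapes
// of that character's bytes in the given encoding. Each %uXXXX is treated as a
// single UTF-16 code unit; surrogate pairs are not recombined, so this sketch
// only suits characters in the Basic Multilingual Plane.
string CollapsePercentU(string query, Encoding encoding) =>
    Regex.Replace(query, "%u([0-9a-fA-F]{4})", match =>
    {
        char c = (char)Convert.ToInt32(match.Groups[1].Value, 16);
        var escaped = new StringBuilder();
        foreach (byte b in encoding.GetBytes(new[] { c }))
        {
            escaped.Append('%').Append(b.ToString("x2"));
        }
        return escaped.ToString();
    });

// The "working" querystring from the question, collapsed with UTF-8:
Console.WriteLine(CollapsePercentU("search=he%u0159m%u00e1nek", Encoding.UTF8));
// search=he%c5%99m%c3%a1nek
```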

Since I am already posting a lot of code here and anyone can get .NET Reflector, I'm going to summarize the rest. The QueryString property comes from the HttpValueCollection, which uses the FillFromEncodedBytes method to eventually call HttpUtility.UrlDecode (with the QueryStringEncoding value set above), which in turn calls the HttpEncoder to decode it. They do seem to use different methods to decode the actual bytes of the querystring, but the encoding they use seems to be the same.
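To see how much rides on that encoding value, here's a sketch that uses HttpUtility.ParseQueryString(string, Encoding) as a stand-in for the HttpValueCollection path (a different code path from FillFromEncodedBytes, so treat it as an approximation). Decoding the UTF-8 form with ISO-8859-1 mangles it, while the %uXXXX form comes out the same because those escapes name the UTF-16 code unit directly.

```csharp
using System;
using System.Text;
using System.Web;

Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

// %xx escapes are bytes, so the result depends on the encoding passed in.
var utf8AsUtf8 = HttpUtility.ParseQueryString("search=he%c5%99m%c3%a1nek", Encoding.UTF8)["search"];
var utf8AsLatin1 = HttpUtility.ParseQueryString("search=he%c5%99m%c3%a1nek", latin1)["search"];

// %uXXXX escapes name the character directly, so the encoding does not matter.
var percentUAsLatin1 = HttpUtility.ParseQueryString("search=he%u0159m%u00e1nek", latin1)["search"];

Console.WriteLine(utf8AsUtf8 == "heřmánek");       // True
Console.WriteLine(utf8AsLatin1 == "heřmánek");     // False
Console.WriteLine(percentUAsLatin1 == "heřmánek"); // True
```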

It is interesting to me that the HttpEncoder has so many functions that seem to do the same thing, so it's possible there are differences in those methods that cause the issue.
