java - How to find whether the url is utf-8 or utf-16

How to find whether the URL is UTF-8 or UTF-16 in Java?

For example, this URL is UTF-8.


XML messages specify the encoding type.

<?xml version="1.0" encoding="UTF-8"?>

<?xml version="1.0" encoding="UTF-16"?>
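If the content is XML, the declared encoding can be read back with the JDK's StAX parser instead of parsing the declaration by hand. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the bytes can actually be decoded in the declared encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of any library.
class XmlDeclaredEncoding {

    /** Returns the encoding named in the XML declaration of the given stream. */
    static String declaredEncoding(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            // The reader starts at START_DOCUMENT, where the declaration is already parsed.
            return reader.getCharacterEncodingScheme();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(declaredEncoding(new ByteArrayInputStream(doc)));
    }
}
```

Since the declaration is optional, callers should be prepared for the method to return no usable value and fall back to the HTTP header or a BOM check.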


As described in the other answers, there are two ways to specify the encoding of a document returned via HTTP:

  • as part of the Content-Type header field
  • encoding declaration inside the XML file (e.g. <?xml version="1.0" encoding="UTF-8"?>)

However, both of these are optional. According to the HTTP spec, the encoding defaults to ISO 8859-1 if none is specified. With an XML file, if the file is served with an HTTP Content-Type header that specifies a charset, that is the correct encoding. Otherwise, the default is UTF-8 or UTF-16, depending on the presence of a byte order mark (BOM).

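So if you know the content is in UTF-8 or in UTF-16, check for the BOM. If the content starts with a UTF-16 BOM (the bytes FE FF or FF FE), it's UTF-16; otherwise it's UTF-8. (A UTF-8 document may carry its own BOM, EF BB BF, so check which BOM it is, not just whether one is present.) See e.g. http://www.opentag.com/xfaq_enc.htm#enc_default for an explanation.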
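The BOM check can be sketched in Java roughly like this. The class and method names are made up for illustration, and the code assumes the content is already known to be either UTF-8 or UTF-16 (it does not try to tell UTF-32 or legacy encodings apart).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class BomSniffer {

    /** Guesses the charset from the first bytes, assuming UTF-8 or UTF-16 only. */
    static Charset guess(byte[] head) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE; // big-endian BOM
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE; // little-endian BOM
        }
        // No UTF-16 BOM (this includes the UTF-8 BOM EF BB BF): treat as UTF-8.
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] utf16 = "\uFEFF<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_16BE);
        byte[] utf8 = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(utf16)); // prints UTF-16BE
        System.out.println(guess(utf8));  // prints UTF-8
    }
}
```

Only the first two bytes are needed, so it is enough to read a small prefix of the response body before deciding how to decode the rest.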


I'm assuming you're after the encoding of the representation of the resource addressed by this URL.

A resource at a given URI may have multiple representations, so in general you can't know the content type and encoding of the representation in advance; you only know once you actually get it. Using the HTTP HEAD method can give you some indication of which content types and encodings the server is willing to offer. This also varies with the headers your client sends (Accept: ...). If you want to learn more about this, look for "content negotiation".

Doing a HEAD or GET request should return a Content-Type header with the appropriate charset parameter. If no content negotiation takes place on this server (which is often the case), this will not vary.

If you're using HttpURLConnection in Java, you can see the headers using getHeaderFieldKey and getHeaderField.
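A rough sketch of that approach: send a HEAD request and pull the charset parameter out of the Content-Type header. The class and method names here are invented for the example, and the header parsing is deliberately simple (it splits on ";" and does not handle every quoting rule the HTTP grammar allows).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, not part of any library.
class CharsetProbe {

    /** Extracts the charset parameter from a Content-Type value, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) return null;
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type itself
            String p = parts[i].trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String v = p.substring(8).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1); // strip optional quotes
                }
                return v.isEmpty() ? null : v;
            }
        }
        return null;
    }

    /** Sends a HEAD request and returns the charset the server reports, or null. */
    static String probe(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return charsetOf(conn.getHeaderField("Content-Type"));
        } finally {
            conn.disconnect();
        }
    }
}
```

A null result means the server did not state a charset, in which case you have to fall back to the defaults or inspect the content itself.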
