java - How to find whether the url is utf-8 or utf-16
How to find whether the URL is UTF-8 or UTF-16 in Java?
For example, this URL is UTF-8.
XML messages specify the encoding type.
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-16"?>
As described in the other answers, there are two ways to specify the encoding of a document returned via HTTP:
- as part of the Content-Type header field
- as an encoding declaration inside the XML file (e.g. <?xml version="1.0" encoding="UTF-8"?>)
However, both of these are optional. According to the HTTP spec, the encoding defaults to ISO 8859-1 if it is not specified. With an XML file, if the file is supplied with an HTTP Content-Type header that names an encoding, that is the correct encoding. Otherwise, the default is UTF-8 or UTF-16, depending on the presence of a byte order mark (BOM).
So if you know the content is in UTF-8 or in UTF-16, check for a UTF-16 BOM (the bytes FE FF or FF FE at the very start of the content). If it's there, it's UTF-16, otherwise UTF-8. See e.g. http://www.opentag.com/xfaq_enc.htm#enc_default for an explanation.
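A minimal sketch of that BOM check in Java, assuming the content is already known to be either UTF-8 or UTF-16 and that you have its first few bytes in hand. The class name BomSniffer and the method guessCharset are made up for illustration. Note that a UTF-8 file may carry its own BOM (EF BB BF), which this sketch simply treats as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {

    /**
     * Guesses UTF-16 vs UTF-8 from the leading bytes of the content.
     * Hypothetical helper: it only distinguishes these two encodings
     * and does not try to recognise anything else (e.g. UTF-32).
     */
    public static Charset guessCharset(byte[] head) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF; // mask to treat the bytes as unsigned
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE; // big-endian BOM
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE; // little-endian BOM
            }
        }
        // No UTF-16 BOM found: fall back to UTF-8
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] utf16 = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x3C};
        byte[] utf8 = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_8);
        System.out.println(guessCharset(utf16)); // prints UTF-16BE
        System.out.println(guessCharset(utf8));  // prints UTF-8
    }
}
```

In practice you would read the first two bytes from the connection's input stream (for example through a BufferedInputStream with mark and reset) and pass them to such a helper before creating a Reader.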
I'm assuming you're after the encoding of the representation of the resource addressed by this URL.
A resource at a given URI may have multiple representations. Thus, you can't really know generally in advance the content type and encoding of the representation you get until you actually get it. Using the HTTP HEAD method can give you some indication as to which content types and encodings the server is willing to offer. This will also vary depending on the headers your client sends (Accept: ...).
If you want to learn more about this, look for "Content-type negotiation".
Doing a HEAD or GET request should return a Content-Type header with the appropriate charset field. If no content-type negotiation takes place on this server (which is often the case), this will not vary.
If you're using HttpURLConnection in Java, you can see the headers using getHeaderFieldKey and getHeaderField.
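As a rough sketch of how this could look, the class below sends a HEAD request with HttpURLConnection, reads the Content-Type header via getHeaderField, and pulls out the charset parameter. The class name CharsetFromHeader and the helpers charsetOf and headContentType are invented for this example, and the parsing is deliberately simple (it is not a full media-type parser). It returns null when the server sends no charset, in which case the defaults discussed above apply.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

public class CharsetFromHeader {

    /** Extracts the charset parameter from a Content-Type value, or null if absent. */
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type, separated by ';'
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Sends a HEAD request and returns the raw Content-Type header (may be null). */
    public static String headContentType(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getHeaderField("Content-Type");
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Pass a URL on the command line to query a real server
            String contentType = headContentType(args[0]);
            System.out.println(contentType + " -> " + charsetOf(contentType));
        } else {
            // Offline demonstration of the parsing step only
            System.out.println(charsetOf("text/xml; charset=UTF-16"));
        }
    }
}
```

Keeping the header parsing separate from the network call makes it easy to check the parsing on its own, and the same charsetOf helper works for the header of a GET response.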