Convert Unicode to UTF8
I am trying to mash up two different third-party services in JavaScript. One of them gives me strings in one encoding, and I need to convert them to a different encoding in JavaScript.
For example, the string is tést.
I am given an encoded string like this: te%u0301st. The accent is encoded as %u0301. I need to somehow convert this to this string: t%C3%A9st where the é is encoded as %C3%A9. How can I convert e%u0301 to %C3%A9 in JavaScript?
Thanks
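You appear to be trying to normalize your input, probably to Unicode Normalization Form C (NFC), in which e followed by the combining acute accent U+0301 is composed into the single character é (U+00E9). I do not know of any simple way to do this in JavaScript; you may need to implement the normalization algorithm yourself, or find a library which does so.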
edited to remove answer to the wrong question
If all you need is any URL-escaped Unicode encoding, this will do the trick:
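    function convert(s) {
      // Turn one %uXXXX escape into the UTF-16 code unit it names.
      function parse(match, hex) {
        return String.fromCharCode(parseInt(hex, 16));
      }
      // [0-9a-f] matches hex digits only; the range [0-f] would also match
      // characters such as ":" and "@" that sit between "0" and "f".
      return encodeURIComponent(s.replace(/%u([0-9a-f]{4})/gi, parse));
    }
    convert('te%u0301st'); // => te%CC%81st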
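If you specifically need Normal Form C, you need to implement a whole lot of Unicode intelligence yourself. JavaScript does not treat the letter and its combining accent as one character: 'te\u0301st'.length (or 'tést'.length, when the é is written as e plus a combining accent) is 5 in JavaScript, not 4.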
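For what it's worth, engines that implement ES2015 or later ship String.prototype.normalize, which may cover the Normal Form C step without a hand-written algorithm. Below is a minimal sketch, assuming such an engine is available; the name convertNfc is made up for illustration and is not part of either answer above.

```javascript
// Sketch: decode %uXXXX escapes, compose to NFC, then percent-encode as UTF-8.
// Assumes an ES2015+ engine where String.prototype.normalize exists.
// convertNfc is a hypothetical name chosen for this example.
function convertNfc(s) {
  // Turn each %uXXXX escape into the UTF-16 code unit it names.
  const decoded = s.replace(/%u([0-9a-f]{4})/gi, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
  // NFC merges "e" + U+0301 into the single code point U+00E9.
  const composed = decoded.normalize('NFC');
  // encodeURIComponent emits the UTF-8 bytes of each character as %XX.
  return encodeURIComponent(composed);
}

console.log(convertNfc('te%u0301st')); // t%C3%A9st
```

Like convert above, this only handles %uXXXX escapes. If the input also contains ordinary %XX escapes, their % signs will be encoded again as %25, so those would need to be decoded first.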