
Removing hexadecimal UTF-8 characters in Java

I know this question has been asked before, but none of the solutions seemed to work for this particular problem. My Java application receives a username from another server. The username sometimes contains the hexadecimal representation of UTF-8 characters.

For example: "Féçon" comes in as F\C3\A9\C3\A7on.

None of the examples I found on this site (most of them use "getBytes") worked. No idea why.

So my question is: if you have defined a String with these characters, how can you remove them so it looks right again? You can try it yourself by using the following:

String test = "F\\C3\\A9\\C3\\A7on";

thanks! Mike


It's not the most performant solution, but at least the code is short. You're basically URL decoding, where \ marks an encoded byte instead of %. So the following code works:

String s = "F\\C3\\A9\\C3\\A7on";
s = s.replace('\\', '%');
System.out.println(URLDecoder.decode(s, "UTF-8"));
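One thing to watch with this trick: URLDecoder also turns '+' into a space and treats any '%' already in the input as the start of an escape. Below is a minimal sketch that protects those two characters first. The class and method names are made up for illustration, and it assumes every backslash in the input really is followed by two hex digits (otherwise URLDecoder throws an IllegalArgumentException).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of any library.
public class BackslashUrlDecode {
    public static String decode(String s) {
        String prepared = s
                .replace("%", "%25")   // keep literal percent signs intact
                .replace("+", "%2B")   // keep literal plus signs (URLDecoder maps '+' to space)
                .replace('\\', '%');   // \XX becomes %XX
        try {
            return URLDecoder.decode(prepared, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("F\\C3\\A9\\C3\\A7on"));
    }
}
```

The order of the replacements matters: the percent signs are escaped before any new ones are introduced.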


In this case getBytes won't work because it sounds like your Java string doesn't contain any non-ASCII characters; it just contains fifteen regular ASCII characters that spell out the escape sequences for the Unicode characters. Whatever your upstream component is, it is most likely the one doing the escaping.
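To see this for yourself, here is a small sketch (class and method names are made up for illustration) that round-trips the received string through getBytes. Since every character is plain ASCII, the bytes map one-to-one back to the same fifteen characters and nothing gets decoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, only for illustration.
public class EscapedStringDemo {
    // Encode to UTF-8 bytes and decode again; ASCII text comes back unchanged
    public static String roundTripUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String received = "F\\C3\\A9\\C3\\A7on"; // backslashes doubled for the Java literal
        System.out.println(received.length());                          // 15
        System.out.println(roundTripUtf8(received).equals(received));   // true
    }
}
```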

So the easiest way to address this is to see if the "other end" can be persuaded to speak Unicode. If so, you'll get the characters directly in Java and Bob's your uncle.

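Otherwise, you'll need to find some way of decoding these Strings. The simplest way I can think of is to iterate through, collecting the escaped bytes and decoding each run of them as UTF-8, something like this:

// Needs java.io.ByteArrayOutputStream and java.nio.charset.StandardCharsets
StringBuilder result = new StringBuilder();
ByteArrayOutputStream pending = new ByteArrayOutputStream();
char[] input = inputStr.toCharArray();
for (int i = 0; i < input.length; i++)
{
   if (input[i] == '\\' && i + 2 < input.length)
   {
      // Get the next two characters and parse them as one hex-encoded byte
      // (throws NumberFormatException if they are not hex digits)
      String escapeCodeStr = new String(input, i + 1, 2);
      pending.write(Integer.parseInt(escapeCodeStr, 16));
      i += 2; // Move pointer to account for two extra characters read
   }
   else
   {
      // A plain character ends the run: decode the collected bytes as UTF-8 first
      result.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
      pending.reset();
      result.append(input[i]);
   }
}
// Decode any escaped bytes left over at the end of the input
result.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));

return result.toString();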

This hasn't been tested, but it illustrates the principle of turning the escape codes into literal characters.
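For something you can paste and run, here is a self-contained sketch of the same principle, this time using a regex to find each run of \XX escapes and decoding the run's bytes as UTF-8 in one go. The class name, method name and pattern are my own choices for illustration; it assumes escapes are always a backslash followed by exactly two hex digits and leaves everything else untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class HexEscapeDecoder {
    // One or more consecutive \XX escapes, where XX is two hex digits
    private static final Pattern ESCAPED_RUN =
            Pattern.compile("(?:\\\\[0-9A-Fa-f]{2})+");

    public static String decode(String input) {
        Matcher m = ESCAPED_RUN.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the plain text between the previous run and this one
            out.append(input, last, m.start());
            String run = m.group();
            // Each escape is three characters long: '\' plus two hex digits
            byte[] bytes = new byte[run.length() / 3];
            for (int i = 0; i < bytes.length; i++) {
                String hex = run.substring(i * 3 + 1, i * 3 + 3);
                bytes[i] = (byte) Integer.parseInt(hex, 16);
            }
            // Multi-byte sequences only make sense when decoded together
            out.append(new String(bytes, StandardCharsets.UTF_8));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("F\\C3\\A9\\C3\\A7on"));
    }
}
```

Decoding a whole run at once matters because é and ç each take two bytes in UTF-8; casting each byte to a char on its own would give you "FÃ©Ã§on" instead.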
