Illegal character - CTRL-CHAR

I am getting the following exception from a web service:

com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 15))

I know the reason behind this: I am getting control characters in the data I want to return, and control characters are not allowed in XML.

I searched for a solution and, in many places, found code to remove the control characters.

My concern is: will I lose data if I remove the control characters?

I want a clean solution, perhaps encoding, instead of removing the control characters.


I would do what OrangeDog suggests. But if you want to solve it in your code, try:

replaceAll("[\\x00-\\x09\\x11\\x12\\x14-\\x1F\\x7F]", "")

\\x12 is the char.


Thanks, guys, for your input. I am sharing my solution in case it helps others. The requirement was not to wipe out the control characters: they must remain as they are in the DB, and when the web service sends the data across the network, the client should still receive them. So I implemented it as follows:

  1. Encode the strings using URLEncoder in the web service code.
  2. On the client side, decode them using URLDecoder.

Sample code and output are below.
Sample code:

// The string contains the control character with code 15 (0x0F)
String raw = "New\u000FSfn";
String encoded = URLEncoder.encode(raw, "UTF-8");
System.out.println(raw);
System.out.println(encoded);
System.out.println(URLDecoder.decode(encoded, "UTF-8"));

Output:

NewSfn
New%0FSfn
NewSfn

So the client will receive the control characters.

EDIT: Stack Exchange is not showing the control character in the output above. NewSfn is actually New(CONTROL CHAR)Sfn.


This error is being thrown by the Woodstox XML parser. The source code from the InputBootstrapper class looks like this:

protected void reportUnexpectedChar(int i, String msg)
    throws WstxException
{
    char c = (char) i;
    String excMsg;

    // WTF? JDK thinks null char is just fine as?!
    if (Character.isISOControl(c)) {
        excMsg = "Unexpected character (CTRL-CHAR, code "+i+")"+msg;
    } else {
        excMsg = "Unexpected character '"+c+"' (code "+i+")"+msg;
    }
    Location loc = getLocation();
    throw new WstxUnexpectedCharException(excMsg, loc, c);
}

Amusing comment aside, Woodstox is performing some additional validation on top of the JDK parser, and is rejecting the ASCII character 15 as invalid.
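
To see why code 15 gets rejected, here is a minimal sketch of the character range that the XML 1.0 specification allows (its Char production), next to what the JDK's Character.isISOControl reports. This is not Woodstox's actual validation code; the class name XmlCharCheck and the method isXml10Char are made up for illustration.

```java
class XmlCharCheck {
    // Char production from the XML 1.0 specification:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // (hypothetical helper, not part of Woodstox or the JDK)
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // NUL, tab, line feed, the character from the question, and 'A'
        int[] samples = {0x00, 0x09, 0x0A, 0x0F, 0x41};
        for (int cp : samples) {
            System.out.printf("code %d: isISOControl=%b, allowed in XML 1.0=%b%n",
                    cp, Character.isISOControl(cp), isXml10Char(cp));
        }
    }
}
```

Tab and line feed are control characters as far as the JDK is concerned, yet XML 1.0 permits them; code 15 is a control character that XML 1.0 does not permit.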

As to why that character is there, we can't tell you that, it's in your data. Similarly, we can't tell you if removing that character will break anything, since again, it's your data. You can only establish that for yourself.


If you have control characters in your text data then you need to solve that problem at its source.

The most likely causes are incorrect communication encodings (usually between database and app) or not sanitising user input.
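
If you want to track down where the characters come from, it may help to log exactly which ones are present and where. A rough sketch, assuming you can run the suspect strings through a helper before they reach the web service layer; ControlCharLocator and findControlChars are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class ControlCharLocator {
    // Reports the position and code of every control character in the text,
    // skipping tab, line feed and carriage return, which XML 1.0 allows.
    // (hypothetical helper for debugging, not a library API)
    static List<String> findControlChars(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean allowedWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (Character.isISOControl(c) && !allowedWhitespace) {
                hits.add("index " + i + ", code " + (int) c);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Same shape as the value in the question: "New", code 15, "Sfn"
        System.out.println(findControlChars("New\u000FSfn"));
    }
}
```

Running it on values read straight from the database versus values taken from user input should show which side introduces the characters.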


I found the same problem when I was passing null values for some of the parameters. I passed empty or wrench values instead and this error went away.


I'm a bit confused by @ssedano's answer. It seems to me he's trying to match all control chars in the ASCII table from 0x00 to 0x1F, except for 0x0A (new line) and 0x0D (carriage return), plus 0x7F (DEL). In that case, wouldn't the regex be:

replaceAll("[\\x00-\\x09\\x0B\\x0C\\x0E-\\x1F\\x7F]", "")
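
A quick way to compare the two patterns against the character from the question (code 15, i.e. 0x0F). This is just a sketch; the class name ControlCharRegexDemo and the strip helper are made up for illustration:

```java
class ControlCharRegexDemo {
    // Pattern from the earlier answer
    static final String FIRST = "[\\x00-\\x09\\x11\\x12\\x14-\\x1F\\x7F]";
    // Pattern proposed in this answer
    static final String SECOND = "[\\x00-\\x09\\x0B\\x0C\\x0E-\\x1F\\x7F]";

    // Removes every character matched by the given character class
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String sample = "New\u000FSfn"; // 7 chars, code 15 in the middle
        // 0x0F is not covered by the first pattern, so the length stays 7
        System.out.println(strip(sample, FIRST).length());
        // 0x0F falls in 0x0E-0x1F, so the second pattern removes it
        System.out.println(strip(sample, SECOND));
    }
}
```

Note that both patterns start with \\x00-\\x09, so they also strip tabs (0x09), which XML 1.0 allows. Using \\x00-\\x08 instead would keep them.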