Unicode and console interpretation
I print some characters from a wide UTF-8 range to standard output in a Java application. My console is configured for UTF-8 support. My problem is that sometimes, when I decide to print 10 characters for example, I see fewer than 10 characters.
I think this is because the console interprets some of the characters. Are there Unicode characters that can be interpreted as, for example, "erase the previous character"? Is it possible to exclude them from the output (what are the code points of these characters)?
Using carriage return or the backspace character you can get results like you describe. This little test program for instance...
public class Test {
    public static void main(String... args) {
        System.out.println("abc\rdef\u0008g");
    }
}
...prints the following in my terminal (Ubuntu):
$ java Test
deg
$
\r is carriage return, and \u0008 represents the backspace character. (Carriage return sends the cursor back to the first column, and backspace sends it back one column.)
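To make the cursor movement concrete, here is a minimal sketch that simulates how a simple terminal might lay out a single line. The class name CursorSim and the method render are hypothetical, and the model is an assumption: it handles only carriage return and backspace, treats every other char as one column wide, and ignores surrogate pairs and other escape sequences that real terminals interpret.

```java
public class CursorSim {
    // Hypothetical helper: simulates one terminal line where '\r' moves the
    // cursor to column 0 and '\b' moves it one column to the left.
    static String render(String s) {
        StringBuilder line = new StringBuilder();
        int col = 0;
        for (char c : s.toCharArray()) {
            if (c == '\r') {
                col = 0;                      // carriage return: back to the first column
            } else if (c == '\b') {
                col = Math.max(0, col - 1);   // backspace: back one column, never below 0
            } else {
                if (col < line.length()) {
                    line.setCharAt(col, c);   // overwrite what is already on screen
                } else {
                    line.append(c);           // write past the current end of the line
                }
                col++;
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Same string as in the test program above.
        System.out.println(render("abc\rdef\bg")); // prints "deg"
    }
}
```

Seven printable characters are sent, but only three remain visible, which matches the symptom described in the question.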
To remove all of these so-called "control characters", you could do the following (note that this also strips tabs and newlines, which are control characters too):
myString = myString.replaceAll("\\p{Cntrl}", "");
From the docs:
\p{Cntrl}    A control character: [\x00-\x1F\x7F]
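If tabs and line breaks should survive, one option is to filter by code point instead of using the regex. The following is only a sketch under that assumption; the class name ControlCharFilter and the method stripControls are hypothetical. It relies on Character.isISOControl, which also covers the C1 range \u007F-\u009F, so it removes slightly more than \p{Cntrl} does.

```java
public class ControlCharFilter {
    // Hypothetical helper: drops control characters but keeps line feed and tab.
    static String stripControls(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             // keep '\n' and '\t'; drop every other ISO control code point
             .filter(cp -> cp == '\n' || cp == '\t' || !Character.isISOControl(cp))
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // The carriage return and backspace are removed; the tab is kept.
        System.out.println(stripControls("abc\rdef\bg\tend"));
    }
}
```

Iterating over code points rather than chars keeps characters outside the Basic Multilingual Plane intact, which matters when printing from a wide Unicode range.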
The obvious one is backspace.