How to get terminal's Character Encoding

2023-02-16 16:52 问答作者：

Now I change my gnome-terminal's character encoding to "GBK" (default i开发者_如何学Ct is UTF-8), but how can I get the value(character encoding) in my Linux?

The terminal uses environment variables to determine which character set to use, therefore you can determine it by looking at those variables:

echo $LC_CTYPE

echo $LANG

locale command with no arguments will print the values of all of the relevant environment variables except for LANGUAGE.

For current encoding:

locale charmap

For available locales:

locale -a

For available encodings:

locale -m

Check encoding and language:

$ echo $LC_CTYPE
ISO-8859-1
$ echo $LANG
pt_BR

Get all languages:

$ locale -a

Change to pt_PT.utf8:

$ export LC_ALL=pt_PT.utf8 
$ export LANG="$LC_ALL"

If you have Python:

python -c "import sys; print(sys.stdout.encoding)"

To my knowledge, no.

Circumstantial indications from $LC_CTYPE, locale and such might seem alluring, but these are completely separated from the encoding the terminal application (actually an emulator) happens to be using when displaying characters on the screen.

They only way to detect encoding for sure is to output something only present in the encoding, e.g. ä, take a screenshot, analyze that image and check if the output character is correct.

So no, it's not possible, sadly.

To see the current locale information use locale command. Below is an example on RHEL 7.8

[usr@host ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Examination of https://invisible-island.net/xterm/ctlseqs/ctlseqs.html, the xterm control character documentation, shows that it follows the ISO 2022 standard for character set switching. In particular ESC % G selects UTF-8. So to force the terminal to use UTF-8, this command would need to be sent. I find no way of querying which character set is currently in use, but there are ways of discovering if the terminal supports national replacement character sets.

However, from charsets(7), it doesn't look like GBK (or GB2312) is an encoding supported by ISO 2022 and xterm doesn't support it natively. So your best bet might be to use iconv to convert to UTF-8.

Further reading shows that a (significant) subset of GBK is EUC, which is a ISO2022 code, so ISO2022 capable terminals may be able to display GBK natively after all, but I can't find any mention of activating this programmatically, so the terminal's user interface would be the only recourse.

How to get terminal's Character Encoding

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？