Python: Which encoding is used for processing sys.argv?

2023-01-21 01:56 问答作者：

In what encoding are the elements of sys.argv, in Python? are they encoded with the sys.getdefaultencoding() encoding?

sys.getdefaultencoding(): Return the name of the current default string encoding used by the Unicode implementat开发者_C百科ion.

PS: As pointed out in some of the answers, sys.stdin.encoding would indeed be a better guess. I would love to see a definitive answer to this question, though, with pointers to solid sources!

PPS: As Wim pointed out, Python 3 solves this issue by putting str objects in sys.argv (if I understand correctly). The question remains open for Python 2.x, though. Under Unix, the LC_CTYPE environment variable seems to be the correct thing to check, no? What should be done with Windows (so that sys.argv elements are correctly interpreted whatever the console)?

I'm guessing that you are asking this because you ran into issue 2128. Note that this has been fixed in Python 3.0.

A few observations:

(1) It's certainly not sys.getdefaultencoding.

(2) sys.stdin.encoding appears to be a much better bet.

(3) On Windows, the actual value of sys.stdin.encoding will vary, depending on what software is providing the stdio. IDLE will use the system "ANSI" code page, e.g. cp1252 in most of Western Europe and America and former colonies thereof. However in the Command Prompt window, which emulates MS-DOS more or less, the corresponding old DOS code page (e.g. cp850) will be used by default. This can be changed by using the CHCP (change code page) command.

(4) The documentation for the subprocess module doesn't provide any suggestions on what encoding to use for args and stdout.

(5) One trusts that assert sys.stdin.encoding == sys.stdout.encoding never fails.

I don't know if this helps or not but this is what I get in DOS mode:

C:\Python27>python Lib\codingtest.py нер
['Lib\\codingtest.py', '\xed\xe5\xf0']

C:\Python27>python Lib\codingtest.py hello
['Lib\\codingtest.py', 'hello']

In IDLE:

>>> print "hello"
hello
>>> "hello"
'hello'
>>> "привет"
'\xef\xf0\xe8\xe2\xe5\xf2'
>>> print "привет"
привет
>>> sys.getdefaultencoding()
'ascii'
>>>

What can we deduce from this? I don't know yet... I'll comment in a little bit.

A little bit later: sys.argv is encoded with sys.stdin.encoding and not sys.getdefaultencoding()

"What should be done with Windows (so that sys.argv elements are correctly interpreted whatever the console)?"

For Python 2.x, see this comment on issue2128.

(Note that no encoding is correct for the original sys.argv, because some characters may have been mangled in ways that there is not enough information to undo; for example, if the ANSI codepage cannot represent Greek alpha then it will be mangled to 'a'.)

On Unix systems, it should be in the user's locale, which is (strangely) not tied to sys.getdefaultencoding. See http://docs.python.org/library/locale.html.

In Windows, it'll be in the system ANSI codepage.

(By the way, those elementary school teachers who told you not to end a sentence with a preposition were lying to you.)

As per https://docs.python.org/3/library/sys.html#sys.argv

argv is encoded with sys.getfilesystemencoding() using sys.getfilesystemencodeerrors().

See also https://www.python.org/dev/peps/pep-0383/ which explains the tricky way of how non-UTF8 sequences are encoded within that (UTF-8), when encoding="utf-8" ... by using surrogateescape as error handler.

Of intrest might also be os.fsdecode and os.fsencode.

sys.getfilesystemencoding() works for me, at least on Windows. On Windows it is actually 'mbcs', and 'utf-8' on *nix.

继续阅读：argv encoding python sys

Python: Which encoding is used for processing sys.argv?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？