Python: json.dumps can't handle UTF-8?
Below is the test program, including a Chinese character:
# -*- coding: utf-8 -*-
import json
j = {"d":"中", "e":"a"}
json = json.dumps(j, encoding="utf-8")
print json
Below is the result. Notice that json.dumps converts the UTF-8 character into its numeric code point escape!
{"e": "a", "d": "\u4e2d"}
Why is this broken? Or am I doing something wrong?
Looks like valid JSON to me. If you want json to output a string that has non-ASCII characters in it, then you need to pass ensure_ascii=False and then encode manually afterward.
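A minimal sketch of that approach, assuming Python 3 (where json.dumps takes no encoding argument and returns a str, unlike the Python 2 code in the question). The variable names are only illustrative:

```python
import json

data = {"d": "中", "e": "a"}

# Default behaviour: non-ASCII characters are escaped as \uXXXX.
escaped = json.dumps(data)

# ensure_ascii=False keeps the character itself in the returned str.
text = json.dumps(data, ensure_ascii=False)

# Encode manually when bytes are needed, e.g. for a file or a socket.
raw = text.encode("utf-8")

print(escaped)  # {"d": "\u4e2d", "e": "a"}
print(raw)      # b'{"d": "\xe4\xb8\xad", "e": "a"}'
```

Both forms describe the same JSON document; only the byte-level representation of the character differs.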
You should read json.org. The complete JSON specification is in the white box on the right.
There is nothing wrong with the generated JSON. Generators are allowed to generate either UTF-8 strings or plain ASCII strings in which non-ASCII characters are escaped with the \uXXXX notation. In your case, the Python json module chose escaping, and 中 has the escaped notation \u4e2d.
By the way: Any conforming JSON interpreter will correctly unescape this sequence again and give you back the actual character.
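As a quick check of that round trip, here is a small sketch, assuming Python 3 and the standard json module. The input string is the output shown in the question:

```python
import json

# The escaped output from the question, as a JSON text.
escaped = '{"e": "a", "d": "\\u4e2d"}'

decoded = json.loads(escaped)

# The \u4e2d escape is turned back into the single character 中.
assert decoded["d"] == "中"
assert len(decoded["d"]) == 1
```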
Use simplejson with the mentioned options:
# -*- coding: utf-8 -*-
import simplejson as json
j = {"d":"中", "e":"a"}
# Use a separate name so the result does not shadow the json module.
result = json.dumps(j, ensure_ascii=False, encoding="utf-8")
print result
Output:
{"e": "a", "d": "中"}