开发者

Python: JSON decoding library which can associate decoded items with original line number?

I often use JSON for user-editable configuration files. Malformed JSON is of course picked up for me by json.loads, but sometimes there are errors which I don't find until I'm going through the resulting dicts/lists/strings. I would like to be able to give helpful errors like "Invalid value 'foo' on开发者_运维知识库 line 23", but when I get my dict back I've lost any mapping to original line numbers.

It seems possible that someone might have written a JSON parser which tagged each output object with some metadata about where it appeared in the input text: does such a thing exist for python?

Example:

1. [{"foo": "x"},
2.  {"bar": "y"}]

After parsing the above, I find that "y" is actually not a legal value for "bar", and I'd like to know that it came from line number 2.


AFAIK what you want doesn't exist, but I have an idea how you could implement it if you're interested...

The json module has a hook for decoding objects which you could (mis-) use to do decode-time object validation. However this won't solve your problem because the hook doesn't get line number information. The issue is further complicated because you no longer get line-by-line error messages in Python 2.7+. You only get them from the pure Python JSON decoder, and newer versions use a (much faster) C library.

So we've got two problems to solve.

1) You can use the pure-python decoder by subclassing json.JSONDecoder like so:

class PyDecoder(json.JSONDecoder):
    def __init__(self, encoding=None, object_hook=None, parse_float=None,
            parse_int=None, parse_constant=None, strict=True,
            object_pairs_hook=None):
        super(PyDecoder, self).__init__(encoding, object_hook, parse_float,
                                        parse_int, parse_constant, strict)
        self.scan_once = json.scanner.py_make_scanner(self)

2) To get your validation you need to replace json.decoder.JSONObject with a method that does pretty much the same thing, but also passes line number information to your validation routine.


Full disclosure: I'm the maintainer of the package below.

There is now a new Python package that solves this use case: https://github.com/open-alchemy/json-source-map

Installation: pip install json_source_map

For example, in your case:

from json_source_map import calculate

source = '''[{"foo": "x"},
{"bar": "y"}]'''

print(calculate(source))

This prints:

{
    "": Entry(
        value_start=Location(line=0, column=0, position=0),
        value_end=Location(line=1, column=13, position=28),
        key_start=None,
        key_end=None,
    ),
    "/0": Entry(
        value_start=Location(line=0, column=1, position=1),
        value_end=Location(line=0, column=13, position=13),
        key_start=None,
        key_end=None,
    ),
    "/0/foo": Entry(
        value_start=Location(line=0, column=9, position=9),
        value_end=Location(line=0, column=12, position=12),
        key_start=Location(line=0, column=2, position=2),
        key_end=Location(line=0, column=7, position=7),
    ),
    "/1": Entry(
        value_start=Location(line=1, column=0, position=15),
        value_end=Location(line=1, column=12, position=27),
        key_start=None,
        key_end=None,
    ),
    "/1/bar": Entry(
        value_start=Location(line=1, column=8, position=23),
        value_end=Location(line=1, column=11, position=26),
        key_start=Location(line=1, column=1, position=16),
        key_end=Location(line=1, column=6, position=21),
    ),
}

This tells you exactly where in the original JSON document each key and value starts and ends. For example, it tells you that "y" is on line 1 (lines are zero-indexed) and starts at column 8 and ends at column 11.


If you use json.load() and pass in an open file handle, any error message you get will have a line and column number. If the exception is a ValueError, then the associated message should be suitable for forwarding to the user.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜