Python: JSON decoding library which can associate decoded items with original line number?
I often use JSON for user-editable configuration files. Malformed JSON is of course picked up for me by json.loads, but sometimes there are errors which I don't find until I'm going through the resulting dicts/lists/strings. I would like to be able to give helpful errors like "Invalid value 'foo' on line 23", but when I get my dict back I've lost any mapping to original line numbers.
It seems possible that someone might have written a JSON parser which tags each output object with some metadata about where it appeared in the input text: does such a thing exist for Python?
Example:
1. [{"foo": "x"},
2. {"bar": "y"}]
After parsing the above, I find that "y" is actually not a legal value for "bar", and I'd like to know that it came from line number 2.
AFAIK what you want doesn't exist, but I have an idea how you could implement it if you're interested...
The json module has a hook for decoding objects (object_hook) which you could (mis)use for decode-time object validation. However, this won't solve your problem on its own, because the hook doesn't receive any line number information. The issue is further complicated because you no longer get line-by-line error messages in Python 2.7+: you only get them from the pure-Python JSON decoder, and newer versions use a (much faster) C implementation.
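To make the limitation concrete, here is a minimal sketch of validation inside object_hook. The rule that "y" is illegal for "bar" is just the hypothetical one from the question, and validate is an illustrative name. The hook only ever sees the decoded dict, so the error it raises cannot say where the value came from.

```python
import json

def validate(obj):
    # Hypothetical rule taken from the question: "y" is not legal for "bar".
    if obj.get("bar") == "y":
        # Only the decoded dict is available here, with no source position.
        raise ValueError("Invalid value 'y' for 'bar'")
    return obj

source = '[{"foo": "x"},\n{"bar": "y"}]'
try:
    json.loads(source, object_hook=validate)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```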
So we've got two problems to solve.
1) You can use the pure-python decoder by subclassing json.JSONDecoder like so:
    import json
    import json.scanner

    class PyDecoder(json.JSONDecoder):
        """JSONDecoder that always uses the pure-Python scanner."""

        def __init__(self, **kwargs):
            # Forward every option (object_hook, object_pairs_hook, strict, ...)
            # unchanged; in Python 3 they are keyword-only.
            super().__init__(**kwargs)
            # Swap the C scanner for the pure-Python one.
            self.scan_once = json.scanner.py_make_scanner(self)
2) To get your validation you need to replace json.decoder.JSONObject with a function that does pretty much the same thing, but also passes line number information to your validation routine.
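As a rough sketch of step 2, one variation is to wrap json.decoder.JSONObject rather than replace it, and tag each decoded dict with the line on which its opening brace appeared. LineDict and LineNumberDecoder are made-up names for illustration. This leans on CPython implementation details (the parse_object attribute and the way py_make_scanner calls it), so treat it as an assumption that may not hold on other interpreters or future versions.

```python
import json
import json.decoder
import json.scanner

class LineDict(dict):
    """dict that remembers the 1-based line where its '{' appeared."""
    line = None

class LineNumberDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # The pure-Python scanner reads parse_object when it is built,
        # so install the wrapper before creating the scanner.
        self.parse_object = self._parse_object
        self.scan_once = json.scanner.py_make_scanner(self)

    def _parse_object(self, s_and_end, *args, **kwargs):
        s, end = s_and_end  # 'end' is the index just past the opening '{'
        obj, new_end = json.decoder.JSONObject(s_and_end, *args, **kwargs)
        if isinstance(obj, dict):  # a hook may have returned something else
            tagged = LineDict(obj)
            tagged.line = s.count("\n", 0, end) + 1
            obj = tagged
        return obj, new_end

data = json.loads('[{"foo": "x"},\n{"bar": "y"}]', cls=LineNumberDecoder)
print(data[1], "came from line", data[1].line)
```

Only objects are tagged here; strings and numbers cannot carry an extra attribute, so a validation routine would report the line of the enclosing object.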
Full disclosure: I'm the maintainer of the package below.
There is now a new Python package that solves this use case: https://github.com/open-alchemy/json-source-map
Installation: pip install json_source_map
For example, in your case:
from json_source_map import calculate
source = '''[{"foo": "x"},
{"bar": "y"}]'''
print(calculate(source))
This prints:
{
    "": Entry(
        value_start=Location(line=0, column=0, position=0),
        value_end=Location(line=1, column=13, position=28),
        key_start=None,
        key_end=None,
    ),
    "/0": Entry(
        value_start=Location(line=0, column=1, position=1),
        value_end=Location(line=0, column=13, position=13),
        key_start=None,
        key_end=None,
    ),
    "/0/foo": Entry(
        value_start=Location(line=0, column=9, position=9),
        value_end=Location(line=0, column=12, position=12),
        key_start=Location(line=0, column=2, position=2),
        key_end=Location(line=0, column=7, position=7),
    ),
    "/1": Entry(
        value_start=Location(line=1, column=0, position=15),
        value_end=Location(line=1, column=12, position=27),
        key_start=None,
        key_end=None,
    ),
    "/1/bar": Entry(
        value_start=Location(line=1, column=8, position=23),
        value_end=Location(line=1, column=11, position=26),
        key_start=Location(line=1, column=1, position=16),
        key_end=Location(line=1, column=6, position=21),
    ),
}
This tells you exactly where in the original JSON document each key and value starts and ends. For example, it tells you that "y" is on line 1 (lines are zero-indexed) and starts at column 8 and ends at column 11.
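To turn such an entry into the kind of message the question asks for, something along these lines might work. The describe helper is hypothetical, and the two namedtuples are stand-ins that mirror the Entry/Location shapes in the printed output so the sketch runs without the package installed. With the package, the mapping would come from calculate(source) instead.

```python
from collections import namedtuple

# Stand-ins mirroring the shapes printed by the package (assumption: only
# the attribute names matter for this sketch).
Location = namedtuple("Location", "line column position")
Entry = namedtuple("Entry", "value_start value_end key_start key_end")

def describe(source_map, pointer, value):
    """Build a user-facing message with 1-based line and column numbers."""
    start = source_map[pointer].value_start
    return "Invalid value %r at line %d, column %d" % (
        value, start.line + 1, start.column + 1)

source_map = {
    "/1/bar": Entry(Location(1, 8, 23), Location(1, 11, 26),
                    Location(1, 1, 16), Location(1, 6, 21)),
}
message = describe(source_map, "/1/bar", "y")
print(message)
```

The keys look like JSON pointers, so while walking the decoded data you would build up the same path ("/1/bar" here) and use it to look up the position.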
If you use json.load() and pass in an open file handle, any error message you get will have a line and column number. If the exception is a ValueError, then the associated message should be suitable for forwarding to the user.
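A small sketch of that, assuming Python 3.5+, where malformed input raises json.JSONDecodeError (a ValueError subclass carrying lineno, colno and msg). load_config is an illustrative name, and io.StringIO stands in for an open file. Note this only covers syntax errors, not values that parse fine but are semantically invalid.

```python
import io
import json

def load_config(handle):
    """Return (data, None) on success or (None, message) on malformed JSON."""
    try:
        return json.load(handle), None
    except json.JSONDecodeError as exc:  # a subclass of ValueError
        return None, "line %d, column %d: %s" % (exc.lineno, exc.colno, exc.msg)

# io.StringIO stands in for an open configuration file.
data, error = load_config(io.StringIO('[{"foo": "x"},\n{"bar": }]'))
print(error)
```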