Parsing JSON without quoted keys

I understand that in JSON, keys are supposed to be surrounded in double quotes. However, I'm using a data source which doesn't quote them, which is causing the Ruby JSON parser to raise an error. Is there any way to perform 'non-strict' parsing?

Example:

>> JSON.parse('{name:"hello", age:"23"}')
JSON::ParserError: 618: unexpected token at '{name:"hello", age:"23"}'
    from /Library/Ruby/Gems/1.8/gems/json-1.1.7/lib/json/common.rb:122:in `parse' 
    from /Library/Ruby/Gems/1.8/gems/json-1.1.7/lib/json/common.rb:122:in `parse'
    from (irb):5
>> JSON.parse('{"name":"hello", "age":"23"}')
=> {"name"=>"hello", "age"=>"23"}
>> 

(I tried using a regular expression to add the quotes in before parsing but couldn't get it fully working).


If the data is pretty well formed other than that, a simple regex might do it:

irb(main):009:0> '{name:"hello", age:"23"}'.gsub(/([a-z]+):/, '"\1":')
=> "{\"name\":\"hello\", \"age\":\"23\"}"


I have this same issue with a third-party data feed, but mine returns a more complicated JSON-like response which the gsub solutions don't handle. After some research it appears these data feeds are actually JavaScript object literals, which don't require the keys to be quoted.

To resolve the issue I added the execjs gem and installed Node.js (the therubyracer gem would probably work as well). With that in place, the following returns a correctly parsed Ruby hash.

ExecJS.eval('{name:"hello", age:"23"}')
 => {"name"=>"hello", "age"=>"23"}


Interestingly, your example is valid Ruby 1.9 Hash syntax. If your data is really as simple as this (no spaces or other special characters in the key names), and it comes from a source you trust, you can just eval it. Since eval executes arbitrary Ruby code, don't do this with untrusted input.

irb(main):001:0> eval '{name:"hello", age:"23"}'
=> {:name=>"hello", :age=>"23"}

This gives you symbols as keys, so post-process if you need to turn them into strings:

irb(main):002:0> eval('{name:"hello", age:"23"}').reduce({}) {|h,(k,v)| h[k.to_s] = v; h}
=> {"name"=>"hello", "age"=>"23"}
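On newer Rubies the symbol-to-string step can probably be written more briefly. A minimal sketch, assuming Ruby 2.5 or later (where Hash#transform_keys exists) and the same trusted literal as above:

```ruby
# eval is only acceptable here because the string is a trusted literal.
parsed = eval('{name:"hello", age:"23"}')

# Hash#transform_keys needs Ruby 2.5+; it returns a new hash with converted keys.
string_keyed = parsed.transform_keys(&:to_s)
```

string_keyed then has the same shape as what JSON.parse returns for the properly quoted version.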


gsub(/(\w+)\s*:/, '"\1":')

worked better for me than

gsub(/([a-z]+):/, '"\1":')

The second pattern failed whenever a key contained capital letters or was followed by spaces before the colon.
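To make the difference concrete, here is a small sketch comparing the two patterns. The input string is made up for illustration (a capitalised key with a space before the colon). Note that neither pattern is safe when a string value itself contains a word followed by a colon.

```ruby
require 'json'

# Hypothetical input: capitalised key, and a space before the colon.
raw = '{Name :"hello", age:"23"}'

# Narrow pattern: only lowercase letters immediately followed by a colon.
narrow = raw.gsub(/([a-z]+):/, '"\1":')

# Wider pattern: any word characters, optional whitespace, then a colon.
wide = raw.gsub(/(\w+)\s*:/, '"\1":')

# The narrow result still has an unquoted key, so parsing it fails.
narrow_ok = begin
  JSON.parse(narrow)
  true
rescue JSON::ParserError
  false
end

parsed = JSON.parse(wide)
```

Here narrow leaves Name unquoted, while wide produces a string that JSON.parse accepts.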


(Answering my own question) The snippet that floyd posted was similar to what I tried - it was failing because some of my strings contain colons. But I persisted and found a solution:

gsub(/([\{|\,}])\s*([a-zA-Z]+):/, '\1 "\2":')
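A quick check of why anchoring on the preceding brace or comma helps, using an invented input whose value contains a colon. This is only a sketch: a string value that itself contains a comma followed by a word and a colon would still be rewritten incorrectly.

```ruby
require 'json'

# Hypothetical input: the url value contains a colon.
raw = '{name:"hello", url:"http://example.com", age:"23"}'

# A key is only quoted when it directly follows "{" or "," (plus optional whitespace),
# so the "http:" inside the string value is left alone.
fixed = raw.gsub(/([\{|\,}])\s*([a-zA-Z]+):/, '\1 "\2":')

parsed = JSON.parse(fixed)
```

The simpler patterns would have turned http: into "http": and broken the string.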


This is how I have had to solve it:

JSON.parse(broken_json_string.gsub(/'([^']+)':/, '"\1":'))

Some of the answers above assume the keys only contain letters; some of ours contained underscores, spaces, etc. It was easier to match "any character that isn't a single quote" (given that, in our case, all the keys were wrapped in single quotes).
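For illustration, a sketch of that substitution on a made-up feed where the keys are single-quoted and contain a space and an underscore. It assumes the values are already double-quoted; single-quoted values would need separate handling.

```ruby
require 'json'

# Hypothetical input: single-quoted keys, double-quoted values.
broken_json_string = %q({'first name':"hello", 'user_age':"23"})

# Swap the single quotes around each key for double quotes.
fixed = broken_json_string.gsub(/'([^']+)':/, '"\1":')

parsed = JSON.parse(fixed)
```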
