Hive out-of-the-box json parser
I have a text file containing JSON records that I would like to load into Hive. My JSON looks like this:
{"vr":1,"tm":1312816191516,"tms":"08-08-2011 15:09:51.516 GMT","as":1002,"pb":1102,"cts":[1204,1205],"ctgs":[1304,1305],"op":1400,"ev":2,"dv":1503,"dvgs":[1605,1606],"cnt":"cnt5","usr":"usr8","atts":[{"id":8002,"val":"ccc"},{"id":8003,"val":"ddd"}],"sel":{"cm":2102,"ty":"PRE","ag":3002,"ad":4002,"fl":5002,"fla":6002,"hg":7002,"mc":"WAP","pr":0.1}}
As you can see, the JSON is nested and contains both arrays of primitives and an array of objects.
Is it possible to load it into Hive as is, using only built-in functions?
Yosi
You should be able to load it into Hive as is.
You may need to escape the double quotes ("). I haven't loaded JSON into Hive myself, so I'm not 100% sure whether any escaping is required.
To access the JSON elements once the data is in Hive, use the built-in function get_json_object, which is described in detail at
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
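A minimal sketch of that approach, assuming each JSON record sits on its own line of the text file. The table name json_raw, the column name line, and the file path are made up for illustration; only get_json_object itself comes from Hive.

```sql
-- Hypothetical staging table: one JSON document per row, kept as a plain string.
CREATE TABLE json_raw (line STRING);

-- Hypothetical local path to the text file holding the JSON records.
LOAD DATA LOCAL INPATH '/tmp/events.json' INTO TABLE json_raw;

-- Pull individual elements out with JSONPath-style expressions.
SELECT
  get_json_object(line, '$.vr')          AS vr,              -- top-level scalar
  get_json_object(line, '$.tms')         AS tms,             -- top-level string
  get_json_object(line, '$.cts[0]')      AS first_ct,        -- element of a primitive array
  get_json_object(line, '$.atts[1].val') AS second_att_val,  -- field of an object inside an array
  get_json_object(line, '$.sel.mc')      AS sel_mc           -- field of a nested object
FROM json_raw;
```

get_json_object always returns a string, so numeric fields need an explicit CAST if they are to be used as numbers.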
You can use a custom SerDe to read JSON files into Hive tables. See the following SerDe on GitHub: https://github.com/rcongiu/Hive-JSON-Serde
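As a rough sketch, a table for the record in the question might be declared as below. The jar path and the table name events are placeholders, the column types are guessed from the single sample record, and the class name is the one I believe the Hive-JSON-Serde project documents, so check it against the version you build.

```sql
-- Placeholder path: point this at the jar built from Hive-JSON-Serde.
ADD JAR /path/to/json-serde-with-dependencies.jar;

-- Column types are assumptions inferred from the sample record.
CREATE TABLE events (
  vr   INT,
  tm   BIGINT,
  tms  STRING,
  `as` INT,            -- "as" is a reserved word, hence the backticks
  pb   INT,
  cts  ARRAY<INT>,
  ctgs ARRAY<INT>,
  op   INT,
  ev   INT,
  dv   INT,
  dvgs ARRAY<INT>,
  cnt  STRING,
  usr  STRING,
  atts ARRAY<STRUCT<id:INT, val:STRING>>,
  sel  STRUCT<cm:INT, ty:STRING, ag:INT, ad:INT, fl:INT,
              fla:INT, hg:INT, mc:STRING, pr:DOUBLE>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

-- Nested values are then addressed with ordinary Hive syntax.
SELECT cts[0], atts[1].val, sel.mc FROM events;
```

With this layout the nested arrays and objects become native Hive ARRAY and STRUCT columns, so no JSON parsing is needed at query time.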
Also check out Brickhouse: https://github.com/klout/brickhouse. It has quite decent UDFs for JSON (like json_split and json_map). With Brickhouse and get_json_object / json_tuple (also mentioned by Nija here), you can even avoid custom SerDes such as Hive-JSON-Serde altogether.
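For the built-in json_tuple route, a sketch might look like the following. It assumes a hypothetical table json_raw with a single STRING column named line that holds one JSON record per row. The Brickhouse functions are left out because their exact signatures should be taken from the Brickhouse documentation.

```sql
-- Assumes a hypothetical table json_raw(line STRING), one JSON record per row.
-- json_tuple parses each record once and returns several top-level keys.
SELECT
  t.vr,
  t.usr,
  t.cnt,
  get_json_object(r.line, '$.sel.mc') AS sel_mc  -- nested field, read via a path expression
FROM json_raw r
LATERAL VIEW json_tuple(r.line, 'vr', 'usr', 'cnt') t AS vr, usr, cnt;
```

json_tuple only addresses top-level keys, which is why the nested sel.mc value is read with get_json_object here.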