
Hive out-of-the-box json parser

I have a text file containing json records I would like to load to Hive. My json looks like:

{"vr":1,"tm":1312816191516,"tms":"08-08-2011 15:09:51.516 GMT","as":1002,"pb":1102,"cts":[1204,1205],"ctgs":[1304,1305],"op":1400,"ev":2,"dv":1503,"dvgs":[1605,1606],"cnt":"cnt5","usr":"usr8","atts":[{"id":8002,"val":"ccc"},{"id":8003,"val":"ddd"}],"sel":{"cm":2102,"ty":"PRE","ag":3002,"ad":4002,"fl":5002,"fla":6002,"hg":7002,"mc":"WAP","pr":0.1}}

As you can see, the JSON is nested and contains both arrays of primitives and an array of objects.

Is it possible to load it as is into Hive using a built-in function?


You should be able to load it into Hive as is. You may need to escape the double quotes. I haven't loaded JSON into Hive myself, so I'm not 100% sure whether any escaping is needed.

To access the JSON elements once the data is in Hive, there is a built-in function for doing so: get_json_object. It is described in detail at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
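
As a rough sketch of that approach: each record is stored as one string column and fields are pulled out with get_json_object. The table name raw_events, the column name json, and the file path are made up for illustration, and the sketch assumes each JSON record sits on a single line of the file.

```sql
-- Hypothetical staging table: one JSON record per row, kept as a plain string.
CREATE TABLE IF NOT EXISTS raw_events (json STRING);

-- Hypothetical path; replace with the real location of the text file.
LOAD DATA LOCAL INPATH '/tmp/events.json' INTO TABLE raw_events;

-- get_json_object takes the JSON string and a JSONPath-like expression
-- and returns the matching value as a string.
SELECT
  get_json_object(json, '$.usr')         AS usr,         -- top-level scalar
  get_json_object(json, '$.cts[0]')      AS first_ct,    -- element of an array of primitives
  get_json_object(json, '$.atts[1].val') AS second_att,  -- field of an object inside an array
  get_json_object(json, '$.sel.mc')      AS sel_mc       -- field of a nested object
FROM raw_events;
```

Because the values come back as strings, numeric fields would need a CAST before being used in arithmetic.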

You can use a custom SerDe to read JSON files into Hive tables. See the following SerDe on GitHub: https://github.com/rcongiu/Hive-JSON-Serde
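
A table definition using that SerDe might look roughly like the following. The jar path, the table name events_json, and the HDFS location are placeholders, only a subset of the fields from the sample record is declared, and the SerDe class name is the one I believe the project documents, so check it against the README for the version you build.

```sql
-- Placeholder path to the SerDe jar built from the GitHub project.
ADD JAR /path/to/json-serde-with-dependencies.jar;

-- Hypothetical table mapping part of the sample record to Hive types.
CREATE EXTERNAL TABLE events_json (
  vr   INT,
  tm   BIGINT,
  tms  STRING,
  cts  ARRAY<INT>,                        -- array of primitives
  usr  STRING,
  atts ARRAY<STRUCT<id:INT, val:STRING>>, -- array of objects
  sel  STRUCT<cm:INT, ty:STRING, mc:STRING, pr:DOUBLE>  -- nested object (subset of its keys)
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/events_json';             -- placeholder HDFS directory holding the file

-- Nested values are then addressed with ordinary Hive syntax.
SELECT usr, cts[0], atts[1].val, sel.mc
FROM events_json;
```

Note that the sample record has a key named "as", which is a reserved word in HiveQL; declaring it as a column would require quoting it with backticks.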

Also check out Brickhouse: https://github.com/klout/brickhouse. It has quite decent UDFs for JSON (such as json_split and json_map). With Brickhouse and get_json_object / json_tuple (also mentioned by Nija here) you can even avoid custom SerDes such as Hive-JSON-Serde.
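
For the json_tuple part, a minimal sketch using only built-in functions could look like this. It assumes a hypothetical table raw_events with a single STRING column json that holds one record per row. json_tuple only reads top-level keys and returns strings, so the nested sel object comes back as a JSON string that can be passed on to get_json_object.

```sql
-- Hypothetical staging table, one JSON record per row.
CREATE TABLE IF NOT EXISTS raw_events (json STRING);

-- json_tuple extracts several top-level keys in one pass over the record.
SELECT
  t.vr,
  t.usr,
  get_json_object(t.sel, '$.mc') AS sel_mc  -- drill into the nested object
FROM raw_events r
LATERAL VIEW json_tuple(r.json, 'vr', 'usr', 'sel') t AS vr, usr, sel;
```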






