
Hive out-of-the-box json parser

I have a text file containing json records I would like to load to Hive. My json looks like:

{"vr":1,"tm":1312816191516,"tms":"08-08-2011 15:09:51.516 GMT","as":1002,"pb":1102,"cts":[1204,1205],"ctgs":[1304,1305],"op":1400,"ev":2,"dv":1503,"dvgs":开发者_如何学JAVA[1605,1606],"cnt":"cnt5","usr":"usr8","atts":[{"id":8002,"val":"ccc"},{"id":8003,"val":"ddd"}],"sel":{"cm":2102,"ty":"PRE","ag":3002,"ad":4002,"fl":5002,"fla":6002,"hg":7002,"mc":"WAP","pr":0.1}}

As you can see, the JSON is nested and contains arrays of primitives as well as an array of objects.

Is it possible to load it as is into Hive using a built-in function?

Yosi


You should be able to load it into Hive as is. You may need to escape the double quotes; I haven't loaded JSON into Hive myself, so I'm not 100% sure whether any escaping is required.
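A minimal sketch of loading the file unchanged, assuming one JSON record per line. The table name `raw_events`, the column name `json`, and the file path are hypothetical. With a single STRING column in a plain text table, each line should land in that column as is, so no quote escaping is expected in this setup:

```sql
-- Hypothetical table: one STRING column holding one JSON record per row.
CREATE TABLE raw_events (json STRING)
STORED AS TEXTFILE;

-- Hypothetical local path; the file is copied into the table's directory unchanged.
LOAD DATA LOCAL INPATH '/tmp/events.json' INTO TABLE raw_events;
```

Hive does not parse the JSON at load time here; it is only interpreted when a query applies a JSON function to the column.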

To access the JSON elements once the data is in Hive, use the built-in function get_json_object, which is described in detail at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
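As an illustration, the query below pulls a few fields out of the record from the question with get_json_object. It assumes a hypothetical table `raw_events` with a single STRING column `json` that holds one record per row; the column aliases are also made up for the example:

```sql
-- get_json_object(json_string, path) returns the matched value as a STRING,
-- or NULL if the path does not match.
SELECT
  get_json_object(json, '$.vr')          AS vr,             -- top-level scalar
  get_json_object(json, '$.cts[0]')      AS first_ct,       -- element of an array of primitives
  get_json_object(json, '$.atts[1].val') AS second_att_val, -- field of an object inside an array
  get_json_object(json, '$.sel.mc')      AS sel_mc          -- field of a nested object
FROM raw_events;
```

Each call parses the JSON string again, so extracting many fields this way can be slow on large tables.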


You can use a custom SerDe to read JSON files into Hive tables. See the following SerDe on GitHub - https://github.com/rcongiu/Hive-JSON-Serde
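A sketch of what a table definition for the record in the question might look like with that SerDe. The jar path and the table name `events` are hypothetical, and the column types are guesses based on the single sample record. Note that `as` is a reserved word in Hive, so that column is quoted with backticks:

```sql
-- Hypothetical jar location; use the jar built from the Hive-JSON-Serde project.
ADD JAR /path/to/json-serde-with-dependencies.jar;

CREATE TABLE events (
  vr   INT,
  tm   BIGINT,
  tms  STRING,
  `as` INT,
  pb   INT,
  cts  ARRAY<INT>,
  ctgs ARRAY<INT>,
  op   INT,
  ev   INT,
  dv   INT,
  dvgs ARRAY<INT>,
  cnt  STRING,
  usr  STRING,
  atts ARRAY<STRUCT<id:INT, val:STRING>>,
  sel  STRUCT<cm:INT, ty:STRING, ag:INT, ad:INT, fl:INT, fla:INT,
              hg:INT, mc:STRING, pr:DOUBLE>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;

-- Nested values are then addressed with ordinary Hive syntax.
SELECT cts[0], atts[1].val, sel.mc FROM events;
```

With this approach the nested arrays and objects become typed Hive columns, so queries do not need JSON path strings.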


Also check out Brickhouse - https://github.com/klout/brickhouse. It has quite decent UDFs for JSON (such as json_split and json_map). With Brickhouse and get_json_object / json_tuple (also mentioned by Nija here), you can even avoid custom SerDes such as Hive-JSON-Serde.
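For the built-in json_tuple mentioned above, a minimal sketch follows. It again assumes a hypothetical table `raw_events` with a single STRING column `json`. json_tuple only extracts top-level keys, so the nested `sel` object comes back as a JSON string that needs a second pass:

```sql
-- json_tuple parses each record once and returns the requested top-level keys as strings.
SELECT t.vr, t.usr, s.mc, s.pr
FROM raw_events r
LATERAL VIEW json_tuple(r.json, 'vr', 'usr', 'sel') t AS vr, usr, sel
-- Second pass over the nested object, which json_tuple returned as a JSON string.
LATERAL VIEW json_tuple(t.sel, 'mc', 'pr') s AS mc, pr;
```

The Brickhouse UDFs are not shown here because they must first be registered from the Brickhouse jar; see that project's documentation for the exact function classes.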
