Hive - How can I write a CREATE statement for an existing, variable-length HDFS file?
So, I have an existing HDFS directory containing a bunch of files. These files are all tab-delimited.
I have a Hive statement:
CREATE EXTERNAL TABLE mytable (
  key string,
  name string,
  address string,
  ssn string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/MyHiveFiles/data';
This works pretty well, except for all of the extra fields. Each file also contains between 0 and x extra data elements after the ssn field. They are still tab-delimited, and the records are still '\n'-delimited. I could add a bunch of 'valuex string' columns (where x is the index of the extra element)... but I don't know how many there might eventually be, and that seems messy anyway.
Is there a way to tell Hive to just put all the remaining fields of the row into ONE field, like 'others string'? Even if that field is still tab-delimited in the Hive return value... I am ok with that.
Thanks, in advance.
Creating a table in Hive essentially just creates the metadata telling Hive how to interpret the files; Hive doesn't 'know' anything about the data beyond that.
If you add a final column declared as an array and specify COLLECTION ITEMS TERMINATED BY '\002' ('\002' or some other character that never appears in the data), then the tabs will not terminate the collection items, and everything after the ssn field should come back as a single element, tabs included. Haven't tested this yet. :)
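A minimal sketch of that idea (untested, as noted above; the column name `others` is a placeholder, and '\002' is just a delimiter assumed not to occur in the data):

```sql
-- Sketch only: a trailing array<string> column to soak up the rest of each line.
-- Since the collection delimiter '\002' should never appear in the data,
-- the leftover fields are expected to land in a single array element, tabs and all.
CREATE EXTERNAL TABLE mytable (
  key     string,
  name    string,
  address string,
  ssn     string,
  others  array<string>)   -- hypothetical catch-all column
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY '\002'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/MyHiveFiles/data';
```

If the SerDe does hand the remainder of the line to the last column, then using COLLECTION ITEMS TERMINATED BY '\t' instead should split the extras into individual array elements rather than one tab-joined string; which you prefer depends on how you want to query them.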