
Difficulty determining the file type of a text database file

So the USDA has some weird database of general nutrition facts about food, and naturally we're going to borrow it for use in our app. Anyhow, the lines are formatted like the following:

~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01004~^~0100~^~Cheese, blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

Those odd ~ and ^ characters separate the values. The file also lacks a header row, but that's OK; I can figure the columns out from the other material on their site: http://www.ars.usda.gov/Services/docs.htm?docid=8964

Any help would be great! If it matters we're making an open/free API with Ruby to query this data.

Additionally I'm having a tough time posing this question so I've made it a community wiki so we can all pitch in!


This looks like a very standard CSV (comma-separated values) file, except that the field separator character was changed from , to ^ and the quote character from " to ~.

Unfortunately, I'm not familiar enough with Ruby to recommend which library to use, but in Perl there's a boatload of standard CPAN modules, the best of which let you configure both the field separator and the quote character of a CSV reader. I would expect Ruby to have something similar as well. If so, you're in luck!
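For what it's worth, Ruby's `csv` library does appear to offer exactly that kind of configuration through its `col_sep` and `quote_char` options. Here is a minimal sketch, assuming the file really is plain CSV with those two characters swapped in; the `SAMPLE` constant is just two lines copied from the question, not the real file, and on recent Ruby versions `csv` may need to be installed as a gem.

```ruby
require "csv"

# Two sample lines copied from the question (stand-in for the real file).
SAMPLE = <<~DATA
  ~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
  ~01004~^~0100~^~Cheese, blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
DATA

# Tell the parser that ^ separates fields and ~ quotes strings.
rows = CSV.parse(SAMPLE, col_sep: "^", quote_char: "~")

# Print the first field (an ID) and the third field (a description).
rows.each { |row| puts "#{row[0]}: #{row[2]}" }
```

For a file on disk, `CSV.foreach(path, col_sep: "^", quote_char: "~")` should accept the same options and yield one row at a time.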


^ appears to be a field delimiter and ~ a string delimiter. Normally I'd expect to see , and " in those roles, but the choice of the very uncommon characters means that a string like

Cheese, Bleu

won't get all trippy with the string parser.
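To make the two roles concrete, here is a rough hand-rolled sketch in Ruby that splits on ^ and strips the surrounding ~ from each field. It assumes a ^ never occurs inside a string, which holds for the sample lines but is not something I've checked against the whole file; a real CSV parser configured with these delimiters would be the safer choice.

```ruby
# One line copied from the question.
line = "~01004~^~0100~^~Cheese, blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87"

# Split on the field delimiter; the -1 keeps trailing empty fields.
raw_fields = line.split("^", -1)

# Strip the string delimiter from both ends of each field, if present.
fields = raw_fields.map { |f| f.delete_prefix("~").delete_suffix("~") }

# The comma inside the description does not split the field.
description = fields[2]
puts description

# A naive comma split, by contrast, cuts the description in two.
comma_pieces = line.split(",")
puts comma_pieces.length
```

Note that `delete_prefix` and `delete_suffix` need Ruby 2.5 or later.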
