开发者

repair data in csv file

I have a huge csv file, separated by comma's and I want to do a analysis with glm in R. In one column there exists data with a comma implied, something like: bla,blabla When reading the file in R with read.csv.sql there comes a error-message: RS-DBI driver: (RS_sqlite_import: ./agp.csv line 47612 expected 37 columns of data but found 38) This is due to the 'extra' comma in some of the data, not the whole column has an extra column. How can I fix this? I want to remove this extra 开发者_运维百科superfluous comma. Thanks for the reaction, André


The CSV format is very simple and can easily be hand edited. In order to include a comma in a value, you must surround the value with quotes quotes. Try this: "bla,blabla". If that data happens to contain any quotes, eg. blah,"thequotedblah",blah, those quotes need to be escaped with another quote, like this: "blah,""thequotedblah"",blah".

Although there is no official standard around it, there isn't much to the CSV format. Wikipedia has a great CSV reference that I have personally used to implement CSV support in applications. Spend 5-10 minutes reading it and you'll know everything you ever need to know to manually create/read/repair CSV data.


Is it just this one line that contains a non-quoted comma - or are there several such lines? Editing the .csv with an editor that can handle large files (e.g. Ultraedit) to sanitize that one record would certainly help. Asaph's suggestion of quoting is also a good 'un.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜