
Regarding Java Split Command CSV File Parsing

I have a CSV file in the format below. I run into a problem when my program reads either of the following lines:

"D",abc"def,"","0429"292"0","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""


"D","abc"def","","04292920","11","IJ80","Feb10_1.txt-2","FILE RECORD","05/02/2010","04/03/2010","","1","-91","",""

The split command below is meant to ignore commas inside double quotes. I took it from an earlier post, linked below.

String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", 15);
System.out.println("items.length" + items.length);

Regarding Java Split Command Parsing Csv File

items.length is printed as 14 instead of 15. abc"def is not recognized as an individual field; it is incorrectly stored as "D",abc"def in items[0]. I want it stored as follows:

items[0] should be "D" and items[1] should be abc"def

The same issue happens when the value is "abc"def". I want it stored as follows:

items[0] should be "D" and items[1] should be "abc"def"

Note that this split command works perfectly when a double quote inside a quoted field is doubled (for example, the line D,"abc""def",1).

How can I resolve this issue?


I think you would be much better off writing a parser for the CSV files than trying to use a regular expression. Once you start dealing with CSV files that have carriage returns inside fields, the regex will probably fall apart. It wouldn't take much code to write a simple while loop that walks through the characters and splits up the data, and a parser makes it a lot easier to deal with "non-standard"* CSV files such as yours.

*I say non-standard because there isn't really an official standard for CSV, and when you're dealing with CSV files from many different systems, you see lots of weird things, like the abc"def field as shown above.
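A minimal sketch of such a character loop follows. It rests on assumptions of my own, not on any standard: a field counts as quoted only if it starts with a double quote, and a quote closes that field only when a comma or the end of the line comes right after it. Any other quote is kept as literal text, and a doubled quote collapses to one. The class name LenientCsvLine and the method parse are made up for illustration. Unlike the split command, this sketch drops the enclosing quotes, so "D" is returned as D.

```java
import java.util.ArrayList;
import java.util.List;

final class LenientCsvLine {

    /** Splits one line into fields, tolerating stray quotes inside a field. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int n = line.length();
        int i = 0;
        // i <= n so that a trailing comma still yields a final empty field
        while (i <= n) {
            field.setLength(0);
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"') {
                        boolean atEnd = (i + 1 == n);
                        if (atEnd || line.charAt(i + 1) == ',') {
                            i++; // closing quote: a comma or end of line follows
                            break;
                        }
                        if (line.charAt(i + 1) == '"') {
                            field.append('"'); // doubled quote -> one literal quote
                            i += 2;
                            continue;
                        }
                        // otherwise a stray quote: fall through and keep it
                    }
                    field.append(c);
                    i++;
                }
            } else {
                // unquoted field: read up to the next comma, quotes are literal
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(field.toString());
            i++; // step over the comma, or past the end of the line
        }
        return fields;
    }
}
```

Calling LenientCsvLine.parse on either sample line from the question gives 15 fields with abc"def at index 1. The rule is a guess about what the producing system meant: a field that really contains a quote followed directly by a comma would still be split in the wrong place, and fields spanning several lines are not handled.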


opencsv is a great, simple, lightweight CSV parser for Java. It will easily handle your data.


If possible, changing your CSV format would make the solution very simple.

See the following for an overview of Delimiter Separated Values, a common format on Unix-based systems:

http://www.faqs.org/docs/artu/ch05s02.html#id2901882


Opencsv is a very simple API and, in my view, the best one for CSV parsing. You can also preprocess the file with Linux sed commands before reading it in Java: if the file is not in a proper format, convert the column delimiter, which in your case is ",", into a pipe or some other unique delimiter, so that Opencsv can easily tell a quote inside a field value apart from the column delimiter. Use the power of Linux with your Java code.
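The same idea can be sketched in plain Java instead of sed, by treating the three-character sequence "," as the column delimiter. This is a rough illustration under one big assumption: every field on the line is wrapped in double quotes, as in the second sample line. It does not apply to the first sample line, where abc"def is unquoted. The class name QuotedDelimiterSplit and its split method are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class QuotedDelimiterSplit {

    // The delimiter is the literal sequence: quote, comma, quote
    private static final Pattern SEP = Pattern.compile(Pattern.quote("\",\""));

    /** Splits a line in which every field is enclosed in double quotes. */
    static List<String> split(String line) {
        int n = line.length();
        if (n < 2 || line.charAt(0) != '"' || line.charAt(n - 1) != '"') {
            throw new IllegalArgumentException("line is not fully quoted: " + line);
        }
        // Drop the first and last quote, then split on the "," sequence.
        String body = line.substring(1, n - 1);
        // Limit -1 keeps trailing empty fields.
        return Arrays.asList(SEP.split(body, -1));
    }
}
```

With this approach a stray quote inside a value such as "abc"def" stays part of the field, because a single quote on its own never matches the delimiter. A value that itself contains the sequence "," would still be split incorrectly.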
