
Problem splitting columns in a row when the text contains a semicolon and double quotes

I want to import a few rows from a CSV file. The problem is that a few columns contain a semicolon and double quotes in the middle of the text.

Since my delimiter is ; and my CSV quote character is ", the reader splits the column as soon as it sees ; and " in the middle of the text.

My sample CSV file is:

"hello";"<SPAN onmouseup="__doPostBack('bb','')">;</SPAN> <SPAN onmouseup="__doPostBack('j','')" style="DISPLAY: none" Enabled="true"> ";"bye"

The code to read the rows is:

import csv

# f is the already opened CSV file; NUL bytes are stripped before parsing
reader = csv.reader((line.replace('\0', '') for line in f), delimiter=';', quotechar='"')
for row in reader:
    print row


and it prints:

['hello', "<SPAN onmouseup=__doPostBack('bb','')>", '</SPAN> <SPAN onmouseup="__doPostBack(\'j\',\'\')" style="DISPLAY: none" Enabled="true"> "', 'bye']

I want the result to be:

 row[0] = hello
 row[1] = <SPAN onmouseup="__doPostBack('bb','')">;</SPAN> <SPAN onmouseup="__doPostBack('j','')" style="DISPLAY: none" Enabled="true"> 
 row[2] = bye

whereas the output I get is:

row[0] = hello
 row[1] = <SPAN onmouseup="__doPostBack('bb','')">
 row[2] = </SPAN> <SPAN onmouseup="__doPostBack('j','')" style="DISPLAY: none" Enabled="true"> 
 row[3] = bye

I have also tried reader = csv.reader(open("yourfile.csv", "rb"), delimiter=';') as described for the Python split function, but this still splits my row into 4 columns.

Any help will be appreciated.

Thank you..


I think you have a problem with the input format, which is not proper CSV:

Quoting Wikipedia:

Fields that contain a special character (comma, newline, or double quote), must be enclosed in double quotes. [...] If a field's value contains a double quote character it is escaped by placing another double quote character next to it.

Your input file does not escape the double quotes inside the fields, so the reader cannot tell where a quoted field really ends.
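To illustrate the escaping rule, here is a minimal sketch (Python 3, standard library only). It writes the three fields from the question with csv.writer, which doubles the inner double quotes, and then reads the line back with the same delimiter and quote character. The names fields, buf and escaped_line are made up for this example.

```python
import csv
import io

# The three fields the question expects to get back.
fields = [
    "hello",
    "<SPAN onmouseup=\"__doPostBack('bb','')\">;</SPAN> "
    "<SPAN onmouseup=\"__doPostBack('j','')\" style=\"DISPLAY: none\" Enabled=\"true\"> ",
    "bye",
]

# Let csv.writer produce a properly escaped line: every field is quoted and
# each double quote inside a field is written as two double quotes.
buf = io.StringIO()
csv.writer(buf, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL).writerow(fields)
escaped_line = buf.getvalue()

# Reading the escaped line back yields one row with exactly three fields.
rows = list(csv.reader(io.StringIO(escaped_line), delimiter=';', quotechar='"'))
print(escaped_line)
print(len(rows[0]))
```

If whoever produces the file can be made to write it this way, the reader from the question should work without any further changes.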

That said, if you cannot do anything about the input, you will have to find a pattern in the data that lets you fix the file before passing it to csv.reader, or you will have to parse it manually according to that pattern. This can get very complex very quickly.
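As one possible pattern, the sketch below (Python 3) assumes that every field is wrapped in double quotes and that the three-character sequence ";" only ever occurs between two fields, never inside one. That holds for the sample line in the question but is only an assumption about the rest of the file. split_quoted_line is a hypothetical helper, not part of the csv module.

```python
def split_quoted_line(line, sep='";"'):
    """Hypothetical helper: split a line whose fields are all quoted
    and separated by the sequence ";" (assumed never to occur in a field)."""
    line = line.rstrip("\r\n")
    if len(line) < 2 or not (line.startswith('"') and line.endswith('"')):
        raise ValueError("line does not match the assumed pattern")
    # Drop the outermost quotes, then split on the quote-delimiter-quote sequence.
    return line[1:-1].split(sep)


# The sample line from the question.
sample = (
    '"hello";"<SPAN onmouseup="__doPostBack(\'bb\',\'\')">;</SPAN> '
    '<SPAN onmouseup="__doPostBack(\'j\',\'\')" style="DISPLAY: none" Enabled="true"> ";"bye"'
)

row = split_quoted_line(sample)
for i, value in enumerate(row):
    print("row[%d] = %s" % (i, value))
```

This deliberately bypasses csv.reader. If a field can itself contain ";" the assumption breaks and a different pattern is needed.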
