
Handle commas in quoted strings in Tcl

I'm using the following line in Tcl to parse a comma-separated line of fields. Some of the fields may be quoted so they can contain commas:

set line {12,"34","56"}
set fresult [regsub -all {(\")([^\"]+)(\",)|([^,\"]+),} $line {{\2\4} } fields]
puts $fields
{12} {34} "56"

(It's a bit strange that the last field is quoted instead of braced but that's not the problem here)

However, when there is a comma in the quote, it does not work:

set line {12,"34","56,78"}
set fresult [regsub -all {(\")([^\"]+)(\",)|([^,\"]+),} $line {{\2\4} } fields]
puts $fields
{12} {34} "{56} 78"

I would expect: {12} {34} {56,78}

Is there something wrong with my regexp, or is there something Tcl-ish going on?


One option that comes to mind is using the CSV functionality in Tcllib. (No reason to reinvent the wheel unless you have to...)

http://tcllib.sourceforge.net/doc/csv.html

Docs Excerpt

::csv::split ?-alternate? line {sepChar ,} {delChar "} converts a line in CSV format into a list of the values contained in the line. The character used to separate the values from each other can be defined by the caller, via sepChar, but this is optional. The default is ",". The quoting character can be defined by the caller, but this is optional. The default is '"'. If the option -alternate is specified, a slightly different syntax is used to parse the input. This syntax is explained below, in the section FORMAT.
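
As a minimal sketch of the call described in the excerpt, assuming Tcllib is installed and its csv package can be loaded, the line from the question can be split like this (the variable names are only illustrative):

```tcl
package require csv   ;# part of Tcllib; assumed to be installed

set line {12,"34","56,78"}

# Default separator is "," and default quote character is the double quote.
set fields [::csv::split $line]

puts $fields            ;# 12 34 56,78
puts [llength $fields]  ;# 3
```

The result is an ordinary Tcl list, so the comma inside the third field needs no special handling afterwards.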


The problem seems to be the extra commas: you only accept a quoted string if it is followed by a comma, and the same goes for unquoted tokens. This works:

set fresult [regsub -all {(\")([^\"]+)(\")|([^,\"]+)} $line {{\2\4} } fields]
                                        ^(no commas)^

Working Example: http://ideone.com/O2hss

You can safely leave the commas out of the pattern: the regex engine keeps searching for new matches, so it skips a comma it cannot match and resumes at the next character.
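
If a real Tcl list is more convenient than a substituted string, the same skip-the-commas idea can be sketched with regexp -all -inline. This is a variation on the pattern above rather than the original code: it keeps only two capture groups (quoted contents and unquoted token), and the variable names are illustrative.

```tcl
set line {12,"34","56,78"}

# Group 1 = contents of a quoted field, group 2 = an unquoted field.
# Commas between fields match neither branch and are simply skipped.
set pattern {"([^"]+)"|([^,"]+)}

set fields {}
# -inline returns a flat list: full match, group 1, group 2, repeated per match.
foreach {match quoted bare} [regexp -all -inline $pattern $line] {
    # Only one of the two groups is non-empty for any given match.
    lappend fields $quoted$bare
}

puts $fields   ;# 12 34 56,78
```

Because the fields are collected with lappend, Tcl takes care of any list quoting itself.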

Bonus: this version also handles escaped quotes written as \" (if your data doubles the quote instead, you should be able to adapt it easily by using "" in place of \\.):

set fresult [regsub -all {"((?:[^"\\]|\\.)+)"|([^,"]+)} $line {{\1\2} } fields]

Example: http://ideone.com/ztkBh
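
The "" adaptation mentioned above might look like the following sketch. It assumes the data escapes a quote by doubling it, and it collects the fields with regexp -all -inline instead of regsub so that the doubled quotes can be collapsed afterwards; the sample line and variable names are made up for illustration.

```tcl
set line {12,"say ""hi"", ok",34}

# Inside a quoted field, accept any non-quote character or a doubled quote.
set pattern {"((?:[^"]|"")*)"|([^,"]+)}

set fields {}
foreach {match quoted bare} [regexp -all -inline $pattern $line] {
    # Collapse each doubled quote in a quoted field to a single quote.
    lappend fields [string map [list {""} {"}] $quoted]$bare
}

puts [llength $fields]   ;# 3
puts [lindex $fields 1]  ;# say "hi", ok
```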


Use the following regsub

% set line {12,"34","56,78"}

% regsub -all {(,")|(",)|"} $line " " line

% set line

12 34  56,78  <<< Result

Here every occurrence of ," or ", or " (tried in that order) is replaced by a space.


As you said to @Kobi, if you allow empty fields, you should also allow empty quoted strings (""): {((\")([^\"]*)(\")|([^,\"]*))(,|$)}, where the fields of interest shift to groups 3 and 5.

Expanded: { ( (\")([^\"]*)(\") | ([^,\"]*) ) (,|$) } I admit, I don't know whether Tcl allows (?:) non-capturing groups.
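
For what it is worth, Tcl's advanced regular expressions do accept (?:...) groups, so the pattern above can be reduced to two capture groups. The following is a rough sketch under that assumption, with a made-up sample line containing one empty unquoted field and one empty quoted field. Inputs that end in a bare trailing comma are not covered here and should be tested separately.

```tcl
set line {12,,"34","56,78",""}

# Each match is one field plus its terminator (a comma or the end of the line).
# Group 1 = quoted contents (may be empty), group 2 = unquoted field (may be empty).
set pattern {(?:"([^"]*)"|([^,"]*))(?:,|$)}

set fields {}
foreach {match quoted bare} [regexp -all -inline $pattern $line] {
    lappend fields $quoted$bare
}

puts [llength $fields]   ;# 5
puts [lindex $fields 3]  ;# 56,78
```

Requiring the terminator in every match is what lets an empty field count as a field instead of being skipped.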
