Regex - replace a character (comma) between double quotes if the pattern is matched

I have this log from an Exchange server:

2010-05-20T01:53:33.097Z,12.10.53.144,,12.10.53.200,EXHUB-10,08CCC3F50C35F2D2;2010-05-20T01:53:32.128Z;0,EXHUB-10\Default EXHUB-10,SMTP,RECEIVE,829888,,norma@ccc.gov.my,,521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",lee.cheesung@gmail.com,<>,00A:

and I used this regex to match and group the fields:

(\d{4}-\d{2}-\d{2})(?:[\w\s]+)(\d+:\d+:\d+.\d+)(?:[\w+\d.]*),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(['"].*['"]|.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(?:(\d{4}-\d{2}-\d{2}\w\d{2}:\d{2}:\d{2}.\d+)(?:\w+)*)*(.*)

Basically, the fields in the log are separated by commas. Unfortunately, if the user types a comma in the 'email subject' field, the log wraps that field in double quotes. In the example above, the subject contains a comma in the date "Monday May 24, 2010":

.....521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",lee.keesung@gmail.com,.....
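To do literally what the title asks, one option is to strip the quotes from each quoted field and replace the commas inside it before splitting on commas. Below is a minimal sketch in Python; `unquote_and_replace_commas` and its `replacement` parameter are hypothetical names, and the pattern assumes quoted fields never contain escaped quotes (`""`). Note that this is lossy: the subject comes back with `;` where the user typed `,`.

```python
import re

# A double-quoted field. Assumes the quoted text contains no escaped
# quotes (""), which holds for the sample log line above.
QUOTED_FIELD = re.compile(r'"([^"]*)"')


def unquote_and_replace_commas(line, replacement=";"):
    """Drop the double quotes around each quoted field and replace the
    commas inside it with `replacement` (a hypothetical default; pick a
    character that cannot otherwise appear in the log), so that a plain
    split(",") keeps the field in one piece."""
    return QUOTED_FIELD.sub(lambda m: m.group(1).replace(",", replacement), line)


line = 'a,"x, y",b'
print(line.split(","))                              # ['a', '"x', ' y"', 'b']
print(unquote_and_replace_commas(line).split(",")) # ['a', 'x; y', 'b']
```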

How can I capture the whole subject, including the comma but without the double quotes, in a specific group (the 19th group)?
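If you want to keep the subject's comma and still use a regex, one alternative to a single large pattern with one group per field is to match one field at a time. In the sketch below (Python; `FIELD_RE` and `split_fields` are hypothetical names), the quoted alternative is tried first, so its capture group holds the subject with its comma but without the quotes. It assumes well-formed lines and treats `""` inside a quoted field as an escaped quote, which is a common CSV convention but not something confirmed here for Exchange logs.

```python
import re

# One field per match: either a double-quoted field (group 1, quotes not
# captured, "" inside it treated as an escaped quote) or a run of
# non-comma characters (group 2). Every field after the first is
# preceded by a comma.
FIELD_RE = re.compile(r'(?:^|,)(?:"((?:[^"]|"")*)"|([^,]*))')


def split_fields(line):
    """Split one comma-separated log line, keeping commas that sit inside
    double quotes and returning quoted fields without their quotes."""
    fields = []
    for m in FIELD_RE.finditer(line):
        if m.group(1) is not None:
            fields.append(m.group(1).replace('""', '"'))  # un-escape ""
        else:
            fields.append(m.group(2))
    return fields


print(split_fields('1,,,"Monday May 24, 2010",lee.cheesung@gmail.com'))
# ['1', '', '', 'Monday May 24, 2010', 'lee.cheesung@gmail.com']
```

Applied to the full sample line, the subject is the 18th field, `fields[17]`; that is a list index, not the group number from the regex in the question.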


You mention:

Basically, the information in the log is separated by the comma...also if a comma is part of the field the field will be double quoted.

which makes it a CSV file. Parsing CSV is a solved problem, so you need not reinvent the wheel. Use a CSV parser provided by your language's libraries.
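For example, Python's standard-library `csv` module (a swap-in for the Perl and Java libraries mentioned in this thread) handles the quoted subject directly. This is a minimal sketch: `parse_log_line` and `SUBJECT_INDEX` are hypothetical names, and the index is taken from counting the fields in the sample line in the question, so check it against your own log format.

```python
import csv


def parse_log_line(line):
    """Parse one comma-separated log line with the csv module.

    The default dialect splits on commas, keeps a "..." field (commas
    included) as one field, and returns it without the surrounding quotes.
    """
    # csv.reader takes an iterable of lines, so wrap the single line in a list.
    return next(csv.reader([line]))


# Hypothetical constant: in the sample log line, the subject is the
# 18th field (index 17).
SUBJECT_INDEX = 17

row = parse_log_line('521647,1,,,"Monday May 24, 2010",lee.cheesung@gmail.com')
print(row[4])  # Monday May 24, 2010
```

To read a whole log file, pass the file object (opened with `newline=""`) to `csv.reader` instead of a one-element list.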

If you are using Perl, take a look at the Text::CSV module.


The line you gave seems to be in CSV format. Why not parse it with a CSV parser, such as:

  • http://opencsv.sourceforge.net/
  • http://supercsv.sourceforge.net/


For Java, use Apache Commons CSV:

http://commons.apache.org/sandbox/csv/
