Parse log file in Python

I have a log file that has lines that look like this:

"1","2546857-23541","f_last","user","4:19 P.M.","11/02/2009","START","27","27","3","c2546857-23541",""

Each line in the log has 12 double-quoted sections, and the 7th section holds whatever the user typed into the chat window:

"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","
What's up","245","47","1","c2546857-23541",""

This string also shows the issue I'm having: in some places in the chat log, the text the user typed is on a new line in the log file instead of on the same line, as in the first example. So basically I want the lines in the second example to look like the first example.

I've tried using Find/Replace in Notepad++ and I am able to find each "orphaned" line, but I was unable to make it join the line above it. Then I thought of writing a Python script to automate it for me, but I'm kind of stuck on how to actually code it.


Python errors out at this line when running unutbu's code:

"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009","I do not see ^"refresh your knowledge
^" on the screen","422","0","0","c4746871-00128",""


The csv module is smart enough to recognize when a quoted item is not finished (and thus must contain a newline character).

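import csv

# newline='' lets the csv module handle line endings itself (Python 3).
with open('data.log', 'r', newline='') as fin, \
        open('data2.log', 'w', newline='') as fout:
    # '^' escapes embedded quotes in the log, e.g. ^"refresh your knowledge^"
    reader = csv.reader(fin, delimiter=',', quotechar='"', escapechar='^')
    # The writer needs the same escapechar: with doublequote=False and no
    # escapechar it raises csv.Error on a field that contains a quote.
    writer = csv.writer(fout, delimiter=',', escapechar='^',
                        doublequote=False, quoting=csv.QUOTE_ALL)
    for row in reader:
        # Join chat text that was split across physical lines.
        row[6] = row[6].replace('\r\n', ' ').replace('\n', ' ')
        writer.writerow(row)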
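
To try the idea without touching any files, here is a minimal in-memory sketch that feeds the two-line sample from the question to csv.reader. The helper name `flatten_rows` and its `field_index` parameter are made up for this illustration, and it assumes the chat text is always the 7th field (index 6).

```python
import csv
import io

# Sample taken from the question: the 7th field starts on one physical
# line and finishes on the next.
sample = (
    '"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","\n'
    'What\'s up","245","47","1","c2546857-23541",""\n'
)

def flatten_rows(text, field_index=6):
    """Parse CSV text and replace line breaks inside one field with spaces.

    flatten_rows and field_index are hypothetical names for this sketch.
    """
    rows = []
    source = io.StringIO(text, newline='')  # keep line endings untouched
    for row in csv.reader(source, quotechar='"', escapechar='^'):
        # Skip the fix-up for rows too short to have the chat field.
        if len(row) > field_index:
            cleaned = row[field_index].replace('\r\n', ' ').replace('\n', ' ')
            row[field_index] = cleaned
        rows.append(row)
    return rows

rows = flatten_rows(sample)
print(len(rows))           # number of logical records found
print(repr(rows[0][6]))    # the chat text with the line break replaced
```

The two physical lines come back as a single record, which is the behaviour the answer above relies on.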


If your data is valid CSV you can use Python's csv.reader. It should work just fine with your sample data. It may not work correctly depending on what an embedded double quote looks like in the source system. See: http://docs.python.org/library/csv.html#module-contents.
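
To illustrate the point about embedded double quotes, here is a small sketch that parses the failing line from the question twice: once with the default settings and once with `^` as the escape character. That `^"` is how the source system escapes a quote is an assumption based on that one line, and `parse` is just a helper name invented for this example.

```python
import csv
import io

# Line from the follow-up in the question: quotes inside the chat text
# are written as ^" and the text itself spans two physical lines.
sample = (
    '"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009",'
    '"I do not see ^"refresh your knowledge\n'
    '^" on the screen","422","0","0","c4746871-00128",""\n'
)

def parse(text, **fmt):
    """Return every row csv.reader yields for text (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text, newline=''), **fmt))

# Default settings: ^ is an ordinary character, so the first embedded
# quote ends the quoted field early and the record breaks at the newline.
naive = parse(sample)

# Assumption: the log escapes quotes with ^. With escapechar='^' the
# embedded quotes and the newline stay inside the 7th field.
escaped = parse(sample, escapechar='^')

print([len(row) for row in naive])     # field counts per row, default settings
print([len(row) for row in escaped])   # field counts per row, with escapechar
print(repr(escaped[0][6]))
```

If the field counts differ from the expected 12 per row, that is a hint the reader settings do not match how the source system writes quotes.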


Unless I'm misunderstanding the problem, you simply need to read in the file and remove any newline characters that occur between double-quote characters.
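
A rough sketch of that approach without the csv module could look like the following. The function name `join_quoted_newlines` is made up here, and the handling of `^` as an escape character is an assumption taken from the failing line in the question, so adjust it if your log escapes quotes differently.

```python
def join_quoted_newlines(text, quote='"', escape='^'):
    """Replace line breaks that fall between quote characters with a space.

    Hypothetical helper: assumes `escape` precedes any quote that is part
    of the data rather than a field boundary.
    """
    out = []
    in_quotes = False
    escaped = False  # True right after an escape character
    for ch in text:
        if escaped:
            # Character after the escape is data, never a field boundary.
            out.append(ch)
            escaped = False
        elif ch == escape:
            out.append(ch)
            escaped = True
        elif ch == quote:
            out.append(ch)
            in_quotes = not in_quotes
        elif in_quotes and ch == '\n':
            out.append(' ')
        elif in_quotes and ch == '\r':
            continue  # drop the CR of a CRLF pair inside quotes
        else:
            out.append(ch)
    return ''.join(out)

# Both multi-line examples from the question.
broken = (
    '"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","\n'
    'What\'s up","245","47","1","c2546857-23541",""\n'
    '"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009",'
    '"I do not see ^"refresh your knowledge\n'
    '^" on the screen","422","0","0","c4746871-00128",""\n'
)

fixed = join_quoted_newlines(broken)
print(fixed)
```

For a real file you would read the whole text with open('data.log', newline='') and write the result back out. This keeps the original quoting and escaping untouched, at the cost of redoing by hand what the csv module already does.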
