Python: Rearranging csv lines with conditions

If I have a few csv files like the following:

a,1,2,3
type, max, min, avg
b,4,5,6
<empty line>
c,6,7,8
xxx,4,3,2
d,5,6,7

After reading the above csv file with csv.reader, how do I dump the data to a new xls file using the xlwt module, arranged so that it:
- writes the header as the 1st line (this line always has 'type' as its 1st item)
- ignores empty lines
- ignores any line whose 1st item is 'xxx'

I tried the following code, but the empty line doesn't disappear. :(

for filename in glob.glob(p):
     (f_path, f_name) = os.path.split(filename)
     (f_short_name, f_extension) = os.path.splitext(f_name)
     ws = wb.add_sheet(str(f_short_name))
     spamReader = csv.reader(open(filename, 'rb'))

     for row in spamReader:
         pass_count = 0
         if 'type' in row[0]:
             for col in range(len(row)):
                 ws.write(0,col,convert(row[col]))
         else:
             if row[0] == 'xxx' or row[0] == ' ':
                 pass_count += 1
                 pass
             else:
                 for col in range(len(row)):
                     ws.write(row_count,col,convert(row[col]))

         row_count = row_count+1-pass_count


wb.save(q)

EDIT: Guys, apologies for the misleading initial csv data. My csv data file doesn't contain any empty lines. It is the end product, i.e. the final xls file, that contains the empty line. The empty line occurs exactly at the row where the line with row[0] == 'type' would have gone. Also, I have added code which prevents line 1 from being overwritten.

e.g. input:

a,1,2,3
type, max, min, avg
b,4,5,6
c,6,7,8
xxx,4,3,2
d,5,6,7

output xls:

type, max, min, avg
<empty line>
a,1,2,3
b,4,5,6
c,6,7,8
d,5,6,7

Here, the header is written to the 1st row and the line with 'xxx' is ignored, but an empty line occurs at row 2, which is the position of the line with row[0] == 'type' in the input file. This is the reason I brought in `pass_count` to skip this line, but it seems to miss the spot somewhere.


Your "empty" line is NOT empty.

If it were empty, row would be [] i.e. an empty list, and the first of all those references to row[0] would cause an exception to be raised. As that hasn't happened, and as you say it fails to match one space, the conclusion is that it must contain some other bunch of whitespace.
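To make the distinction concrete, here is a small sketch of what csv.reader hands back for a truly empty line versus a whitespace-only line. It uses Python 3 and an in-memory string (the `text` sample is made up for illustration), whereas the question's code opens files in the Python 2 style.

```python
import csv
import io

# Hypothetical sample: a data line, a truly empty line,
# a line holding only spaces, and another data line.
text = "a,1\n\n   \nb,2\n"

rows = list(csv.reader(io.StringIO(text)))

# An empty line comes back as an empty list, so row[0] would raise IndexError.
# A whitespace-only line comes back as a one-item list holding that whitespace,
# which is not equal to a single space unless it is exactly one space.
for row in rows:
    print(repr(row))
```

Neither of the two middle rows would be caught by a comparison such as row[0] == ' ': the first has no row[0] at all, and the second holds three spaces.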

For robustness in the case of a truly empty line, you should test for empty row first:

if not row: continue
row = [x.strip() for x in row] # remove leading and trailing whitespace from all fields
if not row[0] or row[0] == "xxx": continue
if row[0] == "type":
    # code to write headings goes here
else:
    # code to write data goes here
row_count += 1 # pass_count is pointless

By the way, your example data has "type" in the second input line. This would overwrite the first row in the output file, and the 2nd output file row would be empty!
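The effect can be traced without xlwt. The sketch below runs the same loop over the example data and records which output row each line would land in, using a plain dict in place of the worksheet. It assumes row_count starts at 0, which the question does not show, and `trace_writes` is a name invented for this illustration.

```python
import csv
import io

SAMPLE = "a,1,2,3\ntype, max, min, avg\nb,4,5,6\nc,6,7,8\nxxx,4,3,2\nd,5,6,7\n"

def trace_writes(text):
    """Return {output_row: first_field} for a loop that bumps row_count after every kept line."""
    written = {}
    row_count = 0  # assumption: the counter starts at 0
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        row = [x.strip() for x in row]
        if not row[0] or row[0] == "xxx":
            continue
        if row[0] == "type":
            written[0] = row[0]          # heading always goes to row 0
        else:
            written[row_count] = row[0]  # data goes to the current row
        row_count += 1                   # runs after the heading as well
    return written

layout = trace_writes(SAMPLE)
for output_row in sorted(layout):
    print(output_row, layout[output_row])
```

Row 0 ends up holding the heading instead of the "a" line, and nothing is ever written to row 1, because the counter advanced when the heading was written.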

Update in response to revised information

Even if you don't have an empty line in this current file, it is very good practice to guard against completely empty lines, lines with only whitespace, and fields which consist only of whitespace, using code similar to what I suggested. A blank or empty line at the end of a csv file is not uncommon.

I should have mentioned that you seem to have an antique version of xlwt; later versions will raise an exception e.g. Attempt to overwrite cell: sheetname=u'x' rowx=0 colx=0. This was introduced to trap unintentional overwriting, and can be turned off on a sheet-by-sheet basis: workbook.add_sheet(u'Some Sheet Name', cell_overwrite_ok=True)

Your use of str() in ws = wb.add_sheet(str(f_short_name)) is definitely unnecessary and may cause an exception if the file name is already a unicode object.


If a line were empty, I don't think this would be true:

row[0] == ' '  # A space?

I would expect row to be an empty list, but perhaps there's something in your data that I don't know about.

Also, instead of sorting out the logic to keep track of row_count and then subtracting pass_count, why not keep things simple and just increment row_count whenever you write another Excel row? That would make pass_count unnecessary.
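A minimal sketch of that idea, in Python 3 and without xlwt so it can be run on its own: row 0 is reserved for the heading, and row_count only advances when a data row is actually placed. The function name `plan_cells` and the tuple layout are assumptions made for illustration; each tuple corresponds to one ws.write(row, col, value) call in the original code.

```python
import csv
import io

def plan_cells(lines):
    """Return (row, col, value) tuples, one per cell that should be written."""
    cells = []
    row_count = 1  # row 0 is reserved for the 'type' heading
    for row in csv.reader(lines):
        row = [field.strip() for field in row]
        # Skip empty lines, whitespace-only first fields, and 'xxx' lines.
        if not row or not row[0] or row[0] == "xxx":
            continue
        if row[0] == "type":
            target = 0
        else:
            target = row_count
            row_count += 1  # advance only when a data row is written
        for col, value in enumerate(row):
            cells.append((target, col, value))
    return cells

sample = io.StringIO(
    "a,1,2,3\ntype, max, min, avg\nb,4,5,6\n\nc,6,7,8\nxxx,4,3,2\nd,5,6,7\n"
)
cells = plan_cells(sample)
first_fields = {r: v for r, c, v in cells if c == 0}
```

Because the heading never touches row_count, the data rows stay contiguous wherever the 'type' line appears in the input, and no pass_count is needed.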
