python newbie question: converting code to classes
i have this code:
import csv
import collections
def do_work():
(data,counter)=get_file('thefile.csv')
b=samples_subset1(data, counter,'/pythonwork/samples_subset3.csv',500)
return
def get_fil开发者_运维知识库e(start_file):
with open(start_file, 'rb') as f:
data = list(csv.reader(f))
counter = collections.defaultdict(int)
for row in data:
counter[row[10]] += 1
return (data,counter)
def samples_subset1(data,counter,output_file,sample_cutoff):
with open(output_file, 'wb') as outfile:
writer = csv.writer(outfile)
b_counter=0
b=[]
for row in data:
if counter[row[10]] >= sample_cutoff:
b.append(row)
writer.writerow(row)
b_counter+=1
return (b)
i recently started learning python, and would like to start off with good habits. therefore, i was wondering if you can help me get started to turn this code into classes. i dont know where to start.
Per my comment on the original post, I don't think a class is necessary here. Still, if other Python programmers will ever read this, I'd suggest getting it inline with PEP8, the Python style guide. Here's a quick rewrite:
import csv
import collections
def do_work():
data, counter = get_file('thefile.csv')
b = samples_subset1(data, counter, '/pythonwork/samples_subset3.csv', 500)
def get_file(start_file):
with open(start_file, 'rb') as f:
counter = collections.defaultdict(int)
data = list(csv.reader(f))
for row in data:
counter[row[10]] += 1
return (data, counter)
def samples_subset1(data, counter, output_file, sample_cutoff):
with open(output_file, 'wb') as outfile:
writer = csv.writer(outfile)
b = []
for row in data:
if counter[row[10]] >= sample_cutoff:
b.append(row)
writer.writerow(row)
return b
Notes:
- No one uses more than 4 spaces to indent ever. Use 2 - 4. And all your levels of indentation should match.
- Use a single space after the commas between arguments to functions ("F(a, b, c)" not "F(a,b,c)")
- Naked return statements at the end of a function
are meaningless. Functions without
return statements implicitly return
None
- Single space around all operators (a = 1, not a=1)
- Do not wrap single values in parentheses. It looks like a tuple, but it isn't.
- b_counter wasn't used at all, so I removed it.
- csv.reader returns an iterator, which you are casting to a list. That's usually a bad idea because it forces Python to load the entire file into memory at once, whereas the iterator will just return each line as needed. Understanding iterators is absolutely essential to writing efficient Python code. I've left
data
in for now, but you could rewrite to use an iterator everywhere you're usingdata
, which is a list.
Well, I'm not sure what you want to turn into a class. Do you know what a class is? You want to make a class to represent some type of thing. If I understand your code correctly, you want to filter a CSV to show only those rows whose row[ 10 ]
is shared by at least sample_cutoff
other rows. Surely you could do that with an Excel filter much more easily than by reading through the file in Python?
What the guy in the other thread suggested is true, but not really applicable to your situation. You used a lot of global variables unnecessarily: if they'd been necessary to the code you should have put everything into a class and made them attributes, but as you didn't need them in the first place, there's no point in making a class.
Some tips on your code:
Don't cast the file to a list. That makes Python read the whole thing into memory at once, which is bad if you have a big file. Instead, simply iterate through the file itself:
for row in csv.reader(f):
Then, when you want to go through the file a second time, just dof.seek(0)
to return to the top and start again.Don't put
return
at the end of every function; that's just unnecessary. You don't need parentheses, either:return spam
is fine.
Rewrite
import csv
import collections
def do_work():
with open( 'thefile.csv' ) as f:
# Open the file and count the rows.
data, counter = get_file(f)
# Go back to the start of the file.
f.seek(0)
# Filter to only common rows.
b = samples_subset1(data, counter,
'/pythonwork/samples_subset3.csv', 500)
return b
def get_file(f):
counter = collections.defaultdict(int)
data = csv.reader(f)
for row in data:
counter[row[10]] += 1
return data, counter
def samples_subset1(data, counter, output_file, sample_cutoff):
with open(output_file, 'wb') as outfile:
writer = csv.writer(outfile)
b = []
for row in data:
if counter[row[10]] >= sample_cutoff:
b.append(row)
writer.writerow(row)
return b
精彩评论