How do I alphabetize a file in Python?
I am trying to get a list of presidents alphabetized by last name, even though the file it is drawn from lists first name, last name, date in office, and date out of office.
Here is what I have; any help on what I need to do with this would be appreciated. I have searched around for answers, but most of them are beyond my level of understanding, and I feel like I am missing something small. I tried to break them all out into a list and then sort them, but I could not get it to work, so this is where I started from.
INPUT_FILE = 'presidents.txt'
OUTPUT_FILE = 'president_NEW.txt'
OUTPUT_FILE2 = 'president_NEW2.txt'
def main():
    infile = open(INPUT_FILE)
    outfile = open(OUTPUT_FILE, 'w')
    outfile2 = open(OUTPUT_FILE2, 'w')
    stuff = infile.readline()
    while stuff:
        stuff = stuff.rstrip()
        data = stuff.split('\t')
        president_First = data[1]
        president_Last = data[0]
        start_date = data[2]
        end_date = data[3]
        sentence = '%s %s was president from %s to %s' % \
            (president_First, president_Last, start_date, end_date)
        sentence2 = '%s %s was president from %s to %s' % \
            (president_Last, president_First, start_date, end_date)
        outfile2.write(sentence2 + '\n')
        outfile.write(sentence + '\n')
        stuff = infile.readline()
    infile.close()
    outfile.close()
    outfile2.close()
main()
What you should do is put the presidents in a list, sort that list, and then print out the resulting list.
Before your loop, add:
presidents = []
Put this code inside the loop, after you pull out the names and dates:
president = (last_name, first_name, start_date, end_date)
presidents.append(president)
After the loop:
presidents.sort() # because we put last_name first above
# it will sort by last_name
Then print it out:
for president in presidents:
    last_name, first_name, start_date, end_date = president
    string1 = "..."
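Putting those steps together, a minimal self-contained sketch might look like the following. It assumes a tab-separated file whose columns are first name, last name, start date, end date (the order the question describes); the function name `sort_presidents` and the in-memory sample rows are hypothetical, used only so the example runs without a real file.

```python
import io

def sort_presidents(lines):
    """Return (last, first, start, end) tuples sorted by last name.

    Assumes each line looks like 'first<TAB>last<TAB>start<TAB>end'.
    """
    presidents = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        first_name, last_name, start_date, end_date = line.split('\t')
        # Put the last name first so the default tuple ordering sorts by it.
        presidents.append((last_name, first_name, start_date, end_date))
    presidents.sort()
    return presidents

# Hypothetical sample data standing in for presidents.txt.
sample = io.StringIO(
    "George\tWashington\t1789\t1797\n"
    "John\tAdams\t1797\t1801\n"
    "Thomas\tJefferson\t1801\t1809\n"
)

for last_name, first_name, start_date, end_date in sort_presidents(sample):
    print('%s %s was president from %s to %s'
          % (first_name, last_name, start_date, end_date))
```

Tuples compare item by item, so putting the last name in position 0 is what makes a plain `sort()` alphabetize by last name.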
It sounds like you tried to break them out into a list. If you had trouble with that, show us the code that resulted from that attempt. It was the right way to approach the problem.
Other comments:
Just a couple of points where your code could be simpler. Feel free to use or ignore them as you want:
president_First = data[1]
president_Last = data[0]
start_date = data[2]
end_date = data[3]
can be written as:
president_Last, president_First, start_date, end_date = data
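As a quick illustration of that unpacking (the sample row is made up), note that it requires the list to have exactly four items; a malformed line raises `ValueError` instead of silently mis-assigning fields:

```python
# A hypothetical row, as produced by line.rstrip().split('\t')
data = ['Washington', 'George', '1789', '1797']

# One assignment replaces the four index lookups.
president_Last, president_First, start_date, end_date = data
print(president_First, president_Last)  # George Washington

# Unpacking fails loudly if a line has the wrong number of fields.
try:
    a, b, c, d = ['only', 'three', 'fields']
except ValueError as exc:
    unpack_error = str(exc)
    print('malformed line:', unpack_error)
```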
stuff = infile.readline()
And
while stuff:
    stuff = stuff.rstrip()
    data = stuff.split('\t')
    ...
    stuff = infile.readline()
can be written as:
for stuff in infile:
    ...
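Here is a small sketch of that loop, using `io.StringIO` as a stand-in for the opened file (a real file object iterates the same way: one line per pass, with the trailing newline still attached, so `rstrip()` is still needed). The file contents are made up.

```python
import io

# io.StringIO stands in for open('presidents.txt'); the contents are hypothetical.
infile = io.StringIO("Washington\tGeorge\t1789\t1797\nAdams\tJohn\t1797\t1801\n")

rows = []
for stuff in infile:  # no readline() bookkeeping needed
    data = stuff.rstrip().split('\t')
    rows.append(data)

print(rows)
```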
#!/usr/bin/env python
# this sounds like a homework problem, but ...
from __future__ import with_statement  # not necessary on newer versions (Python 3 shown here)
def main():
    # input
    with open('presidents.txt', 'r') as fi:
        # read and parse; the question's file is tab-separated, and blank lines are skipped
        presidents = [[x.strip() for x in line.split('\t')]
                      for line in fi if line.strip()]
    # sort by the second column (index 1, the last name); cmp= no longer exists in Python 3
    presidents = sorted(presidents, key=lambda x: x[1])
    # output
    with open('presidents_out.txt', 'w') as fo:
        for pres in presidents:
            print("%s %s was president from %s to %s" % tuple(pres), file=fo)
if __name__ == '__main__':
    main()
I tried to break them all out into a list, and then sort them
What do you mean by "them"?
Breaking up the line into a list of items is a good start: that means you treat the data as a set of values (one of which is the last name) rather than just a string. However, just sorting that list is no use; Python will take the 4 strings from the line (the first name, last name etc.) and put them in order.
What you want to do is have a list of those lists, and sort it by last name.
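To see why sorting a single row doesn't help, here is what happens to one made-up row in the question's first-name-first order:

```python
# One hypothetical row: [first, last, start, end]
row = ['George', 'Washington', '1789', '1797']

# Sorting a single row just reorders its four strings (digits sort before letters).
shuffled = sorted(row)
print(shuffled)  # ['1789', '1797', 'George', 'Washington']
```

The presidents only get reordered when the thing being sorted is the outer list of rows.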
Python's lists provide a sort method that sorts them. When you apply it to the list of president-info-lists, it will sort those. But the default sorting for lists compares them item-wise (first item first, then the second item if the first items were equal, and so on). You want to compare by last name, which is the second element in your sublists. (That is, element 1; remember, we start counting list elements from 0.)
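A short sketch of that default item-wise comparison, on made-up rows in first-name-first order, shows the problem: the first names decide the order, not the last names.

```python
# Hypothetical rows: [first, last, start, end]
presidents = [
    ['John', 'Adams', '1797', '1801'],
    ['George', 'Washington', '1789', '1797'],
    ['Abraham', 'Lincoln', '1861', '1865'],
]

# Default sort compares lists item by item, so item 0 (the first name) wins.
default_order = sorted(presidents)
print([p[1] for p in default_order])  # ['Lincoln', 'Washington', 'Adams']

# The second item is only consulted when the first items tie.
print(['John', 'Adams'] < ['John', 'Tyler'])  # True
```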
Fortunately, it is easy to give Python more specific instructions for sorting. We can pass the sort method a key argument, which is a function that "translates" the items into the value we want to sort them by. Yes, in Python everything is an object, including functions, so there is no problem passing a function as a parameter. We want to sort "by last name", so we pass a function that accepts a president-info-list and returns the last name (i.e., element [1]).
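A minimal sketch of a hand-written key function, assuming the same made-up first-name-first rows; the name `last_name` is hypothetical:

```python
# Hypothetical rows: [first, last, start, end]
presidents = [
    ['John', 'Adams', '1797', '1801'],
    ['George', 'Washington', '1789', '1797'],
    ['Abraham', 'Lincoln', '1861', '1865'],
]

def last_name(president):
    """Key function: map a president-info-list to the value to sort by."""
    return president[1]

# Pass the function itself (no parentheses), not the result of calling it.
presidents.sort(key=last_name)
print([p[1] for p in presidents])  # ['Adams', 'Lincoln', 'Washington']
```

`sort` calls `last_name` once per row and orders the rows by the returned strings; the rows themselves are left intact.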
Better still, this is Python, and "batteries are included"; we don't even have to write that function ourselves. The standard library gives us a tool that creates functions returning the nth element of a sequence, which is exactly what we want here. It's called itemgetter (because it makes a function that gets the nth item of a sequence; "item" is the more usual Python terminology, while "element" is a more general CS term), and it lives in the operator module.
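As a sketch, `operator.itemgetter(1)` builds the same kind of function as a hand-written "return element [1]" helper; the rows below are made up:

```python
import operator

get_last = operator.itemgetter(1)  # a function that returns item [1] of its argument
print(get_last(['George', 'Washington', '1789', '1797']))  # Washington

# Hypothetical rows: [first, last, start, end]
presidents = [
    ['John', 'Adams', '1797', '1801'],
    ['George', 'Washington', '1789', '1797'],
    ['Abraham', 'Lincoln', '1861', '1865'],
]
presidents.sort(key=operator.itemgetter(1))
print([p[1] for p in presidents])  # ['Adams', 'Lincoln', 'Washington']
```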
By the way, there are also much neater ways to handle opening and closing the file, and we don't need to write an explicit loop to read it: we can iterate directly over the file (for line in file: gives us the lines of the file in turn, one each time through the loop), which means we can just use a list comprehension (look them up).
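Here is a self-contained sketch of the `with` block plus list comprehension. To keep it runnable, it first writes a small hypothetical tab-separated file to a temporary directory; the path and contents are assumptions, not the question's real file.

```python
import os
import tempfile

# Write a small made-up tab-separated file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'presidents.txt')
with open(path, 'w') as f:
    f.write('John\tAdams\t1797\t1801\n')
    f.write('George\tWashington\t1789\t1797\n')

# The with block closes the file automatically when it ends; iterating over
# the file yields one line per pass, and the list comprehension collects them.
with open(path) as infile:
    data = [line.rstrip().split('\t') for line in infile]

print(data)
```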
import operator
# INPUT_FILE and OUTPUT_FILE are the constants defined in the question's code.
def main():
    # We'll set up 'infile' to refer to the opened input file, making sure it is automatically
    # closed once we're done with it. We do that with a 'with' block; we're "done with the file"
    # at the end of the block.
    with open(INPUT_FILE) as infile:
        # We want the split, rstripped line for each line in the infile, which is spelled:
        data = [line.rstrip().split('\t') for line in infile]
    # Now we re-arrange that data. We want to sort the data, using an item-getter for
    # item 1 (the last name) as the sort-key. That is spelled:
    data.sort(key=operator.itemgetter(1))
    # The output file has to be opened in write mode ('w').
    with open(OUTPUT_FILE, 'w') as outfile:
        # Let's say we want to write the formatted string for each line in the data.
        # Now we're taking action instead of calculating a result, so we don't want
        # a list comprehension any more - so we iterate over the items of the sorted data:
        for item in data:
            # The item already contains all the values we want to interpolate into the string,
            # in the right order. The % operator needs a tuple rather than a list, and we add
            # a newline so each president ends up on their own line:
            outfile.write('%s %s was president from %s to %s\n' % tuple(item))
main()
I did get this working with Karl's help above, although I had to edit the code to fix some errors I was getting. After eliminating those, I ended up with this:
import operator
INPUT_FILE = 'presidents.txt'
OUTPUT_FILE2 = 'president_NEW2.txt'
def main():
    with open(INPUT_FILE) as infile:
        data = [line.rstrip().split('\t') for line in infile]
    # In this file, column 0 holds the last name, so sort on item 0.
    data.sort(key=operator.itemgetter(0))
    with open(OUTPUT_FILE2, 'w') as outfile:
        for item in data:
            last = item[0]
            first = item[1]
            start = item[2]
            end = item[3]
            outfile.write('%s %s was president from %s to %s\n' % (last, first, start, end))
main()