Help removing items from a text file using Python
After implementing some of the solutions in my previous question, I've come up with the following solution:
reader = open('C://text.txt')
writer = open('C://nona.txt', 'w')
counter = 1
names, nums = [], []
row = reader.read().split(' ')
x = len(row)/2
for (a, b) in [(c, d) for c, d in zip(row[:x], row[x:]) if d!='na']:
    print counter
    counter +=1
    names.append(a)
    nums.append(b)
writer.write(' '.join(names))
writer.write(' ')
writer.write(' '.join(nums))
This program works quite well for a smaller sample data set. However, it freezes when I use the full data set and causes Python to crash. Any suggestions on how I can overcome this?
What you should do is break your file up into two separate files. Your logic should do something like this:
1. Open the data file.
2. Open the name file.
3. Read the next item.
4. Is it a name? See 5. Otherwise see 6.
5. Write the name to the name file, then see 3.
6. Is it a number or na? Close the name file, open the number file, and write the item to it.
7. Read the next item.
8. Is it a number or na? Write it to the number file and see 7. Otherwise, close the files.
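The steps above can be sketched roughly as follows. This is only an illustration in Python 3, not a definitive implementation: `iter_tokens`, `is_value`, and `split_file` are hypothetical helper names, the file is read in chunks rather than all at once, and it assumes every value is either "na" or an unsigned integer, so anything else is treated as a name.

```python
import re


def iter_tokens(stream, chunk_size=64 * 1024):
    """Yield whitespace-separated tokens, reading chunk_size characters at a time."""
    tail = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts = re.split(r"\s+", tail + chunk)
        tail = parts.pop()  # the last piece may be cut off mid-token
        for part in parts:
            if part:
                yield part
    if tail:
        yield tail


def is_value(token):
    # Assumption: every value is "na" or an unsigned integer such as 0, 1, 2.
    return token == "na" or token.isdigit()


def split_file(data_path, names_path, numbers_path, chunk_size=64 * 1024):
    """Write names and values to separate files, one item per line."""
    with open(data_path) as data, \
         open(names_path, "w") as names, \
         open(numbers_path, "w") as numbers:
        in_values = False
        for token in iter_tokens(data, chunk_size):
            if not in_values and is_value(token):
                in_values = True  # step 6: switch from the name file to the number file
            (numbers if in_values else names).write(token + "\n")
```

Writing one item per line means the two output files can later be read line by line, so neither has to fit in memory.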
Once you have your file split into two pieces, you can iterate over them together:
names = open('names.txt')
numbers = open('numbers.txt')
output = open('output.txt', 'w')  # output file name is a placeholder
for name, number in zip(names, numbers):
    name, number = name.strip(), number.strip()  # drop the trailing newlines
    if number != 'na':
        output.write(name + " " + number + "\n")
output.close()
Or you could write to two different files and then join them together if that's what you need.
Your file is organized in an unfortunate manner for Pythonic processing.
Note that when you call reader.read(), you are reading the entire file into memory. Let's say this takes up X bytes.
Calling split will effectively add another X bytes of memory usage, as it will create a new string for each separate string in the file.
Then you call row[:x] and row[x:], which adds still more memory, because the slice operator makes a copy of the list (the new lists share the string objects, but each holds its own reference to every item).
Then you call zip, and make a list comprehension, etc, etc. Strings and tuples are immutable data, which means you are always creating them from scratch.
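A small check of what these steps allocate, written for CPython 3 as an assumption (the question's code is Python 2, and exact sizes vary by version). The sample string is made up; it only stands in for the contents of the file.

```python
import sys

text = "alice bob carol 1 na 2"      # stands in for reader.read()
row = text.split(" ")                # a new list plus one new str per token
x = len(row) // 2
names, values = row[:x], row[x:]     # two more lists

# Each token is a full str object, so the tokens together take more
# memory than the original text did.
token_bytes = sum(sys.getsizeof(t) for t in row)

# The slices are new list objects...
slices_are_copies = names is not row and values is not row
# ...but their items are the same str objects that row holds.
items_are_shared = all(names[i] is row[i] for i in range(x))
```

On a large file the per-token overhead is what hurts most: many short strings such as "na" or "1" each carry the fixed cost of a Python object.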
I would approach this problem at a lower level. Open one file descriptor and point it to the beginning of the file. Open another and have it seek to the beginning of the (na/0/1/2) values (you will know where this is by counting the spaces). Now, read one name and one value at a time, and if the value is not "na" you can write the name to an output file. If you need to write the values to the output file also, hold them in memory and write them all at once when you are done.
Unfortunately this will be more difficult to code than just using the high-level functions that Python provides (you will need to write code that operates at the character level), but as you have seen there is a price to pay for those high-level functions.
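A minimal sketch of the two-handle approach, in Python 3 and under stated assumptions: the file holds whitespace-separated tokens with the names in the first half and the values in the second, and `read_token`, `values_offset`, and `filter_na` are hypothetical names introduced here. The offset of the values is found by counting tokens in a first pass rather than by counting spaces directly.

```python
def read_token(f):
    """Read one whitespace-delimited token from a binary file; b"" means end of file."""
    token = b""
    while True:
        ch = f.read(1)
        if not ch:
            return token
        if ch.isspace():
            if token:
                return token
        else:
            token += ch


def values_offset(path):
    """Return (number of pairs, byte offset where the values begin)."""
    with open(path, "rb") as f:
        total = 0
        while read_token(f):
            total += 1
        half = total // 2
        f.seek(0)
        for _ in range(half):
            read_token(f)  # skip over the names
        return half, f.tell()


def filter_na(src_path, dst_path):
    """Copy names and values to dst_path, dropping pairs whose value is "na"."""
    half, offset = values_offset(src_path)
    kept_values = []  # held in memory and written at the end
    with open(src_path, "rb") as names, \
         open(src_path, "rb") as values, \
         open(dst_path, "wb") as out:
        values.seek(offset)
        for _ in range(half):
            name = read_token(names)
            value = read_token(values)
            if value != b"na":
                out.write(name + b" ")
                kept_values.append(value)
        out.write(b" ".join(kept_values))
```

Only one name and one value are held at a time, plus the list of kept values. If that list is itself too large, the values could go to a second temporary file instead.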