Help removing items from a text file using Python

After trying some of the suggestions from my previous question, I've come up with the following:

reader = open('C://text.txt') 
writer = open('C://nona.txt', 'w')
counter = 1    
names, nums = [], []    
row = reader.read().split(' ')
x = len(row)/2
for (a, b) in [(c, d) for c, d in zip(row[:x], row[x:]) if d!='na']:
    print counter
    counter +=1
    names.append(a)
    nums.append(b)

writer.write(' '.join(names))
writer.write(' ')
writer.write(' '.join(nums))

This program works quite well for a smaller sample data set. However, it freezes when I use the full data set and causes Python to crash. Any suggestions on how I can overcome this?


What you should do is break your file up into two separate files. Your logic should do something like this:

  1. Open the data file.
  2. Open the name file.
  3. Read the next item.
  4. Is it a name? Go to 5. Otherwise go to 6.
  5. Write the name to the name file, then go to 3.
  6. It is a number or na: close the name file and open the number file.
  7. Write the item to the number file and read the next item.
  8. Is it a number or na? Go to 7. Otherwise (no more data) close the number file.
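
The steps above could be sketched roughly as follows (Python 3). This is only a sketch under assumptions the question does not state: items are separated by single spaces, values are `na` or plain digits, and no name looks like a value. `read_items`, `is_value`, and `split_file` are hypothetical helper names. Each item is written on its own line so that the two files can later be read line by line.

```python
def read_items(handle, chunk_size=65536):
    """Yield space-separated items from an open text file, one chunk at a time."""
    tail = ''
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            break
        parts = (tail + chunk).split(' ')
        tail = parts.pop()  # the last piece may be cut off mid-item
        for part in parts:
            if part.strip():
                yield part.strip()
    if tail.strip():
        yield tail.strip()


def is_value(item):
    # Assumption: values are 'na' or digits, and names never look like that.
    return item == 'na' or item.isdigit()


def split_file(data_path, names_path, numbers_path):
    """Stream the data file into a names file and a numbers file."""
    with open(data_path) as data, \
            open(names_path, 'w') as names, \
            open(numbers_path, 'w') as numbers:
        target = names
        for item in read_items(data):
            if target is names and is_value(item):
                target = numbers  # first value seen: switch files (step 6)
            target.write(item + '\n')
```

Only one chunk of the input is held in memory at a time, so the size of the data file should not matter much here.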

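Once you have your file split into two pieces, you can iterate over them together:

names = open('names.txt')
numbers = open('numbers.txt')
output = open('output.txt', 'w')  # placeholder name; the original never opened it

# zip is lazy on Python 3; on Python 2 use itertools.izip instead
for name, number in zip(names, numbers):
    # lines keep their trailing newline, so strip before comparing
    if number.strip() != 'na':
        output.write(name.strip() + " " + number)

output.close()

Or you could write to two different files and then join them together if that's what you need.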


Your file is organized in an unfortunate manner for Pythonic processing.

Note that when you call reader.read(), you are reading the entire file into memory. Let's say this takes up X bytes.

Calling split will add at least another X bytes of memory usage, as it creates a new string object for each space-separated item in the file.

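Then you take row[:x] and row[x:], which allocates MORE memory, because each slice is a new list. (The slices hold references to the same strings rather than copies of them, but with millions of short items the lists themselves are not small.)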

Then you call zip, and make a list comprehension, etc, etc. Strings and tuples are immutable data, which means you are always creating them from scratch.
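
If you want to check these costs on a sample yourself, Python 3's tracemalloc module can report the peak allocation of each step. The sketch below uses made-up data in the same layout as your file (names first, then one value per name). `measure_peak`, `split_only`, and `split_slice_zip` are hypothetical names, and the exact numbers will vary with the Python version.

```python
import tracemalloc


def measure_peak(func):
    """Return the peak number of bytes allocated while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Made-up sample in the same layout: names first, then one value per name.
names = ['name%d' % i for i in range(50000)]
values = ['na', '1'] * 25000
text = ' '.join(names + values)


def split_only():
    return text.split(' ')


def split_slice_zip():
    row = text.split(' ')
    x = len(row) // 2
    return list(zip(row[:x], row[x:]))


print('text size:      ', len(text))
print('split peak:     ', measure_peak(split_only))
print('split+slice+zip:', measure_peak(split_slice_zip))
```

On a sample like this, the peak for the split alone should already exceed the size of the text, and the slice and zip steps add to it.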

I would approach this problem at a lower level. Open one file descriptor and point it to the beginning of the file. Open another and have it seek to the beginning of the (na/0/1/2) values (you will know where this is by counting the spaces). Now, read one name and one value at a time, and if the value is not "na" you can write the name to an output file. If you need to write the values to the output file also, hold them in memory and write them all at once when you are done.

Unfortunately this will be more difficult to code than just using the high-level functions that Python provides (you will need to write code that operates at the character level), but as you have seen there is a price to pay for those high-level functions.
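
A rough sketch of that two-handle approach, in Python 3, might look like the following. It assumes items are separated by single spaces and that the file holds as many values as names. The helper names (`chunks`, `count_tokens`, `offset_after_spaces`, `tokens`, `drop_na`) are hypothetical. The file is opened in binary mode so that byte offsets can be passed to seek safely.

```python
CHUNK = 1 << 16  # bytes per read; an arbitrary choice


def chunks(handle):
    """Yield blocks of up to CHUNK bytes from a binary file handle."""
    return iter(lambda: handle.read(CHUNK), b'')


def count_tokens(path):
    """Count space-separated tokens without loading the whole file."""
    with open(path, 'rb') as handle:
        return sum(chunk.count(b' ') for chunk in chunks(handle)) + 1


def offset_after_spaces(path, n):
    """Return the byte offset just past the n-th space in the file."""
    seen = 0
    base = 0
    with open(path, 'rb') as handle:
        for chunk in chunks(handle):
            here = chunk.count(b' ')
            if seen + here >= n:
                pos = -1
                for _ in range(n - seen):
                    pos = chunk.find(b' ', pos + 1)
                return base + pos + 1
            seen += here
            base += len(chunk)
    raise ValueError('file has fewer than %d spaces' % n)


def tokens(handle):
    """Yield tokens from a binary file handle, one chunk at a time."""
    tail = b''
    for chunk in chunks(handle):
        parts = (tail + chunk).split(b' ')
        tail = parts.pop()  # the last piece may be cut off mid-token
        for part in parts:
            yield part.strip()
    if tail.strip():
        yield tail.strip()


def drop_na(src, dst):
    """Write the names whose value is not 'na', followed by those values."""
    half = count_tokens(src) // 2
    start = offset_after_spaces(src, half)
    kept_values = []  # held in memory and written at the end
    with open(src, 'rb') as names, open(src, 'rb') as values, \
            open(dst, 'wb') as out:
        values.seek(start)  # second handle starts at the first value
        for name, value in zip(tokens(names), tokens(values)):
            if value == b'na':
                continue
            if kept_values:
                out.write(b' ')
            out.write(name)
            kept_values.append(value)
        for value in kept_values:
            out.write(b' ' + value)
```

With the paths from the question this would be called as `drop_na('C://text.txt', 'C://nona.txt')`. The kept values are still held in a list. If that list is too large as well, they could go to a temporary file and be appended afterwards.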
