python style question around reading small files

2022-12-19 09:21 问答作者：

What is the most pythonic way to read in a named file, strip lines that are either empty, contain only spaces, or have # as a first character, and then process remaining lines? Assume it all fits easily in memory.

Note: it's not tough to do this -- what I'm asking is f开发者_如何学JAVAor the most pythonic way. I've been writing a lot of Ruby and Java and have lost my feel.

Here's a strawman:

file_lines = [line.strip() for line in open(config_file, 'r').readlines() if len(line.strip()) > 0]
for line in file_lines:
  if line[0] == '#':
    continue
  # Do whatever with line here.

I'm interested in concision, but not at the cost of becoming hard to read.

Generators are perfect for tasks like this. They are readable, maintain perfect separation of concerns, and efficient in memory-use and time.

def RemoveComments(lines):
    for line in lines:
        if not line.strip().startswith('#'):
            yield line

def RemoveBlankLines(lines):
    for line in lines:
        if line.strip():
            yield line

Now applying these to your file:

filehandle = open('myfile', 'r')
for line in RemoveComments(RemoveBlankLines(filehandle)):
    Process(line)

In this case, it's pretty clear that the two generators can be merged into a single one, but I left them separate to demonstrate their composability.

lines = [r for r in open(thefile) if not r.isspace() and r[0] != '#']

The .isspace() method of strings is by far the best way to test if a string is entirely whitespace -- no need for contortions such as len(r.strip()) == 0 (ech;-).

for line in open("file"):
    sline=line.strip()
    if sline and not sline[0]=="#" :
       print line.strip()

output

$ cat file
one
#
  #

two

three
$ ./python.py
one
two
three

I would use this:

processed = [process(line.strip())
             for line in open(config_file, 'r')
             if line.strip() and not line.strip().startswith('#')]

The only ugliness I see here is all the repeated stripping. Getting rid of it complicates the function a bit:

processed = [process(line)
             for line in (line.strip() for line in open(config_file, 'r'))
             if line and not line.startswith('#')]

This matches the description, ie

strip lines that are either empty, contain only spaces, or have # as a first character, and then process remaining lines

So lines that start or end in spaces are passed through unfettered.

with open("config_file","r") as fp:
    data = (line for line in fp if line.strip() and not line.startswith("#"))
    for item in data:
        print repr(item)

I like Paul Hankin's thinking, but I'd do it differently:

from itertools import ifilter, ifilterfalse, imap

with open(r'c:\temp\testfile.txt', 'rb') as f:
    s1 = ifilterfalse(str.isspace, f)
    s2 = ifilter(lambda x: not x.startswith('#'), s1)
    s3 = imap(str.rstrip, s2)
    print "\n".join(s3)

I'd probably only do it this way instead of using some of the more obvious approaches suggested here if I were concerned about memory usage. And I might define an iscomment function to eliminate the lambda.

The file is small, so performance is not really an issue. I will go for clarity than conciseness:

fp = open('file.txt')
for line in fp:
    line = line.strip()
    if line and not line.startswith('#'):
        # process
fp.close()

If you want, you can wrap this in a function.

Using slightly newer idioms (or with Python 2.5 from __future__ import with) you could do this, which has the advantage of cleaning up safely yet is quite concise.

with file('file.txt') as fp:
    for line in fp:
        line = line.strip()
        if not line or line[0] == '#':
            continue

        # rest of processing here

Note that stripping the line first means the check for "#" will actually reject lines with that as the first non-blank, not merely "as first character". Easy enough to modify if you're strict about that.

继续阅读：coding-style idioms python

python style question around reading small files

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？