开发者

How to remove trailing whitespace in code, using another script?

Something like:

import fileinput

for lines in fileinput.FileInput("test.txt", inplace=1):
    lines = lines.strip()
    if lines == '': continue
    print lines

But nothing is being printed on stdout.

Assuming some string named foo:

foo.lstrip() # to remove leading white space
foo.rstrip() # to remove trailing whitespace开发者_C百科
foo.strip()  # to remove both lead and trailing whitespace


fileinput seems to be for multiple input streams. This is what I would do:

with open("test.txt") as file:
    for line in file:
        line = line.rstrip()
        if line:
            print(line)


You don't see any output from the print statements because FileInput redirects stdout to the input file when the keyword argument inplace=1 is given. This causes the input file to effectively be rewritten and if you look at it afterwards the lines in it will indeed have no trailing or leading whitespace in them (except for the newline at the end of each which the print statement adds back).

If you only want to remove trailing whitespace, you should use rstrip() instead of strip(). Also note that the if lines == '': continue is causing blank lines to be completely removed (regardless of whether strip or rstrip gets used).

Unless your intent is to rewrite the input file, you should probably just use for line in open(filename):. Otherwise you can see what's being written to the file by simultaneously echoing the output to sys.stderr using something like the following (which will work in both Python 2 and 3):

from __future__ import print_function
import fileinput
import sys

for line in (line.rstrip() for line in
                fileinput.FileInput("test.txt", inplace=1)):
    if line:
        print(line)
        print(line, file=sys.stderr)


If you're looking to tidy up for PEP8, this will trim trailing whitespace for your whole project:

import os

PATH = '/path/to/your/project'

for path, dirs, files in os.walk(PATH):
    for f in files:
        file_name, file_extension = os.path.splitext(f)
        if file_extension == '.py':
            path_name = os.path.join(path, f)
            with open(path_name, 'r') as fh:
                new = [line.rstrip() for line in fh]
            with open(path_name, 'w') as fh:
                [fh.write('%s\n' % line) for line in new]


This is the sort of thing that sed is really good at: $ sed 's/[ \t]*$//'. Be aware the you will probably need to literally type a TAB character instead of \t for this to work.


It seems, fileinput.FileInput is a generator. As such, you can only iterate over it once, then all items have been consumed and calling it's next method raises StopIteration. If you want to iterate over the lines more than once, you can put them in a list:

list(fileinput.FileInput('test.txt'))

Then call rstrip on them.


Save as fix_whitespace.py:

#!/usr/bin/env python
"""
Fix trailing whitespace and line endings (to Unix) in a file.
Usage: python fix_whitespace.py foo.py
"""

import os
import sys


def main():
    """ Parse arguments, then fix whitespace in the given file """
    if len(sys.argv) == 2:
        fname = sys.argv[1]
        if not os.path.exists(fname):
            print("Python file not found: %s" % sys.argv[1])
            sys.exit(1)
    else:
        print("Invalid arguments. Usage: python fix_whitespace.py foo.py")
        sys.exit(1)
    fix_whitespace(fname)


def fix_whitespace(fname):
    """ Fix whitespace in a file """
    with open(fname, "rb") as fo:
        original_contents = fo.read()
    # "rU" Universal line endings to Unix
    with open(fname, "rU") as fo:
        contents = fo.read()
    lines = contents.split("\n")
    fixed = 0
    for k, line in enumerate(lines):
        new_line = line.rstrip()
        if len(line) != len(new_line):
            lines[k] = new_line
            fixed += 1
    with open(fname, "wb") as fo:
        fo.write("\n".join(lines))
    if fixed or contents != original_contents:
        print("************* %s" % os.path.basename(fname))
    if fixed:
        slines = "lines" if fixed > 1 else "line"
        print("Fixed trailing whitespace on %d %s" \
              % (fixed, slines))
    if contents != original_contents:
        print("Fixed line endings to Unix (\\n)")


if __name__ == "__main__":
    main()


It's a bit surprising seeing multiple answers suggesting to use python for this task, as there's no need to write a multi-line program for this.

Standard Unix tools like sed, awk or perl can achieve this easily straight from the command-line.

e.g anywhere you have perl (Windows, Mac, Linux) the following should achieve what the OP asked:

perl -i -pe 's/[ \t]+$//;' files...

Explanation of the arguments to perl:

-i   # run the edit "in place" (modify the original file)
-p   # implies a loop with a final print over every input line
-e   # next arg is the perl expression to apply (to every line)

s/[ \t]$// is a substitution regex s/FROM/TO/: replace every trailing (end of line) non-empty space (spaces or tabs) with nothing.

Advantages:

  • One liner, no programming needed
  • Works on multiple (any number) of files
  • Works correctly on standard-input (no file arguments given)

Edit:

Newer versions of perl support \h (any horizontal-space character), so the solution becomes even shorter:

perl -i -pe 's/\h+$//;' files...

More generally, if you want to modify any number of files directly from the command line, replacing every appearance of FOO with BAR, you may always use this generic template:

perl -i -pe 's/FOO/BAR/' files...
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜