How do I do this in Python (File Manipulation)?
I have a bunch of HTML files in an HTML folder. Those HTML files contain Unicode characters, which I strip by using filter(lambda x: x in string.printable, line). Now how do I write the changes back to the original file? What is the best way of doing it? Each HTML file is about 30 KB in size.
    import os, string

    for file in os.listdir("HTML/"):
        print file
        myfile = open('HTML/' + file)
        fileList = myfile.readlines()
        for line in fileList:
            #print line
            line = filter(lambda x: x in string.printable, line)
        myfile.close()
Use the fileinput module. It allows you to read and write to the same file in place:
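    import fileinput, os, string, sys

    files = [os.path.join('HTML', filename) for filename in os.listdir("HTML/")]
    for line in fileinput.input(files, inplace=True):
        # ''.join() keeps this working on Python 3, where filter() returns an iterator
        line = ''.join(filter(lambda x: x in string.printable, line))
        # with inplace=True, standard output is redirected into the current file
        sys.stdout.write(line)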
At first I didn't understand what @~unutbu was getting at, but after reading the documentation for the fileinput module I found this, which I hadn't seen before (emphasis mine):
Optional in-place filtering: if the keyword argument inplace=1 is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place. If the backup parameter is given (typically as backup='.<some extension>'), it specifies the extension for the backup file, and the backup file remains around; by default, the extension is '.bak' and it is deleted when the output file is closed. In-place filtering is disabled when standard input is read.
So I think his answer is best, and this explains why.
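To see the backup behaviour from that paragraph in action, here is a minimal Python 3 sketch. The helper name strip_nonprintable_inplace, the temporary directory, and the sample file page.html are assumptions made for illustration and do not come from the answers above; the files are read and written with the platform's default text encoding, which is also an assumption.

```python
import fileinput
import os
import string
import sys
import tempfile

PRINTABLE = set(string.printable)


def strip_nonprintable_inplace(paths, backup=""):
    """Hypothetical helper: rewrite each file in place, keeping only
    characters found in string.printable.

    With backup="" fileinput removes its temporary backup when it is done;
    with an extension such as ".orig" the untouched copy is kept.
    """
    with fileinput.input(files=paths, inplace=True, backup=backup) as stream:
        for line in stream:
            # While inplace=True, sys.stdout points at the file being rewritten.
            sys.stdout.write("".join(ch for ch in line if ch in PRINTABLE))


# Demo on a throwaway directory rather than a real HTML/ folder.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "page.html")
with open(demo_path, "w") as handle:
    handle.write("<p>caf\u00e9</p>\n<p>plain</p>\n")

strip_nonprintable_inplace([demo_path], backup=".orig")

with open(demo_path) as handle:
    cleaned = handle.read()
backup_kept = os.path.exists(demo_path + ".orig")
```

Passing a backup extension is probably worth it the first time this runs over real files, since the original content is otherwise gone once the loop finishes.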
This should work on Linux and on other operating systems alike (see note 2).

    import os, string

    for file in os.listdir("HTML/"):
        print(file)
        myfile = open('HTML/' + file)
        fileList = myfile.readlines()
        for pos, line in enumerate(fileList):
            line = ''.join(filter(lambda x: x in string.printable, line))  # see note 1
            fileList[pos] = line
        myfile.close()
        myfile = open('HTML/' + file, "w")  # see note 2
        myfile.write("".join(fileList))  # the lines already end in "\n"
        myfile.close()

Note 1. Simply assigning to line does not change fileList. Variables really are labels (references) onto objects: assigning to a label changes which object the label is attached to, not the object itself. That line creates a new string, which is then assigned to line, so it has to be stored back into the list with fileList[pos] = line.

Note 2. The "w" file mode empties the file on opening (it is the equivalent of the O_TRUNC flag when passed to open()). This behavior is standard, so it is available on platforms other than Linux as well.
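The two notes can be checked with a small Python 3 sketch. The helper keep_printable, the sample strings, and the temporary file are hypothetical and exist only for this illustration; the sketch assumes plain "w" mode is used for the rewrite.

```python
import os
import string
import tempfile

PRINTABLE = set(string.printable)


def keep_printable(text):
    """Hypothetical helper: drop every character not in string.printable."""
    return "".join(ch for ch in text if ch in PRINTABLE)


lines = ["na\u00efve\n", "ok\n"]

# Note 1, first half: rebinding the loop variable leaves the list alone.
for line in lines:
    line = keep_printable(line)
after_rebinding = list(lines)

# Note 1, second half: storing the result by index updates the list.
for pos, line in enumerate(lines):
    lines[pos] = keep_printable(line)

# Note 2: reopening with mode "w" truncates the file before anything is written.
fd, path = tempfile.mkstemp(suffix=".html")
os.close(fd)
with open(path, "w") as handle:
    handle.write("a much longer first version\n")
with open(path, "w") as handle:
    handle.writelines(lines)
with open(path) as handle:
    final_text = handle.read()
os.remove(path)
```

Here after_rebinding still holds the unfiltered strings, while lines holds the filtered ones, and final_text contains only what the second write produced.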