
Traversing directories to count number of files with a specific string

I have a directory with several levels of sub-directories. All the files in the directories are HTML files (approx. 500 in total), and I'd like to go through each file to see if it contains a "sub_middle_1col" division. I found a great tutorial at palewire.com and have used that as my base. The two difficulties I am having are 1) the code broke when it hit a sub-directory (thinking it was a file), and 2) it would not traverse sub-directories; that is, it only looks at files that are not in any sub-directory. I may have solved the first problem by adding a line (noted below), but I can't figure out how to integrate other solutions I've seen (e.g., os.walk) into the code in order to solve the second problem. Any ideas? Thanks in advance for any advice.

import os

path = "./Industries"
my_library = os.listdir(path)
out = open("out.txt", "w")

for page in my_library:
    file = os.path.join(path, page)
    if os.path.isfile(file) and file.endswith('.html'):    #I ADDED THIS LINE
        text = open(file, "r")
        hit_count = 0
        for line in text:
            if 'sub_middle_1col' in line:
                hit_count = hit_count + 1
                print >>  out, page + " => " + str(hit_count)  
        print page + " => " + str(hit_count)
        text.close()


Well, you can try:

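import os

path = "./Industries"  # same starting directory as in the question

# os.walk visits path and every sub-directory below it
for root, dirs, files in os.walk(path):
    for fname in files:
        if fname.endswith('.html'):
            fq = os.path.join(root, fname)  # full path to the file
            with open(fq) as text:  # closes the file when the block ends
                for line in text:
                    if 'sub_middle_1col' in line:
                        ...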
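To tie this back to the counting in the question, here is a minimal sketch of the whole thing as one function. It is written for Python 3, whereas the question's code uses Python 2 print syntax. The function name count_hits and its parameters are made up for illustration, and errors="replace" is an assumption so that a file with unexpected bytes does not abort the scan.

```python
import os


def count_hits(root_dir, needle="sub_middle_1col", suffix=".html"):
    """Map each matching file's path to the number of lines containing needle."""
    results = {}
    # os.walk yields (directory, sub-directory names, file names) for
    # root_dir and for every directory below it.
    for current_dir, _dirs, files in os.walk(root_dir):
        for fname in files:
            if not fname.endswith(suffix):
                continue  # skip anything that is not an .html file
            full_path = os.path.join(current_dir, fname)
            # Assumption: undecodable bytes are replaced instead of raising.
            with open(full_path, "r", errors="replace") as handle:
                results[full_path] = sum(1 for line in handle if needle in line)
    return results
```

Calling count_hits("./Industries") should give one entry per HTML file, so the files that contain the division are the ones whose count is greater than zero. Because os.walk only hands back names from the files list, the os.path.isfile check from the question is no longer needed.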

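str.find() or regular expressions (the re module) are other ways to check for the 'sub_middle_1col' string. For a fixed substring like this one, the in operator is usually at least as fast, so it is worth timing the alternatives on your own files before switching.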
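For reference, the three ways of testing a line can be sketched side by side as below. The helper names are hypothetical, and which one is quickest depends on the interpreter and the data, so treat this as something to measure (for example with the timeit module) rather than a ranking.

```python
import re

NEEDLE = "sub_middle_1col"
# re.escape makes sure the text is matched literally, not as a pattern.
PATTERN = re.compile(re.escape(NEEDLE))


def has_needle_in(line):
    # Plain substring test, as used in the loop above.
    return NEEDLE in line


def has_needle_find(line):
    # str.find returns -1 when the substring is absent.
    return line.find(NEEDLE) != -1


def has_needle_re(line):
    # search scans the whole line; match would only look at the start.
    return PATTERN.search(line) is not None
```

All three return the same answer for a literal string; the re version only becomes necessary if you later need to match a pattern rather than fixed text.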
