How can I traverse a file system with a generator?
I'm trying to create a utility class for traversing all the files in a directory, including those within subdirectories and sub-subdirectories. I tried to use a generator because generators are cool; however, I hit a snag.
def grab_files(directory):
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            yield grab_files(full_path)
        elif os.path.isfile(full_path):
            yield full_path
        else:
            print('Unidentified name %s. It could be a symbolic link' % full_path)
When the generator reaches a directory, it simply yields the memory location of the new generator; it doesn't give me the contents of the directory.
How can I make the generator yield the contents of the directory instead of a new generator?
If there's already a simple library function to recursively list all the files in a directory structure, tell me about it. I don't intend to replicate a library function.
Why reinvent the wheel when you can use os.walk?
import os

path = '.'  # placeholder: the directory to start from
for root, dirs, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))
os.walk is a generator that yields a (dirpath, dirnames, filenames) tuple for each directory in the tree, walking the tree either top-down or bottom-up.
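To get the "generator of file paths" interface the question asks for, os.walk can be wrapped in a small function. This is only a minimal sketch: the name iter_files is made up for illustration, and its topdown argument is simply passed through to os.walk.

```python
import os


def iter_files(top, topdown=True):
    """Yield the full path of every file under `top`, one at a time.

    `iter_files` is a hypothetical helper name, not part of the os module.
    """
    for root, dirs, files in os.walk(top, topdown=topdown):
        for name in files:
            # `root` is the directory currently being visited
            yield os.path.join(root, name)
```

Nothing is read from disk until the caller starts iterating, for example with `for path in iter_files('.'): print(path)`.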
As of Python 3.4, you can use the glob() method from the built-in pathlib module:
import pathlib
p = pathlib.Path('.')
list(p.glob('**/*'))  # lists all files and directories recursively
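Because the '**/*' pattern matches directories as well as files, a filter is needed when only files are wanted. A small sketch under that assumption follows; the helper name iter_files_pathlib is hypothetical, and Path.rglob('*') is used as the shorthand for glob('**/*').

```python
from pathlib import Path


def iter_files_pathlib(top="."):
    """Lazily yield every regular file below `top` as a Path object.

    `iter_files_pathlib` is a hypothetical name chosen for this sketch.
    """
    # rglob("*") behaves like glob("**/*") and produces entries lazily.
    for entry in Path(top).rglob("*"):
        if entry.is_file():  # skip directories and other non-regular entries
            yield entry
```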
I agree with the os.walk solution.
Purely for the sake of pedantry, try iterating over the generator object instead of yielding it directly:
def grab_files(directory):
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            for entry in grab_files(full_path):
                yield entry
        elif os.path.isfile(full_path):
            yield full_path
        else:
            print('Unidentified name %s. It could be a symbolic link' % full_path)
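On Python 3.3 and later, the inner for loop can be replaced with yield from, which delegates to the recursive generator. The following is a sketch of the same function under that assumption; the name grab_files_py3 is only there to tell it apart from the version above, and the else branch is dropped for brevity.

```python
import os


def grab_files_py3(directory):
    """Recursively yield the path of every regular file below `directory`."""
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            # Delegate to the sub-generator: every path it produces is
            # passed on to our caller, instead of the generator object itself.
            yield from grab_files_py3(full_path)
        elif os.path.isfile(full_path):
            yield full_path
```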
Starting with Python 3.4, you can use the pathlib module:
In [48]: def alliter(p):
   ....:     yield p
   ....:     for sub in p.iterdir():
   ....:         if sub.is_dir():
   ....:             yield from alliter(sub)
   ....:         else:
   ....:             yield sub
   ....:
In [49]: g = alliter(pathlib.Path("."))
In [50]: [next(g) for _ in range(10)]
Out[50]:
[PosixPath('.'),
PosixPath('.pypirc'),
PosixPath('.python_history'),
PosixPath('lshw'),
PosixPath('.gstreamer-0.10'),
PosixPath('.gstreamer-0.10/registry.x86_64.bin'),
PosixPath('.gconf'),
PosixPath('.gconf/apps'),
PosixPath('.gconf/apps/gnome-terminal'),
PosixPath('.gconf/apps/gnome-terminal/%gconf.xml')]
This is essentially the object-oriented version of the answer by sjthebats.
Note that the Path.glob pattern ** on its own returns only directories (at least on Python versions before 3.13)!
os.scandir() is a function that "returns directory entries along with file attribute information, giving better performance [than os.listdir()] for many common use cases." It returns an iterator and does not use os.listdir() internally.
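A recursive generator built on os.scandir() might look like the sketch below (Python 3.6+ is assumed for the with-statement support). The name scan_files is hypothetical, and not descending into symlinked directories is a choice made here to avoid loops, not something prescribed above.

```python
import os


def scan_files(directory):
    """Yield the path of every regular file below `directory` using os.scandir()."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # DirEntry objects cache file-type information, so these checks
            # can usually avoid an extra stat() call per entry.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_files(entry.path)
            elif entry.is_file():
                yield entry.path
```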
You can use path.py. Unfortunately the author's website is no longer around, but you can still download the code from PyPI. This library is a wrapper around path functions in the os module.
path.py provides a walkfiles() method which returns a generator iterating recursively over all files in the directory:
>>> from path import path
>>> print path.walkfiles.__doc__
D.walkfiles() -> iterator over files in D, recursively.
The optional argument, pattern, limits the results to files
with names that match the pattern. For example,
mydir.walkfiles('*.tmp') yields only files with the .tmp
extension.
>>> p = path('/tmp')
>>> p.walkfiles()
<generator object walkfiles at 0x8ca75a4>
>>>
An addendum to gerrit's answer: I wanted to make something more flexible. This version lists all files in pth matching a given pattern, and can also list directories if only_file is False.
from pathlib import Path

def walk(pth=Path('.'), pattern='*', only_file=True):
    """ list all files in pth matching a given pattern, can also list dirs if only_file is False """
    if pth.match(pattern) and not (only_file and pth.is_dir()):
        yield pth
    for sub in pth.iterdir():
        if sub.is_dir():
            yield from walk(sub, pattern, only_file)
        else:
            if sub.match(pattern):
                yield sub