开发者

Randomly selecting a file from a tree of directories in a completely fair manner

I'm looking for a way to randomly select a file from a tree of directories in a manner such that any individual file has exactly the same probability of being chosen as all other files. For example in the following tree of files, each file should have a 25% chance of being chosen:

  • /some/parent/dir/
    • Foo.jpg
    • sub_dir/
      • Bar.jpg
      • Baz.jpg
      • another_sub/
        • qux.png

My interim solution which I'm using while I code the rest of the app is to have a function like so:

def random_file(dir):
    file = os.path.join(dir, random.choice(os.listdir(dir开发者_运维问答)));
    if os.path.isdir(file):
        return random_file(file)
    else:
        return file

However this obviously biases the results depending on where they are in the tree and how many siblings are along side them in their directory so they end up with the following probabilities of being selected:

  • /some/parent/dir/
    • Foo.jpg - 50%
    • sub_dir/ (50%)
      • Bar.jpg - 16.6%
      • Baz.jpg - 16.6%
      • another_sub/ (16.6%)
        • qux.png - 16.6%

The context for the function is in a background rotation app I'm writing, so the ability to filter out unwanted file extensions from being in the results would be a bonus (although I could simply force that by choosing again if it's not the file type I want... that gets messy if there's an abundance of files of the "wrong" type, though).


You can only select all files with the same probability if you know the total number of files in advance, so you need to create a full list first:

files = [os.path.join(path, filename)
         for path, dirs, files in os.walk(dir)
         for filename in files
         if not filename.endswith(".bak")]
return random.choice(files)


As other answers mentioned, you can choose one at random either by collecting all file paths into a list and using random.choice. Alternatively, an online selection is possible that uses no extra memory by using more random numbers. For n items, it is either equal selection among the first n-1 items, or the nth item with probability 1/n. This can be computed as you run through the list of possibilities.

You can iterate through all file names with:

def recursive_files(dir):
    for path, _, fnames in os.walk(dir):
        for fname in fnames:
            yield os.path.join(path, fname)

And make the online selection with this:

import random
def online_choice(iterable):
    for n, x in enumerate(iterable, 1):
        if random.randrange(n) == 0:
            pick = x
    return pick


Why don't you just store all the files in one list (with their path) and choose from that?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜