Speed up the creation of a pathname

I have 2 folders. The first (called A) contains some images named in the form subject_incrementalNumber.jpg (where incrementalNumber goes from 0 to X).

Then I process each image in folder A, extract some pieces from it, and save each piece in folder B under the name subject_incrementalNumber_anotherIncrementalNumber.jpg, where subject and incrementalNumber are the same as for the original image in folder A and anotherIncrementalNumber distinguishes one piece from another.

Finally, I delete the processed image from folder A.

A
    subjectA_0.jpg
    subjectA_1.jpg
    subjectA_2.jpg
    ...
    subjectB_0.jpg

B
    subjectA_0_0.jpg
    subjectA_0_1.jpg

    subjectA_1_0.jpg

    subjectA_2_0.jpg
    ...

Every time I download a new image of a subject and save it in folder A, I have to compute a new pathname for it (I have to find the min incrementalNumber available for that subject). The problem is that when I process an image I delete it from folder A and keep only its pieces in folder B, so I have to find the min number available across both folders.

Now I use the following function to create the pathname

output_name = chooseName( subject, folderA, folderB )

import fnmatch
import os

# Create incremental file name
# The number used is one past the highest number already taken in either folder
def chooseName( owner, dest_files, faces_files ):
    # find the next free number in each folder
    # (assumes both arguments are folder paths, as in the call above)
    v1 = seekVersion_folderA( owner, os.listdir( dest_files ) )
    v2 = seekVersion_folderB( owner, os.listdir( faces_files ) )

    # select the max of those 2
    version = max( v1, v2 )

    # create name
    base = dest_files + os.sep + owner + "_"
    fname = base + str(version) + ".jpg"
    return fname


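# Seek the next free number in folderA (one past the highest number in use)
# dest_files is the list of file names found in folderA
def seekVersion_folderA( owner, dest_files ):
    # keep only this owner's images, e.g. subjectA_3.jpg
    res = fnmatch.filter( dest_files, owner + '_*.jpg' )

    # the number sits between the owner prefix and the ".jpg" extension
    def g(x): return int(x[len(owner)+1:-len(".jpg")])
    numbers = [ g(x) for x in res ]

    if len( numbers ) == 0: return 0
    else: return max(numbers)+1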


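# Seek the next free number in folderB (one past the highest number in use)
# faces_files is the list of file names found in folderB
def seekVersion_folderB( owner, faces_files ):
    # keep only this owner's pieces, e.g. subjectA_3_0.jpg
    res = fnmatch.filter( faces_files, owner + '_*_*.jpg' )

    # the number sits between the owner prefix and the last underscore
    def g(x): return int(x[len(owner)+1:x.rfind("_")])
    numbers = [ g(x) for x in res ]

    if len( numbers ) == 0: return 0
    else: return max(numbers)+1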

It works, but this process takes about 10 seconds for each image, and since I have a lot of images this is too inefficient. Is there any way to make it faster?


As specified, this is indeed a hard problem with no magic shortcuts. To find the minimum available number you need to use trial and error, exactly as you are doing. While the implementation could be sped up, there is a fundamental limitation in the algorithm.

I think I would relax the constraints of the problem a little. I would be prepared to choose numbers that weren't the minimum available. I would store a hidden file in the directory containing the last number used when creating a file. Every time you come to create another one, read this number from the file, increment it by 1, and see if that name is available. If so, you are good to go; if not, start counting up from there. Remember to update the file when you do settle on a name.
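A minimal sketch of the counter-file idea, under some assumptions that are mine rather than the answer's: one hidden counter file per subject, stored in folder A, with the hypothetical name `.<subject>.counter`. The function name `next_name` is also just illustrative.

```python
import os

def next_name(folder_a, owner):
    """Return a free path in folder_a for owner, using a hidden counter file."""
    # Hypothetical counter file name; one file per subject is an assumption.
    counter_path = os.path.join(folder_a, "." + owner + ".counter")

    # Read the last number used; fall back to -1 so the first candidate is 0.
    try:
        with open(counter_path) as fh:
            last = int(fh.read().strip())
    except (OSError, ValueError):
        last = -1

    # Start one past the stored number and count up until the name is free.
    version = last + 1
    while os.path.exists(os.path.join(folder_a, "%s_%d.jpg" % (owner, version))):
        version += 1

    # Remember the number we settled on for the next call.
    with open(counter_path, "w") as fh:
        fh.write(str(version))

    return os.path.join(folder_a, "%s_%d.jpg" % (owner, version))
```

Because the stored number never goes down, a number is not handed out again after the image is deleted from folder A, so folder B does not need to be scanned at all. This sketch is not safe against two processes picking a name at the same time.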

If no humans are reading these names, then you may be better off using randomly generated names.
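For illustration, random names could be generated with the standard `uuid` module. Keeping the subject as a prefix is an assumption on my part, and `random_name` is a hypothetical helper name.

```python
import os
import uuid

def random_name(folder, owner):
    """Return a path in folder with a random identifier instead of a counter."""
    # uuid4().hex is 32 hex characters with no underscores, so a piece name
    # such as owner_<id>_0.jpg can still be split on the last underscore.
    return os.path.join(folder, "%s_%s.jpg" % (owner, uuid.uuid4().hex))
```

No directory listing is needed here, so the cost per image no longer depends on how many files the folders hold.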


I've found another solution: use the hash of the file as a unique file name.
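A rough sketch of what that could look like. The post does not say which hash is used, so SHA-256 from `hashlib` is an assumption, as are the helper name `hashed_name` and the choice to keep the subject as a prefix.

```python
import hashlib
import os

def hashed_name(folder, owner, image_bytes):
    """Return a path in folder whose name is derived from the image content."""
    # SHA-256 is an assumed choice; any stable content hash would do.
    digest = hashlib.sha256(image_bytes).hexdigest()
    return os.path.join(folder, "%s_%s.jpg" % (owner, digest))
```

A side effect of this scheme is that downloading the same image twice yields the same name, so duplicates overwrite each other instead of piling up.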
