Speed up the creation of pathname
I've 2 folders. The first (called A) contains same images named in the form: subject_incrementalNumber.jpg (where incrementalNumber goes from 0 to X).
Then I process each image contained in folder A and extract some pieces from it, then save each piece in folder B with the name: subject(the same of the original image contained in folder A)_incrementalNumber(the same of folder A)_anotherIncrementalNumber(that distinguish one piece from another).
Finally, I delete the processed image from folder A.
A
subjectA_0.jpg
subjectA_1.jpg
subjectA_2.jpg
...
subjectB_0.jpg
B
subjectA_0_0.jpg
subjectA_0_1.jpg
subjectA_1_0.jpg
subjectA_2_0.jpg
...
Everytime I download a new image of one subject and save it in folder A, I have to calculate a new pathname for this image (I have to found the min incrementalNumber available for the specific subject). The problem is that when I process an image I delete it from folder A and I store only the pieces in folder B, so I have to find the min number available in both folders.
Now I use the following function to create the pathname
output_name = chooseName( subject, folderA, folderB )
# Create incremental file
# If the name already exist, try with incremental number (0, 1, etc.)
def chooseName( owner, dest_files, faces_files ):
# found the min number available in both folders
v1 = seekVersion_downloaded( owner, dest_files )
v2 = seekVersion_faces( owner, faces_files )
# select the max from those 2
version = max( v1, v2 )
# create name
base = dest_files + os.sep + owner + "_"
fname = base + str(version) + ".jpg"
return fname
# Seek the min number availa开发者_JS百科ble in folderA
def seekVersion_folderA( owner, dest_files ):
def f(x):
if fnmatch.fnmatch(x, owner + '_*.jpg'): return x
res = filter( f, dest_files )
def g(x): return int(x[x.find("_")+1:-len(".jpg")])
numbers = map( g, res )
if len( numbers ) == 0: return 0
else: return int(max(numbers))+1
# Seek the min number available in folderB
def seekVersion_folderB( owner, faces_files ):
def f(x):
if fnmatch.fnmatch(x, owner + '_*_*.jpg'): return x
res = filter( f, faces_files )
def g(x): return int(x[x.find("_")+1:x.rfind("_")])
numbers = map( g, res )
if len( numbers ) == 0: return 0
else: return int(max(numbers))+1
It works, but this process take about 10seconds for each image, and since I have a lot of images this is too inefficient. There is any workaround to make it faster?
As specified, this is indeed a hard problem with no magic shortcuts. In order to find the minimum available number you need to use trial and error, exactly as you are doing. Whilst the implementation could be speeded up, there is a fundamental limitation in the algorithm.
I think I would relax the constraints to the problem a little. I would be prepared to choose numbers that weren't the minimum available. I would store a hidden file in the directory which contained the last number used when creating a file. Every time you come to create another one, read this number from the file, increment it by 1, and see if that name is available. If so you are good to go, if not, start counting up from there. Remember to update the file when you do settle on a name.
If no humans are reading these names, then you may be better off using randomly generated names.
I've found another solution: use the hash of the file as unique file name
精彩评论