Comparing 2700+ filenames against 29000+ rows in a db table
I have a ftp repository that is currently at 2761 files (PDF files). I have a MySQL table (a list of those files) that's actually at 29k+ files (it hasn't been parsed since a recent upgrade). I'd like to provide admins with a one-click script that will do the following:
- 1) Compare the "existing" filenames with the rows in the database table
- 2) Delete any rows that are not in the existing filesystem
- 3) Add a new row for a file that doesn't appear in the database table
This usually is handled by an AppleScript/FolderAction/Perl script method, but it's not perfect (it chokes sometimes when large numbers of files are added at a time - like on heavy news nights).
It takes about 10-20 seconds to build the file list from 开发者_如何学Gothe FTP repository (using $file_list = ftp_nlist($conn_id,$target_dir) )
, and I'm not sure how to best go about comparing with the DB table (I'm positive that an WHERE NOT IN (big_fat_list)
would be a nightmare query to run).
Any suggestions?
Load the list of filenames into another table, then perform a couple of queries that fulfill your requirements.
Yup that is the solution. I propose you to use pdo prepared insert statement to reduce time. or do what mysqldump does, generate insert into table(column1,column2, ... ) values(), (), (), ... ; insert into ...
you would have to check the maximun values list in mysql site.
I usually dump the recursive directory list with dates and file sizes to a temporary table. Then I remove items not found:
delete
from A
where not exists (
select null as nothing
from temp b
where a.key = b.key )
Then I update items already there (for file sizes, CRCs):
update a set nonkeyfield1 = b.nonkeyfield1, nonkeyfield2 = b.nonkeyfield2
from a join temp b on a.key = b.key
Then I add items found:
insert into A ( field, list)
select field, list
from temp b
where not exists (
select null as nothing
from A
where b.key = a.key )
This is from memory, so test it first before you fly. The select null as nothing keeps you from wasting RAM while you check things.
精彩评论