200,000 images in a single folder on Linux: performance issue or not?
I have a PHP/MySQL website with over 200,000 images in a single folder (Linux server). I don't think I will ever need to browse them in a file explorer; they are only displayed on the website, each on its individual product page. The file system is ext3. So is it wise to save them in a single folder? Can it slow down the site's performance?
Ext3 uses a tree (HTree, when the dir_index feature is enabled) to hold directory contents, so its capability to handle a large number of files in a single directory is better than that of file systems with linear directory listings. Here you can read the description of the tree used to keep directory contents.
However, 200K files is still a huge number. It's reasonable to move them into subdirectories based on the first n characters of the file names (or of a hash of them). With this approach you only have to store file names, not directory names: when you need to access a file, you can recompute which subdirectory to look in.
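As an illustration, here is a minimal PHP sketch of that scheme; the base directory, bucket depth, and function names are assumptions, not part of the original setup. It derives a two-level subdirectory from a hash of the file name, so the bucket is always recomputable from the name alone:

```php
<?php
// Hypothetical helper: derive the storage path from the file name alone.
// md5() gives a stable, uniformly distributed prefix to bucket on.
function imagePath(string $fileName, string $baseDir = '/var/www/images'): string
{
    $hash = md5($fileName);
    $sub  = substr($hash, 0, 2) . '/' . substr($hash, 2, 2);
    return "$baseDir/$sub/$fileName";
}

// Store a new file in its bucket, creating the bucket on demand.
function storeImage(string $sourcePath, string $fileName): string
{
    $target = imagePath($fileName);
    $dir    = dirname($target);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);   // third argument: create parents recursively
    }
    rename($sourcePath, $target);  // use move_uploaded_file() for HTTP uploads
    return $target;
}
```

Two hex characters per level gives 256 × 256 buckets, so 200K files average only a few per directory, and MySQL only ever needs to store the bare file name.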
This seems to have been answered at the link below.
https://serverfault.com/questions/43133/filesystem-large-number-of-files-in-a-single-directory
I know an answer has already been accepted, but I want to add a solution for improving performance, for interest's sake.
Querying the directory listing on every request will cost the most overhead if the listing returns all entries every time.
You can improve performance by storing the listing in an indexed database (say, SQLite) and querying it from there. You can select a subset of records, implement pagination much more easily this way, and filter the results too.
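As a sketch of that idea in PHP (the database path, table layout, and column names are all assumptions), SQLite through PDO is enough to index the listing and serve filtered, paginated pages without touching the image directory at all:

```php
<?php
// Hypothetical listing index: one row per image, kept in SQLite.
$db = new PDO('sqlite:/var/www/data/images.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE IF NOT EXISTS images (
    name       TEXT PRIMARY KEY,
    product_id INTEGER,
    added_at   INTEGER
)');
$db->exec('CREATE INDEX IF NOT EXISTS idx_images_product
           ON images (product_id)');

// Paginated, filtered lookup -- no directory scan involved.
function listImages(PDO $db, int $productId, int $page, int $perPage = 50): array
{
    $stmt = $db->prepare(
        'SELECT name FROM images
          WHERE product_id = :pid
          ORDER BY added_at DESC
          LIMIT :limit OFFSET :offset'
    );
    $stmt->bindValue(':pid',    $productId, PDO::PARAM_INT);
    $stmt->bindValue(':limit',  $perPage,   PDO::PARAM_INT);
    $stmt->bindValue(':offset', ($page - 1) * $perPage, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

Keeping the table in sync is just one extra INSERT or DELETE wherever the site already adds or removes an image.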
File systems determine performance: 200,000 images in a single directory without directory indexing will slow things down on ext2 (or NTFS).
It's quite probable that some time in the future you'll want to do something where having all the images dumped in a single folder will hurt you, or something unexpected will happen and you'll regret doing it that way.
On the other hand, splitting the files across several folders doesn't seem to have many disadvantages besides the added complexity of dealing with them.
Performance will vary depending on your filesystem, its configuration, and your access patterns, but I'd find it quite strange for performance to be perceptibly worse after splitting the files between multiple folders.
So I'd say, split into different folders...
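If you do split them, a one-off migration is only a few lines of PHP. This is a sketch under assumed paths and an assumed two-level md5 bucket scheme, in the spirit of the earlier answer:

```php
<?php
// One-off migration sketch: move a flat image directory into
// two-level hashed subfolders. All paths here are assumptions.
$flatDir   = '/var/www/images_flat';   // the current single folder
$bucketDir = '/var/www/images';        // the new bucketed root

foreach (new DirectoryIterator($flatDir) as $entry) {
    if ($entry->isDot() || !$entry->isFile()) {
        continue;
    }
    $name = $entry->getFilename();
    $hash = md5($name);
    $dir  = $bucketDir . '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);       // create the bucket on demand
    }
    rename($entry->getPathname(), $dir . '/' . $name);
}
```

DirectoryIterator streams entries one at a time instead of materializing all 200,000 names in memory the way scandir() would, which is the safer choice on a directory this size.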
This paper on an ext2 variant for web scenarios might interest you: hashFS: Applying Hashing to Optimize File Systems for Small File Reads.
In a web scenario (under the assumptions stated in the paper), we saw better ext2 performance with a flat file set (more files per directory) than with a deep file set (a deeper directory tree).
Granted, in retrospect the evaluation should have been more extensive. But it might be worth reading.