
Organizing thousands of images on a server

I'm developing a website that might grow to a few thousand users, each of whom could upload up to ten pictures to the server. I'm wondering what the best way to store the pictures would be. Let's assume I have 5,000 users with 10 pictures each, which gives us 50,000 pictures. (I guess it wouldn't be a good idea to store them in the database as BLOBs ;) )

Would it be a good idea to dynamically create a directory for every 100 registered users (50 directories in total, assuming 5,000 users) and upload their pictures there? Would the naming convention 'xxx_yy.jpg' (xxx being the user ID and yy the picture number) be OK? In this case, however, there would be 1,000 (100 × 10) pictures in one folder. Isn't that too many?
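The layout proposed in the question can be sketched as a pure path computation. The helper name `proposedPath` and the root directory are hypothetical; user IDs are printed without zero-padding because 5,000 users need four digits, not the three that 'xxx' suggests.

```php
<?php
// Hypothetical helper illustrating the question's proposal:
// one directory per block of 100 user IDs, files named "<userId>_<picNo>.jpg".
function proposedPath(string $root, int $userId, int $pictureNo): string
{
    $bucket = intdiv($userId, 100);                       // users 0-99 -> 0, 100-199 -> 1, ...
    $name = sprintf('%d_%02d.jpg', $userId, $pictureNo);  // picture number padded to two digits
    return $root . '/' . $bucket . '/' . $name;
}

echo proposedPath('/var/www/uploads', 4321, 7), "\n"; // /var/www/uploads/43/4321_07.jpg
```

Creating the bucket directory on first use could be done with PHP's `mkdir($dir, 0755, true)`, where the third argument creates missing parent directories.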


I would most likely store the images under a hash of their contents, for instance a SHA-256 digest (or its first 128 bits). So I'd rename a user's uploaded image 'foo.jpg' to its hash (probably in URL-safe base64, which gives uniform 22-character names for a 128-bit digest) and then store the user's name for the file and its hash in a database. I'd probably also add a reference count. Then, if several people upload the same image, it only gets stored once, and you can delete it when all references to it are gone.
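A minimal sketch of content-addressed naming with reference counting, under stated assumptions: the answer mentions a "128-bit SHA", but SHA-1 and SHA-2 digests are 160 bits or longer, so this sketch truncates SHA-256 to 128 bits. `contentName` and `ImageIndex` are hypothetical names, and the in-memory arrays stand in for the database table the answer describes.

```php
<?php
// Name each stored image after a hash of its bytes (SHA-256 truncated to
// 128 bits is an assumption of this sketch, not the answer's exact choice).
function contentName(string $bytes): string
{
    $digest = substr(hash('sha256', $bytes, true), 0, 16); // first 128 bits, raw binary
    // Filename-safe base64 without '=' padding: 16 bytes -> 22 characters
    return rtrim(strtr(base64_encode($digest), '+/', '-_'), '=');
}

// Hypothetical in-memory stand-in for the database table.
final class ImageIndex
{
    /** @var array<string, int> content name => reference count */
    private array $refCounts = [];
    /** @var array<string, string> "user/originalName" => content name */
    private array $entries = [];

    /** Records an upload; returns true if the bytes are new and must be written to disk. */
    public function add(string $user, string $originalName, string $bytes): bool
    {
        $key = $user . '/' . $originalName;
        if (isset($this->entries[$key])) {
            throw new InvalidArgumentException("$key already exists");
        }
        $name = contentName($bytes);
        $this->entries[$key] = $name;
        $this->refCounts[$name] = ($this->refCounts[$name] ?? 0) + 1;
        return $this->refCounts[$name] === 1;
    }

    /** Forgets an upload; returns the content name to delete if no references remain. */
    public function remove(string $user, string $originalName): ?string
    {
        $key = $user . '/' . $originalName;
        if (!isset($this->entries[$key])) {
            return null;
        }
        $name = $this->entries[$key];
        unset($this->entries[$key]);
        if (--$this->refCounts[$name] === 0) {
            unset($this->refCounts[$name]);
            return $name; // last reference is gone, the file can be deleted
        }
        return null;
    }
}
```

If two users upload identical bytes, the second `add()` returns false, so the file is written only once; only the final `remove()` reports that the file may be deleted.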

As for the actual physical storage, now that you have a guaranteed uniform naming scheme, you can use your file system as a balanced tree. You can either decide on a maximum number of files per directory and have a balancer move files around to maintain it, or imagine what a fully populated tree would look like and store your files that way from the start.
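A minimal sketch of the "fully populated tree" option, assuming uniform hash-based names: the first characters of the name select the subdirectories. The function names and the `$levels`/`$width` parameters are hypothetical.

```php
<?php
// Path under a fixed tree: $levels directories, each named after the next
// $width characters of the (uniform) file name.
function shardedPath(string $root, string $name, int $levels = 1, int $width = 2): string
{
    if (strlen($name) < $levels * $width) {
        throw new InvalidArgumentException('name too short for this layout');
    }
    $parts = [$root];
    for ($i = 0; $i < $levels; $i++) {
        $parts[] = substr($name, $i * $width, $width);
    }
    $parts[] = $name;
    return implode('/', $parts);
}

// Writes $bytes under its sharded path, creating directories on demand.
function storeSharded(string $root, string $name, string $bytes): string
{
    $path = shardedPath($root, $name);
    $dir = dirname($path);
    // Re-check is_dir() after mkdir() in case another request created it first.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("cannot create $dir");
    }
    if (file_put_contents($path, $bytes) === false) {
        throw new RuntimeException("cannot write $path");
    }
    return $path;
}
```

For example, `shardedPath('/data/img', 'a1b2c3d4e5f6.jpg', 2, 2)` yields `/data/img/a1/b2/a1b2c3d4e5f6.jpg`. With hex names, one level of two characters gives 256 directories, so 50,000 files average about 195 per directory. Hex names also avoid clashes on case-insensitive file systems, where base64 names differing only in letter case would collide.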

The only real drawback to this scheme is that it decouples the users' file names from the stored files, so losing the database can mean not knowing what any file is called. You should be backing up that kind of information anyway.


Different file systems perform differently with directories that hold large numbers of files. Some slow down tremendously; others don't mind at all. For example, IBM JFS2 stores the contents of directory inodes as a B+ tree sorted by file name, so it probably provides O(log n) lookup time even for very large directories.

Getting ls or dir to read the entries, sort them, fetch size/date information and print everything to stdout is a completely different task from opening a file whose name you already know. So don't let the inability of ls to list a huge directory guide you.

Whatever you do, don't optimize too early. Just make sure your file access mechanism can be abstracted (make a FileStorage that you .getfile(id) from, or something similar).

That way you can use whatever directory structure you like, or, if you find it's better to store these items as a BLOB column in a database, you have that option too.
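One way to express that abstraction is an interface whose callers only ever see opaque IDs. This is a sketch: `FileStorage` follows the answer's suggestion (with the method spelled `getFile`), while `putFile`, `deleteFile` and the single-directory `LocalDirectoryStorage` implementation are hypothetical additions for illustration.

```php
<?php
// Callers depend only on this interface, so the layout behind it
// (flat directory, sharded tree, BLOB column, ...) can change later.
interface FileStorage
{
    public function putFile(string $id, string $bytes): void;
    public function getFile(string $id): string;
    public function deleteFile(string $id): void;
}

// One possible implementation: every file in a single local directory.
final class LocalDirectoryStorage implements FileStorage
{
    private string $root;

    public function __construct(string $root)
    {
        if (!is_dir($root) && !mkdir($root, 0755, true) && !is_dir($root)) {
            throw new RuntimeException("cannot create $root");
        }
        $this->root = $root;
    }

    private function pathFor(string $id): string
    {
        // Reject IDs that could escape the root directory.
        if (!preg_match('/^[A-Za-z0-9_.-]+$/', $id) || $id === '.' || $id === '..') {
            throw new InvalidArgumentException("invalid id: $id");
        }
        return $this->root . '/' . $id;
    }

    public function putFile(string $id, string $bytes): void
    {
        if (file_put_contents($this->pathFor($id), $bytes) === false) {
            throw new RuntimeException("cannot write $id");
        }
    }

    public function getFile(string $id): string
    {
        $path = $this->pathFor($id);
        if (!is_file($path)) {
            throw new RuntimeException("no such file: $id");
        }
        $bytes = file_get_contents($path);
        if ($bytes === false) {
            throw new RuntimeException("cannot read $id");
        }
        return $bytes;
    }

    public function deleteFile(string $id): void
    {
        $path = $this->pathFor($id);
        if (is_file($path)) {
            unlink($path);
        }
    }
}
```

Moving to a sharded tree or a database BLOB column would then mean writing another class that implements `FileStorage`, without touching the code that calls it.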


Granted, I have never stored 50,000 images, but I usually just store all images in the same directory and name them as follows to avoid conflicts, then store the reference in the DB:

$ext = pathinfo( $filename, PATHINFO_EXTENSION ); // explode() would return an array, not the extension
$newName = md5( microtime() ) . '.' . $ext;

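That way two files practically never get the same name, since microtime() will almost never return the same value twice (only two uploads handled within the same microsecond could collide).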
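If even that small chance of a collision matters, a swapped-in alternative (not the answer's code) is to use 128 random bits from PHP's `random_bytes()` instead of hashing the time. The helper name `randomImageName` and the extension whitelist are assumptions of this sketch; the whitelist keeps uploads such as `evil.php` from being stored with an executable extension.

```php
<?php
// Hypothetical helper: random file name plus a whitelisted, lower-cased extension.
function randomImageName(string $originalName): string
{
    $allowed = ['jpg', 'jpeg', 'png', 'gif', 'webp']; // assumed list of accepted types
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if (!in_array($ext, $allowed, true)) {
        throw new InvalidArgumentException("unsupported extension: $ext");
    }
    return bin2hex(random_bytes(16)) . '.' . $ext; // 32 hex characters + extension
}
```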

