Architecture for Image hosting site
Scenario:
* A user uploads an image and enters some information about that image
* Information and image get uploaded (to all servers)
* User gets confirmation that image is uploaded

Factors:
* Dozens of servers, distributed all over the world
* Image should end up on disk, since it will be served
* Information should end up in a database
* Images are small, no bigger than 5 MB

We considered various architectural solutions and technologies (git murder, rsync to name a few), but we're still not 100% sure how to approach this. Our current solution, pushing files to all servers from our "upload" server, is way too slow and we're looking to improve it.
Any thoughts? Thanks in advance
First, let's assume for simplicity that the information about the image is written to a file, and that this file and the image are zipped up together. So below I'm going to assume there is only one file (the zip file). This is just a detail (and is in fact completely unnecessary for BitTorrent!)
BitTorrent (or something that works in a similar way) is basically the fastest way to do this for large files. As soon as a server has downloaded a piece of the file, it starts trying to upload that piece to any other servers that need it. You could modify BitTorrent to prefer geographically closer IPs in order to minimise inter-LAN bandwidth usage.
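To make the "prefer geographically closer IPs" idea concrete, here is a minimal sketch of just the ranking step. It is not part of any real BitTorrent client: the `Peer` type, the `rank_peers` function, and the assumption that each server's latitude and longitude are already known (from a config file or a GeoIP lookup, say) are all hypothetical.

```python
import math
from typing import List, NamedTuple


class Peer(NamedTuple):
    """Hypothetical peer record; coordinates are assumed to be known up front."""
    host: str
    lat: float  # degrees
    lon: float  # degrees


def great_circle_km(a: Peer, b: Peer) -> float:
    """Haversine distance between two peers, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def rank_peers(me: Peer, candidates: List[Peer]) -> List[Peer]:
    """Return the candidate peers sorted nearest-first relative to `me`."""
    return sorted(candidates, key=lambda p: great_circle_km(me, p))


if __name__ == "__main__":
    frankfurt = Peer("fra.example", 50.1, 8.7)
    others = [
        Peer("syd.example", -33.9, 151.2),
        Peer("nyc.example", 40.7, -74.0),
        Peer("lon.example", 51.5, -0.1),
    ]
    for peer in rank_peers(frankfurt, others):
        print(peer.host, round(great_circle_km(frankfurt, peer)), "km")
```

Geographic distance is only a rough proxy for network cost; measured latency or a hand-maintained region map might work better in practice.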
If you don't want to use BitTorrent, or if the files are too small for it to make sense, just have one server upload to two others, then have each of those upload to two more, and so on. You could also use a fan-out factor greater than 2. Experiment with what works best for you.
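The fan-out scheme can be sketched as an implicit tree over the server list. This is only an illustration under assumed names (`fanout_plan`, `tree_depth`, and the server labels are made up), and it only computes who pushes to whom; the actual transfer (rsync, scp, HTTP, ...) is left out.

```python
from typing import Dict, List


def fanout_plan(servers: List[str], fanout: int = 2) -> Dict[str, List[str]]:
    """Map each server to the servers it should push the file to.

    servers[0] is the origin (the "upload" server). The list is treated as an
    implicit k-ary tree: the server at index i pushes to the servers at
    indices i*fanout + 1 through i*fanout + fanout, where they exist.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    plan = {}
    for i, src in enumerate(servers):
        first_child = i * fanout + 1
        plan[src] = servers[first_child:first_child + fanout]
    return plan


def tree_depth(n_servers: int, fanout: int = 2) -> int:
    """Number of hops from the origin to the deepest server in the tree."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    depth, covered, level_size = 0, 1, 1
    while covered < n_servers:
        level_size *= fanout      # servers reachable at the next level
        covered += level_size
        depth += 1
    return depth


if __name__ == "__main__":
    servers = ["upload", "a", "b", "c", "d", "e", "f"]
    for src, targets in fanout_plan(servers, fanout=2).items():
        if targets:
            print(src, "->", ", ".join(targets))
    print("hops to the last server:", tree_depth(len(servers), fanout=2))
```

With a fan-out of 2, the "upload" server sends only two copies instead of one per server, and the number of hops grows roughly logarithmically with the number of servers. A higher fan-out means fewer hops but more simultaneous uploads per server, which is the trade-off to experiment with.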
Take a look at Riak. It offers very good support for massive distribution and data replication. We've been using it successfully for a while now and it's proven very resilient.
I'd model it on Riak with the image and metadata stored separately, with a link between them. This way they both end up in a "database" and on disk, with an easy way to get from one to the other, and both are accessible via a URL.
Note: for replication over WAN you'll need the enterprise version, which isn't free.