Is HTTP POSTing a file across a network much less efficient than copying it?
We're developing a Windows service which will act as a sort of 'slave' process. This process basically downloads a PDF, splits it into several PDFs, then needs to send those PDFs back.
We're currently using an HTTP request to retrieve the PDF and a number of POSTs to send the files back. This is so the slave service can be run from pretty much any machine, and more slaves can easily be added to lighten the load as necessary.
My question is: is using HTTP for file transfers like this significantly slower than, for example, just using copy commands (which would work only if the slave is on the same machine/network)?
Using normal copy commands is feasible, but I like the flexibility of being able to add a new slave anywhere.
My purely gut-based guess is that some protocols, such as NFS, would in some circumstances be somewhat, and maybe even significantly, faster than HTTP. But a guess isn't enough data to go by. I think you need to figure out how much of a difference would actually matter to you, and then run some quick tests. Better yet, make a gut call on which one is the better fit for your needs (HTTP is certainly much easier to get through firewalls and over the open internet, and maybe even over a VPN) and just try that first. If you hit a wall, experiment with other options.
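As a rough illustration of the "quick tests" idea above, here is a minimal Python sketch that times a plain file copy against an HTTP POST of the same file. Everything in it is an assumption made for the example: the throwaway loopback server, the UploadHandler and run_benchmark names, and the random bytes standing in for a PDF. Because it runs over loopback, it only hints at protocol overhead; a real comparison would have to run between the actual machines, against the real server.

```python
import http.client
import http.server
import os
import shutil
import tempfile
import threading
import time


class UploadHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical receiver: reads the POST body and throws it away."""

    def do_POST(self):
        remaining = int(self.headers["Content-Length"])
        while remaining > 0:
            chunk = self.rfile.read(min(65536, remaining))
            if not chunk:
                break
            remaining -= len(chunk)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the benchmark output quiet


def time_copy(src, dst):
    """Seconds taken by an ordinary file copy."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    return time.perf_counter() - start


def time_post(src, host, port):
    """Seconds taken to read the file and POST it, plus the HTTP status."""
    start = time.perf_counter()
    with open(src, "rb") as f:
        body = f.read()
    conn = http.client.HTTPConnection(host, port, timeout=30)
    try:
        conn.request("POST", "/upload", body=body,
                     headers={"Content-Type": "application/pdf"})
        resp = conn.getresponse()
        resp.read()
        status = resp.status
    finally:
        conn.close()
    return time.perf_counter() - start, status


def run_benchmark(size_bytes=5 * 1024 * 1024):
    """Time one copy and one POST of a temporary file of the given size."""
    server = http.server.HTTPServer(("127.0.0.1", 0), UploadHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    workdir = tempfile.mkdtemp()
    try:
        src = os.path.join(workdir, "sample.pdf")
        with open(src, "wb") as f:
            f.write(os.urandom(size_bytes))  # stand-in for a real PDF
        copy_s = time_copy(src, os.path.join(workdir, "copy.pdf"))
        post_s, status = time_post(src, "127.0.0.1", port)
        return copy_s, post_s, status
    finally:
        server.shutdown()
        server.server_close()
        shutil.rmtree(workdir)


if __name__ == "__main__":
    copy_s, post_s, status = run_benchmark()
    print(f"copy: {copy_s:.4f}s  POST: {post_s:.4f}s  (HTTP {status})")
```

A single run is noisy, so it is worth repeating the measurement several times and with file sizes close to your real PDFs before drawing any conclusion.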
Update: right after I posted this, I remembered that Backblaze, the online backup service, uses HTTPS for all its internal data transfer to and from its storage appliances. This is documented in their post "Petabytes on a budget: How to build cheap cloud storage"; jump down to the section "A Backblaze Storage Pod Runs Free Software". There's some good thinking there on the advantages of HTTPS over a lower-level protocol. And they have to transfer a lot of data, quickly. So if it works for them, there's a good chance it'll work for you.
The overhead of HTTP isn't very high unless your server is configured in a profoundly unusual way (or it's handling so much traffic that these requests are getting queued behind other HTTP requests from the outside world).
If you're talking about a machine (or, I should say, a process) set up specifically for this purpose that just happens to use HTTP as its transfer protocol, I doubt you'll see any noticeable delays in transferring the data.
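To put a rough number on that overhead, the sketch below builds a plausible set of POST request headers and compares their size with the size of the file being sent. The host name, path, and header set are made up for the example, and the function name http_post_overhead is hypothetical. It also ignores TCP and TLS handshakes, response headers, and any server-side processing, so treat it as a lower bound on the per-request cost rather than a measurement.

```python
def http_post_overhead(payload_bytes,
                       host="files.example.internal",  # hypothetical host
                       path="/upload"):                # hypothetical path
    """Return (header size in bytes, header share of the total request)."""
    header = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/pdf\r\n"
        f"Content-Length: {payload_bytes}\r\n"
        "Connection: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")
    total = len(header) + payload_bytes
    return len(header), len(header) / total


if __name__ == "__main__":
    for size in (100 * 1024, 5 * 1024 * 1024):
        header_len, share = http_post_overhead(size)
        print(f"{size} byte file: {header_len} header bytes "
              f"({share:.4%} of the request)")
```

For files in the megabyte range the headers are a tiny fraction of what goes over the wire, which fits the point that the transfer protocol itself is unlikely to be the bottleneck.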
rsync or a similar binary protocol will be faster because there's no overhead involved in building the HTTP request. It also has some other nice features, like rate limiting (rsync's --bwlimit option), so you don't overtax the target host.
More importantly, you don't have to consume resources running a web server or worry about the uptime and management of a service like Apache.
However, if the current solution is fast enough for you, there's no reason to fix what isn't broken.