Linux: uploading unfinished files - with file size check (scp/rsync) [closed]

2023-01-26 23:56 问答作者：

Closed. This question is off-topic. It is not currently accepting answers.

Want to improve this question? Update the question so it's on-topic for Stack Overflow.

Closed 11 years ago.

Improve this question

I typically end up in the following situation: I have, say, a 650 MB MPEG-2 .avi video file from a camera. Then, I use ffmpeg2theora to convert it into Theora .ogv video file, say some 150 MB in size. Finally, I want to upload this .ogv file to an ssh server.

Let's say, the ffmpeg2theora encoding process takes some 15 minutes on my PC. On the other hand, the upload goes on with a speed of about 60 KB/s, which takes some 45 minutes (for the 150MB .ogv). So: if I first encode, and wait for the encoding process to finish - and then upload, it would take approximately

15 min + 45 min = 1 hr

to complete the operation.

So, I thought it would be better if I could somehow start the upload, in parallel with the encoding operation; then, in principle - as the uploading process is slower (in terms of transferred bytes/sec) than the encoding one (in terms of generated bytes/sec) - the uploading process would always "trail behind" the encoding one, and so the whole operation (enc+upl) would complete in just 45 minutes (that is, just the time of the upload process +/- some minutes depending on actual upload speed situation on wire).

My first idea was to pipe the output of ffmpeg2theora to tee (so as to keep a local copy of the .ogv) and then, pipe the output further to ssh - as in:

./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 -o /dev/stdout MVI.AVI | tee MVI.ogv | ssh user@ssh.server.com "cat > ~/myvids/MVI.ogv"

While this command does, indeed, function - one can easily see in the running log in the terminal from ffmpeg2theora, that in this case, ffmpeg2theora calculates a predicted time of completion to be 1 hour; that is, there seems to be no benefit in terms of smaller completion time for both enc+u开发者_如何学Gopl. (While it is possible that this is due to network congestion, and me getting less of a network speed at the time - it seems to me, that ffmpeg2theora has to wait for an acknowledgment for each little chunk of data it sends through the pipe, and that ACK finally has to come from ssh... Otherwise, ffmpeg2theora would not have been able to provide a completion time estimate. Then again, maybe the estimate is wrong, while the operation would indeed complete in 45 mins - dunno, never had patience to wait and time the process; I just get pissed at 1hr as estimate, and hit Ctrl-C ;) ...)

My second attempt was to run the encoding process in one terminal window, i.e.:

./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 MVI.AVI      # MVI.ogv is auto name for output

..., and the uploading process, using scp, in another terminal window (thereby 'forcing' 'parallelization'):

scp MVI.ogv user@ssh.server.com:~/myvids/

The problem here is: let's say, at the time when scp starts, ffmpeg2theora has already encoded 5 MB of the output .ogv file. At this time, scp sees this 5 MB as the entire file size, and starts uploading - and it exits when it encounters the 5 MB mark; while in the meantime, ffmpeg2theora may have produced additional 15 MB, making the .ogv file 20 MB in total size at the time scp has exited (finishing the transfer of the first 5 MB).

Then I learned (joen.dk » Tip: scp Resume) that rsync supports 'resume' of partially completed uploads, as in:

rsync --partial --progress myFile remoteMachine:dirToPutIn/

..., so I tried using rsync instead of scp - but it seems to behave exactly the same as scp in terms of file size, that is: it will only transfer up to the file size read at the beginning of the process, and then it will exit.

So, my question to the community is: Is there a way to parallelize the encoding and uploading process, so as to gain the decrease in total processing time?

I'm guessing there could be several ways, as in:

A command line option (that I haven't seen) that forces scp/rsync to continuously check the file size - if the file is open for writing by another process (then I could simply run the upload in another terminal window)
A bash script; say running rsync --partial in a while loop, that runs as long as the .ogv file is open for writing by another process (I don't actually like this solution, since I can hear the harddisk scanning for the resume point, every time I run rsync --partial - which, I guess, cannot be good; if I know that the same file is being written to at the same time)
A different tool (other than scp/rsync) that does support upload of a "currently generated"/"unfinished" file (the assumption being it can handle only growing files; it would exit if it encounters that the local file is suddenly less in size than the bytes already transferred)

... but it could also be, that I'm overlooking something - and 1hr is as good as it gets (in other words, it is maybe logically impossible to achieve 45 min total time - even if trying to parallelize) :)

Well, I look forward to comments that would, hopefully, clarify this for me ;)

Thanks in advance,

Cheers!

May be you can try sshfs (http://fuse.sourceforge.net/sshfs.html).This being a file system should have some optimization though I am not very sure.

继续阅读：filesize rsync scp upload

Linux: uploading unfinished files - with file size check (scp/rsync) [closed]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？