Why is this not a bug in qmail?
I was reading DJB's "Some thoughts on security after ten years of Qmail 1.0" and he listed this function for moving a file descriptor:
int fd_move(to,from) int to; int from; { if (to == from) return 0; if (fd_copy(to,from) == -1) return -1; close(from); return 0; }
It occurred to me that this code does not check the return value of close, so I read the man page for close(2), and it seems it can fail with EINTR
, in which case the appropriate behavior would seem to be to call close again with the same argument.
Since this code was written by someone with far more experience than I in both C and UNIX, and additionally has stood unchanged in qmail for over a decade, I assume there mu开发者_如何学Cst be some nuance that I'm missing that makes this code correct. Can anyone explain that nuance to me?
I've got two answers:
- He was trying to make a point about factoring out common code and often such examples omit error checking for brevity and clarity.
- close(2) may return EINTER, but does it in practice, and if so what would you reasonably do? Retry once? Retry until success? What if you get EIO? That could mean almost anything, so you really have no reasonable recourse except logging it and moving on. If you retry after an EIO, you might get EBADF, then what? Assume that the descriptor is closed and move on?
Every system call can return EINTR, escpecially one that blocks like read(2) waiting on a slow human. This is a more likely scenario and a good "get input from terminal" routine will indeed check for this. That also means that write(2) can fail, even when writing a log file. Do you try to log the error that the logger generated or should you just give up?
When a file descriptor is dup'd, as it is in the fd_copy
or dup2
function, you will end up with more than one file descriptor referring to the same thing (i.e. the same struct file
in the kernel). Closing one of them will simply decrement its reference count. No operation is performed on the underlying object unless it is the last close. As a result, conditions such as EINTR
and EIO
are not possible.
Another possibility is that his function is used only in an application (or a part of one) which has done something to ensure that the call will not be interrupted by a signal. If you aren't going to do anything important with signals, then you don't have to be responsive to them, and it might make sense to mask them all out, rather than wrap every single blocking system call in an EINTR retry. Except of course the ones that will kill you, so SIGKILL and frequently SIGPIPE if you handle it by quitting, along with SIGSEGV and similar fatal errors which will in any case never be delivered to a correct user-space app.
Anyway, if all he's talking about is security, then quite possibly he doesn't have to retry close
. If close failed with EIO, then he would not be able to retry it, it would be a permanent failure. Therefore, it is not necessary for the correctness of his program that close
succeeds. It may well be that it is not necessary for the correctness of his program that close
be retried on EINTR, either.
Usually you want your program to make a best effort to succeed, and that means retrying on EINTR. But this is a separate concern from security. If your program is designed so that some function failing for any reason isn't a security flaw, then in particular the fact that it happens to have failed EINTR, rather than for a permanent reason, isn't a flaw. DJB has been known to be fairly opinionated, so I would not be at all surprised if he has proved some reason why he doesn't need to retry, and therefore doesn't bother, even if doing so would allow his program to succeed in flushing the handle in certain situations where maybe it currently fails (like being explicitly sent a harmless signal with kill
by the user at a crucial moment).
Edit: it occurs to me that retrying on EINTR could potentially itself be a security flaw under certain conditions. It introduces a new behaviour to that section of code: it can loop indefinitely in response to a signal flood, where previously it would make one attempt to close
and then return. I don't know for sure that this would cause qmail any problems (after all, close
itself makes no guarantees how soon it will return). But if giving up after one attempt does make the code easier to analyse then it could plausibly be a smart move. Or not.
You might think that retrying prevents a DoS flaw, where a signal causes a spurious failure. But retrying allows another (more difficult) DoS flaw, where a signal flood causes an indefinite stall. In terms of binary "can this app be DoSed?", which is the kind of absolute security question DJB was interested in when he wrote qmail and djbdns, it makes no difference. If something can happen once, then normally that means it can happen many times.
Only broken unices ever return EINTR
without you explicitly asking for it. The sane semantics for signal()
enable restartable system calls ("BSD style"). When building a program on a system with the sysv semantics (interrupting signals), you should always replace calls to signal()
with calls to bsd_signal()
, which you can define yourself in terms of sigaction()
if it doesn't exist.
It's further worth noting that no systems will return EINTR
on signal receipt unless you have installed signal handlers. If the default action is left in place, or if the signal is set to no action, it's impossible for system calls to be interrupted.
精彩评论