Is the order that tee prints to stdout guaranteed?
You can split a pipe using the tee command under Linux as follows:
printf "line1\nline2\nline3\n" | tee >(wc -l ) | (awk '{print "this is awk: "$0}')
which yields the output
this is awk: line1
this is awk: line2
this is awk: line3
this is awk: 3
My question: is that order of printing guaranteed? Will the tee split pipe that counts the number of lines always print at the end? Is there a way to always print it at the start? Or is the order of printing from tee never guaranteed?
It is not defined by tee, but as Daenyth says, wc won't be finished until tee has finished passing it data - so usually tee will have passed it on to awk by then too. In this instance it might be better to have awk do the counting.
printf '%s\n' one two three four | \
awk '{print "awk processing line " NR ": "$0} END {print "Awk saw " NR " lines"}'
The downside is that awk won't know the number until it finishes reading (printing the count first requires buffering the data). In your example, both tee and wc have stdout connected to the same pipe (stdin for awk), but the order in which their output arrives is undefined. cat (and most other piping tools) can be used to assemble files in a known order.
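As a minimal sketch of that buffering idea: save the input in a temporary file, print the count, then print the saved lines. The function name count_first is made up for illustration, and mktemp is assumed to be available on your system.

```shell
# count_first is a hypothetical helper, not a standard tool.
# It buffers stdin in a temp file so the count is known before any line is printed.
count_first() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                 # buffer all of stdin
    wc -l < "$tmp" | tr -d ' '   # count first (tr strips padding some wc versions add)
    cat "$tmp"                   # then the original lines, in order
    rm -f "$tmp"
}

printf 'line1\nline2\nline3\n' | count_first | awk '{print "this is awk: "$0}'
```

Because only one process writes to awk's stdin here, the order no longer depends on scheduling. The cost is that nothing is printed until the input has ended.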
There are more advanced piping techniques that could be used, such as bash coprocesses (coproc) or named pipes (mkfifo or mknod p). The latter gets you names in the filesystem, which can be passed to other processes, but you'll have to clean them up and avoid collisions. tempfile, mktemp or $$ may be useful for that. Pipes are not for buffering data, as they often have limited size and will simply block writes.
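A rough sketch of the named-pipe approach with cleanup, assuming mktemp -d is available. The function name fifo_count_first and the file names inside the temporary directory are invented for this example. Note that tee's stdout goes to a regular file rather than another pipe, so the buffering happens on disk:

```shell
# fifo_count_first is a hypothetical helper for illustration only.
fifo_count_first() {
    dir=$(mktemp -d) || return 1          # private directory avoids name collisions
    mkfifo "$dir/wcin" || { rm -rf "$dir"; return 1; }
    # Reader side: count lines arriving on the FIFO, save the result to a file.
    wc -l < "$dir/wcin" | tr -d ' ' > "$dir/count" &
    # Writer side: tee feeds the FIFO and saves a copy of the lines to a file.
    tee "$dir/wcin" > "$dir/lines"
    wait "$!"                             # let the counting pipeline finish
    cat "$dir/count" "$dir/lines"         # assemble in a known order
    rm -rf "$dir"                         # clean up the FIFO and temp files
}

printf 'line1\nline2\nline3\n' | fifo_count_first
```

The final cat is what fixes the order: it reads the count file to the end before it touches the saved lines.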
An example of where pipes are the wrong solution:
mkfifo wcin wcout
wc -l < wcin > wcout &
yes | dd count=1 bs=8M | tee wcin | cat -n wcout - | head
The problem here is that tee will get stuck trying to write things to cat, which wants to finish with wcout first. There's simply too much data for the pipe from tee to cat.
Edit regarding dmckee's answer: Yes, the order may be repeatable, but it is not guaranteed. It is a matter of scale, scheduling and buffer sizes. On this GNU/Linux box, the example starts breaking up after a few thousand lines:
seq -f line%g 20000 | tee >(awk '{print "*" $0 "*"}' ) | \
(awk '{print "this is awk: "$0}') | less
this is awk: line2397
this is awk: line2398
this is awk: line2*line1*
this is awk: *line2*
this is awk: *line3*
I suspect that in this case, wc is waiting for EOF, and so it will not return (or print output) until the first command is done sending input, whereas awk acts line by line and so will always print first. I don't know if it's defined when sending to other processes.
Why not just have awk count the lines before printing the lines themselves?
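A minimal sketch of what that could look like, with awk holding the lines in an array until it reaches the end of the input. The wrapper name awk_count_first is made up for illustration, and this assumes the input is small enough to fit in memory:

```shell
# awk_count_first is a hypothetical wrapper name.
awk_count_first() {
    awk '{ lines[NR] = $0 }                       # buffer every line
         END {
             print "this is awk: " NR             # total line count first
             for (i = 1; i <= NR; i++)
                 print "this is awk: " lines[i]   # then the lines in input order
         }'
}

printf 'line1\nline2\nline3\n' | awk_count_first
```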
I don't think that you can count on it. The wc here runs in a separate process, so there is no synchronization. My trial run suggests that it might be (at least in bash). As Daenyth explains, this particular case is special, but try it with grep -o line instead of wc and see what you get.
That said, on my MacBook I get:
$ printf "line1\nline2\nline3\nline4\nline5\n" | tee >(grep -o line ) | (awk '{print "this is awk: "$0}')
this is awk: line1
this is awk: line2
this is awk: line3
this is awk: line4
this is awk: line5
this is awk: line
this is awk: line
this is awk: line
this is awk: line
this is awk: line
very consistently. I'd have to read the bash man page very closely to be sure.
Similarly:
$ printf "line1\nline2\nline3\nline4\nline5\n" | tee >(awk '{print "*" $0 "*"}' ) | (awk '{print "this is awk: "$0}')
this is awk: line1
this is awk: line2
this is awk: line3
this is awk: line4
this is awk: line5
this is awk: *line1*
this is awk: *line2*
this is awk: *line3*
this is awk: *line4*
this is awk: *line5*
every time... and
$ printf "line1\nline2\nline3\nline4\nline5\n" | tee >(awk '{print "*" $0 "*"}' ) | (grep line)
line1
line2
line3
line4
line5
*line1*
*line2*
*line3*
*line4*
*line5*