How to stop sed from buffering?
I have a program that writes to fd3 and I want to process that data with grep and sed. Here is how the code looks so far:
exec 3> >(grep "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
Nothing is output until I close the descriptor:
exec 3>&-
Then everything I wanted finally arrives, as expected:
I got: data2
It seems to respond immediately if I use only grep or only sed, but mixing them seems to cause some sort of buffering. How can I get immediate output from fd 3?
I think I found it. When grep writes to a pipe instead of a terminal, its output is block-buffered rather than line-buffered. I added the --line-buffered option to grep, and now it responds immediately.
You only need to tell grep and sed not to buffer their output: grep --line-buffered and sed -u.
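Putting both options into the question's snippet gives something like the following. It is a sketch that assumes GNU grep and GNU sed (for --line-buffered and -u); the sleep is only there so the output has time to appear before fd 3 is closed.

```shell
#!/usr/bin/env bash
# Sketch: the question's fd 3 pipeline with buffering turned off in both stages.
# Assumes GNU grep (--line-buffered) and GNU sed (-u).
exec 3> >(grep --line-buffered "good:" | sed -u "s/.*:\(.*\)/I got: \1/")

echo "bad:data1" >&3    # dropped by grep
echo "good:data2" >&3   # "I got: data2" should be printed right away

sleep 1                 # output shows up here, while fd 3 is still open
exec 3>&-               # close fd 3 once the program is done writing
```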
An alternative way to stop sed from buffering is to run the script through the s2p sed-to-Perl translator and insert a directive that makes the output command-buffered, perhaps like:
BEGIN { $| = 1 }
The other reason to do this is that it gives you the more convenient notation from EREs instead of the backslash-annoying legacy BREs. You also get the full complement of Unicode properties, which is often critical.
But you don't need the translator for such a simple sed command. And you don't need both grep and sed, either. These all work:
perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'
Now you also have access to the minimal quantifiers *?, +?, ??, {N,}?, and {N,M}?. These allow things like .*? or \S+? or [\p{Pd}.]??, which may well be preferable.
You can merge the grep into the sed like so:
exec 3> >(sed -une '/^good:/s//I got: /p')
echo "bad:data1">&3
echo "good:data2">&3
Unpacking that a bit: you can put a regexp (between slashes, as usual) before any sed command, which makes that command apply only to lines that match the regexp. If the regexp argument to the s command is empty (s//whatever/), sed reuses the last regexp it applied, which in this case is the /^good:/ address, so that saves having to repeat yourself. The -n option tells sed to print only what it is explicitly told to print, and the p flag on the s command tells it to print the result of the substitution. Finally, -u, as in the question, keeps sed from buffering its output.
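The same script can be tried without fd 3 by feeding it from printf, which makes each piece easy to check. This sketch leaves out -u because buffering does not matter for a one-shot pipe; the remaining features (address, empty regexp, -n, p) are standard sed.

```shell
#!/usr/bin/env bash
# /^good:/   address: only lines starting with "good:" are edited
# s//.../    empty regexp: reuses /^good:/, so the "good:" prefix is replaced
# -n ... p   print nothing by default, only lines where the substitution happened
printf 'bad:data1\ngood:data2\n' | sed -n -e '/^good:/s//I got: /p'
# prints: I got: data2
```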
The -e option is not strictly necessary but is good style; it just means "the next argument is the sed script, not a filename".
Always put sed scripts in single quotes unless you need to substitute a shell variable in there, and even then I would put everything but the shell variable in single quotes (the shell variable is, of course, double-quoted). You avoid a bunch of backslash-related grief that way.
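For example, if the prefix came from a shell variable (prefix is a hypothetical name here), only the variable is double-quoted and the rest of the script stays in single quotes. The pieces are adjacent, so the shell joins them into one argument. This assumes the variable holds no regexp metacharacters or slashes, which sed would otherwise interpret.

```shell
#!/usr/bin/env bash
prefix=good   # hypothetical variable holding the prefix to match

# '/^'  "$prefix"  ':/s//I got: /p'  ->  one argument: /^good:/s//I got: /p
printf 'bad:data1\ngood:data2\n' | sed -n -e '/^'"$prefix"':/s//I got: /p'
# prints: I got: data2
```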
On a Mac, run brew install coreutils and use gstdbuf to control the buffering of grep and sed.
Turning off buffering in the pipe seems to be the easiest and most generic answer. Using stdbuf (from coreutils):
exec 3> >(stdbuf -oL grep "good:" | sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
I got: data2
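Because stdbuf works from outside the program, the same prefix can go in front of each stage instead of relying on tool-specific options such as sed -u. A sketch, assuming GNU coreutils stdbuf and tools that use default stdio buffering; stdbuf has no effect on programs that set their own buffering or do not use stdio.

```shell
#!/usr/bin/env bash
# -oL: make each command's stdout line-buffered, even though it writes to a pipe.
exec 3> >(stdbuf -oL grep "good:" | stdbuf -oL sed "s/.*:\(.*\)/I got: \1/")

echo "bad:data1" >&3
echo "good:data2" >&3   # "I got: data2" should appear without closing fd 3

sleep 1
exec 3>&-
```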
Buffering also depends on the program that reads the pipe; for example, it matters whether mawk or gawk is reading it:
exec 3> >(stdbuf -oL grep "good:" | awk '{ sub(".*:", "I got: "); print }')
In that case, mawk would hold on to its input, while gawk would not.
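For mawk specifically, one way around this is its -W interactive option, which makes it read standard input a line at a time and write standard output unbuffered. A sketch, assuming mawk and GNU coreutils stdbuf are installed:

```shell
#!/usr/bin/env bash
# Assumes mawk: -W interactive = line-buffered reads from stdin, unbuffered writes to stdout.
exec 3> >(stdbuf -oL grep "good:" | mawk -W interactive '{ sub(".*:", "I got: "); print }')

echo "bad:data1" >&3
echo "good:data2" >&3   # "I got: data2" should appear while fd 3 is still open

sleep 1
exec 3>&-
```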
See also How to fix stdio buffering