
How to stop sed from buffering?

I have a program that writes to fd3 and I want to process that data with grep and sed. Here is how the code looks so far:


exec 3> >(grep "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3

Nothing is output until I do a

exec 3>&-

Then, everything that I wanted finally arrives as I expected:

I got: data2

It seems to reply immediately if I use only a grep or only a sed, but mixing them seems to cause some sort of buffering. How can I get immediate output from fd3?


I think I found it. For some reason, grep doesn't automatically do line buffering. I added a --line-buffered option to grep and now it responds immediately.


You only need to tell grep and sed to flush their output after every line instead of buffering it in blocks:

grep --line-buffered 

and

sed -u
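As a minimal sketch of the two options combined, assuming bash (for the process substitution) plus GNU grep and GNU sed: the pipeline below writes to a temporary file, a detail added here only so the result can be inspected while fd 3 is still open. The sleep is a crude way to give the background pipeline time to run.

```shell
#!/usr/bin/env bash
# Assumption: GNU grep (--line-buffered) and GNU sed (-u) are installed.
out=$(mktemp)   # temporary capture file, used only for this demonstration

# grep flushes after every matching line; sed -u does not hold its output back.
exec 3> >(grep --line-buffered "good:" | sed -u 's/.*:\(.*\)/I got: \1/' > "$out")

echo "bad:data1" >&3
echo "good:data2" >&3

sleep 1                 # crude wait for the background pipeline to process the lines
early=$(cat "$out")     # read the result while fd 3 is still open

exec 3>&-               # close fd 3 only after the result has been read
rm -f "$out"

echo "seen before closing fd 3: $early"
```

If grep were left with its default block buffering, the capture file would be expected to stay empty until fd 3 is closed.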


An alternate means to stop sed from buffering is to run it through the s2p sed-to-Perl translator and insert a directive to have it command-buffered, perhaps like

BEGIN { $| = 1 }

The other reason to do this is that it gives you the more convenient notation from EREs instead of the backslash-annoying legacy BREs. You also get the full complement of Unicode properties, which is often critical.

But you don’t need the translator for such a simple sed command. And you do not need both grep and sed, either. These all work:

perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'

perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'

perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'

Now you also have access to the minimal quantifiers: *?, +?, ??, {N,}?, and {N,M}?. These allow things like .*? or \S+? or [\p{Pd}.]??, which may well be preferable.


You can merge the grep into the sed like so:

exec 3> >(sed -une '/^good:/s//I got: /p')
echo "bad:data1">&3
echo "good:data2">&3

Unpacking that a bit: You can put a regexp (between slashes as usual) before any sed command, which makes it only be applied to lines that match that regexp. If the first regexp argument to the s command is the empty string (s//whatever/) then it will reuse the last regexp that matched, which in this case is the prefix, so that saves having to repeat yourself. And finally, the -n option tells sed to print only what it is specifically told to print, and the /p suffix on the s command tells it to print the result of the substitution.

The -e option is not strictly necessary but is good style; it just means "the next argument is the sed script, not a filename".

Always put sed scripts in single quotes unless you need to substitute a shell variable in there, and even then I would put everything but the shell variable in single quotes (the shell variable is, of course, double-quoted). You avoid a bunch of backslash-related grief that way.
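The quoting advice might look like this in practice. This is only a sketch: the variable name prefix is made up for the example, and its value is assumed not to contain "/", "&" or backslashes, which would be special on the replacement side of the s command.

```shell
# Hypothetical variable holding the replacement text; it must not contain
# "/", "&" or backslashes, which are special in the replacement of s///.
prefix='I got: '

# The sed script stays in single quotes; only "$prefix" is double-quoted.
result=$(printf 'bad:data1\ngood:data2\n' |
    sed -n -e '/^good:/s//'"$prefix"'/p')

echo "$result"
```

The three quoted pieces are concatenated by the shell into one argument, so sed sees the single script /^good:/s//I got: /p.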


On a Mac, brew install coreutils and use gstdbuf to control buffering of grep and sed.


Turning off buffering in the pipe seems to be the easiest and most generic answer. Using stdbuf (from coreutils):

exec 3> >(stdbuf -oL grep "good:" | sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
I got: data2

Buffering also depends on the program at the end of the pipe. For example, it matters whether mawk or gawk is reading this pipe:

exec 3> >(stdbuf -oL grep "good:" | awk '{ sub(".*:", "I got: "); print }')

In that case, mawk would hold back the input (it buffers its reads from a pipe unless it is run with -W interactive), whereas gawk would not.

See also How to fix stdio buffering
