Command substitution within sed expression

2023-04-11 04:20

I'm having a little problem with bash/sed. I need to be able to use command substitution within a sed expression. I have two big text files:

The first is logfile.txt, which sometimes* shows error messages by ID (0xdeadbeef is a common example) in the format ERRORID:0xdeadbeef
The second, errors.txt, stores error messages in pairs: LONG_ERROR_DESCRIPTION, 0xdeadbeef

I was trying to use sed with bash command substitution to do the task:

cat logfile.txt | sed "s/ERRORID:\(0x[0-9a-f]*\)/ERROR:$(cat errors.txt |
    grep \1 | grep -o '^[A-Z_]*' )/g"

(^^^ this should be in one line of course)

If it worked, I could get a slightly nicer version of the logfile with better error info.

   Lot's of meaningless stuff ERRORID:0xdeadbeef and something else =>
=> Lot's of meaningless stuff ERROR:LONG_ERROR_DESCRIPTION and something else

But it doesn't. The problem is that sed cannot "inject" the captured group (\1) into the command substitution. What are my other options? I know it's possible to build the sed expression first, or to do it some other way, but I would like to avoid parsing those files several times (they can be huge).

As always big thanks for any help.

*There is no real formatting inside the logfile. There are no sections or columns, and tab/comma separation is used inconsistently.

PS. Just to explain: the following expression works, but of course no argument is passed within it:

echo "my cute cat" | sed "s/cat/$(echo dog)/g"
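The reason the PS example works while the real attempt does not comes down to evaluation order: the shell expands $(...) while it builds sed's argument, before sed starts, so nothing sed captures can reach the inner command. A minimal sketch of that ordering follows; the variable names script, pattern and result are only illustrative.

```shell
#!/bin/sh
# The shell expands $(...) first; sed only ever sees the finished string.
script="s/cat/$(echo dog)/g"
printf 'sed receives: %s\n' "$script"

# In the original attempt grep therefore runs once, up front, and the
# unquoted \1 inside $(...) is reduced by the shell to a literal "1".
pattern=$(printf '%s' \1)
printf 'grep would search for: %s\n' "$pattern"

result=$(echo "my cute cat" | sed "$script")
printf '%s\n' "$result"
```

So the inner grep looks for the character 1 in errors.txt exactly once, and whatever it returns is pasted into the replacement for every line.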

You can create a sed script from the error message catalog, then apply that sed script to the log file.

Basically, something along these lines:

sed 's/\(.*\), 0x\([0-9A-Fa-f]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' errors.txt |
sed -f - logfile.txt

The output from the first sed script should be something like this:

s%ERRORID:0x00000001%ERROR:Out of memory%g
s%ERRORID:0x00000002%ERROR:Stack overflow%g
s%ERRORID:0x00000031%ERROR:values of beta may cause dom%g

That is, a new sed script which specifies a substitution for each error code in the catalog.
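The two steps can be tried end to end on throwaway data. The sketch below assumes a scratch directory from mktemp -d and two made-up catalog entries (OUT_OF_MEMORY and STACK_OVERFLOW are hypothetical names). It writes the generated script to a file instead of piping it into sed -f -, which avoids relying on sed accepting a script on standard input.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'OUT_OF_MEMORY, 0x00000001' 'STACK_OVERFLOW, 0xdeadbeef' > "$tmp/errors.txt"
printf '%s\n' 'boot ok' 'stuff ERRORID:0xdeadbeef and ERRORID:0x00000001 here' > "$tmp/logfile.txt"

# Step 1: turn each "DESCRIPTION, 0xID" pair into one s%...%...%g command.
sed 's/\(.*\), 0x\([0-9A-Fa-f]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' "$tmp/errors.txt" > "$tmp/errors.sed"
cat "$tmp/errors.sed"

# Step 2: apply the generated script to the log in a single pass.
result=$(sed -f "$tmp/errors.sed" "$tmp/logfile.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Each file is read once: errors.txt while generating the script, logfile.txt while applying it.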

There are different dialects of sed, so this may require minor tweaking. I believe GNU sed on Linux expects a backslash before grouping parentheses in regular expressions and gladly tolerates standard input as the argument to the -f option. This is not portable to other Unices, though (you could substitute Perl for sed if you need portability).

Edit: If the error messages are fairly static, and/or you want to read the log from standard input, save the generated script in a file:

# Do this once
sed 's/\(.*\), 0x\([0-9A-Fa-f]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' errors.txt >errors.sed
# Use it many times
sed -f errors.sed logfile.txt

You could also add #!/usr/bin/sed -f at the top of errors.sed and chmod +x it to make it into a self-contained command script.
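One caveat with a generated script: a description containing %, & or a backslash would be misread by sed, since % is the delimiter here and & and \ are special in a replacement. A possible hardening, sketched below on a made-up catalog entry, is to backslash-escape those three characters before building each command. Escaping the whole line is an assumption that holds only because hex IDs never contain them.

```shell
#!/bin/sh
# Hypothetical catalog entry whose description contains % and &.
tmp=$(mktemp -d)
printf '%s\n' 'DISK 100% FULL & LOCKED, 0x0000000a' > "$tmp/errors.txt"
printf '%s\n' 'warn ERRORID:0x0000000a end' > "$tmp/logfile.txt"

# First expression: put a backslash in front of every \, % and &.
# Second expression: build the s%...%...%g command as before.
sed -e 's/[\\%&]/\\&/g' \
    -e 's/\(.*\), 0x\([0-9A-Fa-f]*\)$/s%ERRORID:0x\2%ERROR:\1%g/' \
    "$tmp/errors.txt" > "$tmp/errors.sed"

result=$(sed -f "$tmp/errors.sed" "$tmp/logfile.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

This does not address error codes that are prefixes of one another; that would still depend on the order of the generated commands.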

I don't know if this would work, since I can't get an answer on whether or not capture groups persist, but there is a lot more to sed than just the s command. I was thinking you could use a capture group in a regex line selector, then use that for the command substitution. Something like this:

/ERRORID:\(0x[0-9a-f]*\)/  s/ERRORID:0x[0-9a-f]*/ERROR:$(grep \1 errors.txt | grep -o '^[A-Z_]*' )/

Anyway, if that doesn't work I would change gears and point out that this is really a good job for Perl. Here's how I would do it, which I think is much cleaner / easier to understand:

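#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
  while ( /ERRORID:(0x[0-9a-f]*)/ ) {
    my $id = $1;
    # Backticks capture the command's output; system() would only return its exit status.
    my $name = `grep -- $id errors.txt | grep -o '^[A-Z_]*'`;
    chomp $name;
    s/ERRORID:\Q$id\E/ERROR:$name/g;
  }
  print;
}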

Then execute:

./thatScript.pl logfile.txt

Just to help people looking for a solution with bare shell and sed. Not perfect, but working:

while IFS= read -r line ; do
    id=$(printf '%s\n' "$line" | grep -o 'ERRORID:0x[0-9a-f]*' | grep -o '0x[0-9a-f]*' | head -n 1)
    if [ -n "$id" ] ; then
        name=$(grep -- "$id" errors.txt | grep -o '^[A-Z_]*' | head -n 1)
        printf '%s\n' "$line" | sed "s/ERRORID:$id/ERROR:$name/g"
    else printf '%s\n' "$line" ; fi
done < logfile.txt

If you see ways to improve it, please share.

With GNU awk for gensub() and the 3rd arg to match():

$ awk '
    NR==FNR {
        map[$NF] = gensub(/,[^,]+$/,"",1)
        next
    }
    match($0,/(.*ERRORID:)(0x[[:xdigit:]]+)(.*)/,a) {
        $0 = a[1] (a[2] in map ? map[a[2]] : a[2]) a[3]
    }
1' errors.txt logfile.txt
Lot's of meaningless stuff ERRORID:LONG_ERROR_DESCRIPTION and something else =>

The above will run much faster than the sed scripts in the currently accepted answer, and it won't fail when LONG_ERROR_DESCRIPTION contains characters such as %, & or \1. Nor will it fail when one ERRORID is a prefix of another: if 0xdead and 0xdeadbeef are two separate error codes, the sed scripts can fail depending on the order in which they appear in errors.txt. For example, by mapping 0xdead first they could convert ERRORID:0xdeadbeef to ERROR:LONG_ERROR_DESCRIPTIONbeef.
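Where GNU awk is not available, the same lookup-table idea can be sketched with POSIX awk only, using the two-argument match() with RSTART and RLENGTH. This is a variant under stated assumptions, not the answer's own script: it assumes the "DESCRIPTION, 0xID" catalog format, rewrites every ERRORID on a line rather than just one, and emits ERROR: for known IDs while leaving unknown ones untouched. The sample data and the variable names out, rest and result are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data, including an ID that is a prefix of another.
tmp=$(mktemp -d)
printf '%s\n' 'SHORT_CODE, 0xdead' 'LONG_ERROR_DESCRIPTION, 0xdeadbeef' > "$tmp/errors.txt"
printf '%s\n' 'a ERRORID:0xdeadbeef b ERRORID:0xdead c ERRORID:0xffff d' > "$tmp/logfile.txt"

result=$(awk '
    NR == FNR {                       # first file: build the ID -> description map
        id = $NF
        desc = $0
        sub(/,[^,]*$/, "", desc)      # drop the trailing ", 0xID"
        map[id] = desc
        next
    }
    {                                 # second file: rewrite every ERRORID on the line
        out = ""
        rest = $0
        while (match(rest, /ERRORID:0x[0-9A-Fa-f]+/)) {
            id = substr(rest, RSTART + 8, RLENGTH - 8)   # 8 = length("ERRORID:")
            out = out substr(rest, 1, RSTART - 1)
            out = out ((id in map) ? "ERROR:" map[id] : "ERRORID:" id)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }
' "$tmp/errors.txt" "$tmp/logfile.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because each ID is matched in full before the lookup, 0xdead and 0xdeadbeef cannot be confused, and descriptions are inserted as plain strings, so %, & or \1 in them need no escaping.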
