Extracting multiple parts of a string using bash

2023-01-15 17:44 Q&A author:

I have a caret delimited (key=value) input and would like to extract multiple tokens of interest from it.

For example: Given the following input

$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"
1=A00^35=D^150=1^33=1
1=B000^35=D^150=2^33=2

I would like the following output

35=D^150=1^
35=D^150=2^

I have tried the following

$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"|egrep -o "35=[^/^]*\^|150=[^/^]*\^"
35=D^
150=1^
35=D^
150=2^

My problem is that egrep returns each match on a separate line. Is it possible to get one line of output for one line of input? Please note that due to the constraints of the larger script, I cannot simply do a blind replace of all the \n characters in the output.

Thank you for any suggestions. This script is for bash 3.2.25. Any egrep alternatives are welcome. Please note that the tokens of interest (35 and 150) may change, and I am already generating the egrep pattern in the script, so a one-liner (if possible) would be great.

You have two options. Option 1 is to change the field separator (IFS) and use set --:

line='1=A00^35=D^150=1^33=1'
OFS=$IFS
IFS="^ "
set -- $line  # No quotes here: only unquoted expansions are split on IFS
IFS="$OFS"

Now you have your values in $1, $2, etc.
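
As a rough sketch of how those positional parameters could then be filtered down to the tokens of interest: the function name pick_tokens and its comma-separated key argument are illustrative assumptions, not part of the answer above, and it assumes the keys contain no commas.

```shell
#!/bin/bash
# Hypothetical helper: split one caret-delimited record and keep chosen keys.
pick_tokens() {
    local line="$1" keys="$2" old_ifs="$IFS" out="" tok
    set -f                 # disable globbing so tokens are not expanded as patterns
    IFS='^'
    set -- $line           # unquoted on purpose: the expansion is split on ^
    IFS="$old_ifs"
    set +f
    for tok in "$@"; do
        # ${tok%%=*} is the text before the first "=", i.e. the key
        case ",$keys," in
            *",${tok%%=*},"*) out="$out$tok^" ;;
        esac
    done
    printf '%s\n' "$out"
}

pick_tokens '1=A00^35=D^150=1^33=1' 35,150
```

This uses only features available in bash 3.2. Tokens are emitted in the order they appear in the input, not in the order of the key list.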

Or you can use an array:

tmp=$(echo "1=A00^35=D^150=1^33=1" | sed -e 's:\([0-9]\+\)=: [\1]=:g' -e 's:\^ : :g')
eval value=($tmp)
echo "35=${value[35]}^150=${value[150]}"
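
If eval is a concern, the same sparse-array idea can probably be done without it. This is only a sketch: the variable names line and pairs are made up for illustration, and it assumes every key is a plain decimal integer without leading zeros (bash evaluates array subscripts arithmetically).

```shell
#!/bin/bash
# Hypothetical eval-free variant: numeric keys become indices of a sparse array.
line='1=A00^35=D^150=1^33=1'
value=()
IFS='^' read -r -a pairs <<< "$line"   # IFS is changed for this read only
for pair in "${pairs[@]}"; do
    # key = text before the first "=", value = text after it
    value[${pair%%=*}]=${pair#*=}
done
echo "35=${value[35]}^150=${value[150]}"
```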

To get rid of the newline, you can just echo it again:

$ echo $(echo "1=A00^35=D^150=1^33=1"|egrep -o "35=[^/^]*\^|150=[^/^]*\^")
35=D^ 150=1^
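
Applied to multi-line input, that command substitution joins all matches into a single line. A minimal sketch of doing the join once per input line instead, so each input line yields one output line; the function name join_matches is an assumption for illustration, and the pattern is the one from the question with the stray "/" dropped from the bracket expressions:

```shell
#!/bin/bash
# Hypothetical wrapper: run the pattern on each input line and join its matches.
join_matches() {
    local pattern="$1" line
    while IFS= read -r line; do
        # grep -o prints one match per line; tr removes those newlines
        printf '%s\n' "$line" | grep -Eo "$pattern" | tr -d '\n'
        echo    # terminate the output line for this input line
    done
}

printf '%s\n' '1=A00^35=D^150=1^33=1' '1=B000^35=D^150=2^33=2' |
    join_matches '35=[^^]*\^|150=[^^]*\^'
```

Two caveats: an input line with no matches still produces an empty output line, and starting grep and tr for every line will be slow on large files.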

If that's not satisfactory (I think it may give you one line for the whole input file), you can use awk:

pax> echo '
1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLIST=35,150 -F^ ' {
        sep = "";
        split (LIST, srch, ",");
        for (i = 1; i <= NF; i++) {
            for (idx in srch) {
                split ($i, arr, "=");
                if (arr[1] == srch[idx]) {
                    printf "%s%s=%s", sep, arr[1], arr[2];
                    sep = "^";
                }
            }
        }
        if (sep != "") {
            print sep;
        }
    }'
35=D^150=1^
35=d^

pax> echo '
1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLIST=1,33 -F^ ' {
        sep = "";
        split (LIST, srch, ",");
        for (i = 1; i <= NF; i++) {
            for (idx in srch) {
                split ($i, arr, "=");
                if (arr[1] == srch[idx]) {
                    printf "%s%s=%s", sep, arr[1], arr[2];
                    sep = "^";
                }
            }
        }
        if (sep != "") {
            print sep;
        }
    }'
1=A00^33=1^
1=a00^33=11^

This one allows you to use a single awk script and all you need to do is to provide a comma-separated list of keys to print out.
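
Since the key list is generated by the calling script, it may be convenient to wrap the awk call in a shell function. The following is a sketch along the same lines rather than the script above: the name extract_keys is hypothetical, the key list is loaded into a lookup table once in BEGIN instead of being split for every record, and each matching field is printed whole.

```shell
#!/bin/bash
# Hypothetical wrapper: print the requested keys of each record on one line.
extract_keys() {
    awk -v list="$1" -F'^' '
        BEGIN {
            # build the lookup table once from the comma-separated key list
            n = split(list, k, ",")
            for (i = 1; i <= n; i++) want[k[i]] = 1
        }
        {
            out = ""
            for (i = 1; i <= NF; i++) {
                key = $i
                sub(/=.*/, "", key)     # keep only the text before the first "="
                if (key in want) out = out $i "^"
            }
            if (out != "") print out    # skip records with no matching key
        }'
}

printf '%s\n' '1=A00^35=D^150=1^33=1' '1=a00^35=d^157=11^33=11' |
    extract_keys 35,150
```

As in the original script, tokens come out in input order and records without any requested key print nothing.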

And here's the one-liner version :-)

echo '1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11' | awk -vLST=1,33 -F^ '{s="";split(LST,k,",");for(i=1;i<=NF;i++){for(j in k){split($i,arr,"=");if(arr[1]==k[j]){printf "%s%s=%s",s,arr[1],arr[2];s="^";}}}if(s!=""){print s;}}'

Given a file 'in' containing your strings, and assuming the tokens of interest are always the second and third fields:

$ for i in $(cut -d^ -f2,3 < in);do echo $i^;done
35=D^150=1^
35=D^150=2^
