Bash : Cat based on array variable

2023-04-01 16:25

I want to concatenate two or more files depending on whether their names contain elements from an array.

I am reading this kind of file line by line (proteome.pisa):

2PJY_p  chain=(B C) hresname=() hresnumber=()   hatom=()    model=()    altconf=()
2Q7N_p  chain=(A E F G H I J K L)   hresname=(FUC MAN NAG)  hresnumber=()   hatom=()    model=()    altconf=()

For each line, the script extracts the string on the first column and defines it as the variable pdbid. Then it takes the second column and defines it as an array (chain of elements $c). Then it checks if a file called ${pdbid}_${c}_p.pdb exists and, if it does, it merges its content into the file ${pdbid}_p_${chains}.pdb

This is the script:

while read line ; do

echo "$line" > pdb.line
cut -f1 pdb.line > pdb.list
sed -i 's/.*/\"&\"/' pdb.list
sed -i 's/_p//g' pdb.list
awk '{ printf "pdbid="; print }' pdb.list > pdbid.list

cut -f2 pdb.line > chain.list

source pdbid.list
source chain.list

chains=`printf "%s" "${chain[@]}"`

for c in ${chain[@]} ; do
if [ ${#chain[@]} -gt 1 ] && \
   [ -f ${pdbid}_${c}_p.pdb ] ; then  
cat ${pdbid}_${chain[$c]}_p.pdb >> ${pdbid}_p_${chains}.pdb
fi
done

done < proteome.pisa

The expected behaviour was to merge, for instance for the first row, 2PJY_p_B.pdb and 2PJY_p_C.pdb into a file called 2PJY_p_BC.pdb. However, what it actually does is merge the first file twice. I cannot understand why...

This is a great question, for it demonstrates that bash cannot do everything on its own. Instead, it needs helpers such as awk, cut, ... I looked through your script, and it seems that after the two source lines you expect the variables pdbid, chain, and chains to be set. However, your script does not set them correctly, and I can help with that part. I don't know Perl that well, but I think it will work nicely in this case. Here is makevars.pl:

while (<STDIN>) {
    my($line) = $_;
    if ($line =~ /^(.*)_p.*chain=\((.*)\).*hresname.*$/) {
        print "pdbid=$1\n";
        print "chain=($2)\n";
        $chains = $2;
        $chains =~ s/ //g;
        print "chains=$chains\n";
    }
}

And here is the shell script:

while read line
do

    echo "$line" | perl makevars.pl >setvars.sh
    source setvars.sh
    # Now, pdbid, chain, and chains are set, do your things

done < proteome.pisa

I hope this helps.
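If you would rather not generate and source setvars.sh, the same three variables can probably be set inside bash itself with its regex operator. This is only a sketch: parse_line is a hypothetical helper name, the regex is my adaptation of the Perl one above, and the sample line assumes tab-separated columns.

```bash
#!/usr/bin/env bash
# Sketch: parse one proteome.pisa line with bash's [[ =~ ]] operator.
# parse_line is a hypothetical helper, not part of the original scripts.
parse_line() {
    # group 1: ID before "_p"; group 2: text between "chain=(" and ")"
    local re='^([^[:space:]]+)_p[[:space:]]+chain=\(([^)]*)\)'
    [[ $1 =~ $re ]] || return 1
    pdbid=${BASH_REMATCH[1]}
    read -r -a chain <<< "${BASH_REMATCH[2]}"   # split "B C" into an array
    chains=${BASH_REMATCH[2]// /}               # "B C" -> "BC"
}

# Assumed sample line (tab-separated, shortened)
parse_line $'2PJY_p\tchain=(B C)\thresname=()\thresnumber=()'
echo "pdbid=$pdbid chain=(${chain[*]}) chains=$chains"
```

Inside the while loop you would call parse_line "$line" in place of the two lines that write and source setvars.sh.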

I would suggest preprocessing the input into a simpler form with sed, then looping over that. This is assuming the chain=(...) is always the first such attribute on a line.

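#!/bin/bash

# Replace 2ICQ_p chain=(A B C ... Z) attribs= ...   with
# 2ICQ A B C ... Z
# (the _p suffix is dropped so that pdbid matches the question's file names)
sed 's/_p[[:space:]]*chain=(/ /;s/).*//' <proteome.pisa |
while read -r pdbid chain; do
    # Join the chain IDs, "A B C" -> "ABC"; ${var//x/y} is a bash extension
    chains=${chain// /}
    for c in $chain; do
        # Skip chains that have no file of their own
        test -e "${pdbid}_${c}_p.pdb" || continue
        cat "${pdbid}_${c}_p.pdb"
    done >"${pdbid}_p_${chains}.pdb"
done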

This avoids the use of temporary files which riddled your first script; sourcing a generated file also looks rather startling, if not alarming (usually you can use backticks for that sort of thing, but they are not really required here).

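A note on the parentheses: in the basic regular expressions that sed uses by default, an unescaped parenthesis matches a literal parenthesis, while \( and \) delimit a group; with extended regular expressions (sed -E) the roles are reversed. If sed complains about an unmatched parenthesis, check that the literal parentheses in the pattern are not backslashed.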

read with multiple variable names splits the input on whitespace so that the first variable name receives the first token, etc.; the last named variable receives whatever is left, without additional whitespace splitting. continue jumps to the next iteration of the enclosing for or while loop. Other than that, this should be fairly self-explanatory. If you are really pressed to do it all in pure shell, the sed replacement at the beginning could probably be replaced with something involving string substitutions.
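To make the read behaviour and the string-substitution idea concrete, here is a small sketch. The sample strings are made up for illustration, and the parameter expansions are one possible stand-in for the sed step, not something I have run against your real file.

```bash
#!/usr/bin/env bash
# 1) read: first token goes to pdbid, the whole remainder goes to chain
read -r pdbid chain <<< "2PJY B C"
echo "pdbid=$pdbid"      # first token only
echo "chain=$chain"      # rest of the line, inner spaces kept

# 2) sed-free parsing of a raw line with parameter expansion
line=$'2PJY_p\tchain=(B C)\thresname=()'   # assumed tab-separated sample
rest=${line#*"chain=("}      # drop everything up to and including "chain=("
raw_chain=${rest%%")"*}      # drop everything from the first ")" onward
raw_id=${line%%_p*}          # drop "_p" and everything after it
echo "raw_id=$raw_id raw_chain=$raw_chain"

# 3) continue: skip one element and carry on with the next
kept=""
for c in $raw_chain; do
    [ "$c" = "B" ] && continue
    kept="$kept$c"
done
echo "kept=$kept"
```

The # and %% expansions are plain POSIX shell, so this part does not depend on bash.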

The problem appears to be the array subscript in this line:

cat ${pdbid}_${chain[$c]}_p.pdb >> ${pdbid}_p_${chains}.pdb

Changing it to:

cat ${pdbid}_${c}_p.pdb >> ${pdbid}_p_${chains}.pdb

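appears to solve the problem. The subscript of an indexed array is evaluated arithmetically, so in ${chain[$c]} the value B or C is treated as the name of an unset variable and evaluates to 0. Every iteration therefore reads ${chain[0]}, the first chain, which is why the first file was merged twice.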

In addition, I have double-quoted all occurrences of "${chain[@]}".
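A short demonstration of the subscript behaviour, as a sketch with a hard-coded array. It assumes no variables named B or C are set, which the unset line enforces.

```bash
#!/usr/bin/env bash
unset B C            # make sure B and C are not defined as variables
chain=(B C)
indexed=()
for c in "${chain[@]}"; do
    # $c expands to B or C; as an array index it is evaluated
    # arithmetically, and an unset name evaluates to 0.
    indexed+=("${chain[$c]}")
    echo "c=$c  chain[\$c]=${chain[$c]}"
done
```

Both iterations print the first element for chain[$c], while $c itself takes each value in turn.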
