Split a file into contiguous blocks with a counter
The following awk one-liner allows me to split a file according to the character at position 22:
awk -v pdb="${file}" -F "" '{close(c);c=$22}{print > pdb"_"c".pdb"}' ${file}.1tmp
My files are of the type:
ATOM 8911 N SER W 1 -5.412 94.401 12.569 1.00137.46 N
ATOM 8912 CA SER W 1 -4.093 93.709 12.370 1.00137.35 C
ATOM 8913 C SER W 1 -3.115 93.771 13.604 1.00137.27 C
ATOM 8914 O SER W 1 -2.023 93.177 13.570 1.00137.22 O
ATOM 8915 CB SER W 1 -3.417 94.212 11.063 1.00137.29 C
ATOM 1 N ASP X 7 70.244 176.432 -72.598 1.00121.87 N
ATOM 2 CA ASP X 7 70.164 177.938 -72.649 1.00122.11 C
ATOM 3 C ASP X 7 68.705 178.495 -72.843 1.00121.38 C
ATOM 4 O ASP X 7 68.482 179.724 -72.941 1.00121.16 O
ATOM 5 CB ASP X 7 71.128 178.442 -73.745 1.00122.87 C
ATOM 5143 N ASP W 7 -68.623 209.141 -11.831 1.00118.10 N
ATOM 5144 CA ASP W 7 -67.698 209.756 -12.845 1.00118.36 C
ATOM 5145 C ASP W 7 -66.378 210.288 -12.223 1.00118.02 C
ATOM 5146 O ASP W 7 -65.657 211.116 -12.802 1.00118.06 O
ATOM 5147 CB ASP W 7 -68.436 210.840 -13.657 1.00118.67 C
However, the script writes all lines with a W at position 22 to the same file, even when they occur in non-contiguous blocks. I would like to split the file into blocks so that the first contiguous block containing W (or any other character) is named W1, the second W2, and so on. Can this be done easily with awk, or should I use a loop with a counter or something similar?
Keep one counter per value of field 5 and increment it each time that value changes from the previous line:
awk -v pdb="${file}" 'NR==1||$5!=n{close(out); n=$5; out=pdb"_"n"_"(++s[n])".txt"} {print > out}' "${file}"
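The one-liner above keys on the whitespace-separated field 5, while the question asks about the character at position 22. In fixed-width data, neighbouring columns can run together (as with `1.00137.46` in the sample), so reading the column directly may be more robust. The following is a minimal sketch under that assumption; the function name `split_blocks`, its two arguments, and the `PREFIX_<char><n>.pdb` naming are hypothetical choices for illustration, not part of the original answer.

```shell
# Hypothetical helper: split_blocks PREFIX INPUT
# Writes each contiguous run of lines that share the same character at
# column 22 to PREFIX_<char><n>.pdb, where n counts the runs seen so far
# for that character (W1, X1, W2, ...).
split_blocks() {
    awk -v pdb="$1" '
        {
            c = substr($0, 22, 1)            # character at position 22
            if (NR == 1 || c != prev) {      # a new contiguous block starts here
                if (out != "") close(out)    # done with the previous block file
                out = pdb "_" c (++count[c]) ".pdb"
                prev = c
            }
            print > out
        }
    ' "$2"
}
```

With the file names from the question, this would be called as `split_blocks "${file}" "${file}.1tmp"`. Closing each block file before opening the next keeps only one output file open at a time, which matters when the input contains many blocks.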