Split file according to patterns in two consecutive lines
I have files with the following format:
ATOM 3736 CB THR A 486 -6.552 153.891 -7.922 1.00115.15 C
ATOM 3737 OG1 THR A 486 -6.756 154.842 -6.866 1.001开发者_运维百科14.94 O
ATOM 3738 CG2 THR A 486 -7.867 153.727 -8.636 1.00115.11 C
ATOM 3739 OXT THR A 486 -4.978 151.257 -9.140 1.00115.13 O
HETATM10351 C1 NAG A 203 33.671 87.279 39.456 0.50 90.22 C
HETATM10483 C1 NAG A 702 28.025 104.269 -27.569 0.50 92.75 C
ATOM 3736 CB THR B 486 -6.552 86.240 7.922 1.00115.15 C
ATOM 3737 OG1 THR B 486 -6.756 85.289 6.866 1.00114.94 O
ATOM 3738 CG2 THR B 486 -7.867 86.404 8.636 1.00115.11 C
ATOM 3739 OXT THR B 486 -4.978 88.874 9.140 1.00115.13 O
HETATM10351 C1 NAG B 203 33.671 152.852 -39.456 0.50 90.22 C
HETATM10639 C2 FUC B 402 -48.168 162.221 -22.404 0.50103.03 C
I would like to split the file after each line starting with HETATM* but only if the next line starts with ATOM. I would like the new files to be called $basename_$column, where $basename is the base name of the input file and $column is the character at position 22-23 (either A or B, in the example). I am not able to figure out how to check both consecutive lines to determine the splitting point.
Here's an awk
version
awk 'NR==1{n=$5}/HETATM/{f=1}f && /^ATOM/{n=$5;f=0}{print > "file"n".txt"}' file
Use FILENAME
instead of file
to create the same file name.
Here's a simple Python solution with no error checking. Should work in Python 2 or 3; change the first line to match your environment. Don't take this as an example of good coding style.
Edited for unique file names.
#!/usr/bin/env python2.4
import os.path
import sys
fname = sys.argv[1]
bname = os.path.basename(fname)
fin = open(fname)
fout = None
ct = 0
for line in fin:
if line[:6] == 'HETATM':
flag = True
if (not fout) or (flag and line[:4] == 'ATOM'):
if fout:
fout.close()
ct += 1
fout = open(bname + '_' + line[21:22] + str(ct), 'w')
flag = False
fout.write(line)
fout.close()
精彩评论