
Split file according to patterns in two consecutive lines

I have files with the following format:

ATOM   3736  CB  THR A 486      -6.552 153.891  -7.922  1.00115.15           C  
ATOM   3737  OG1 THR A 486      -6.756 154.842  -6.866  1.00114.94           O  
ATOM   3738  CG2 THR A 486      -7.867 153.727  -8.636  1.00115.11           C  
ATOM   3739  OXT THR A 486      -4.978 151.257  -9.140  1.00115.13           O  
HETATM10351  C1  NAG A 203      33.671  87.279  39.456  0.50 90.22           C  
HETATM10483  C1  NAG A 702      28.025 104.269 -27.569  0.50 92.75           C    
ATOM   3736  CB  THR B 486      -6.552  86.240   7.922  1.00115.15           C  
ATOM   3737  OG1 THR B 486      -6.756  85.289   6.866  1.00114.94           O  
ATOM   3738  CG2 THR B 486      -7.867  86.404   8.636  1.00115.11           C  
ATOM   3739  OXT THR B 486      -4.978  88.874   9.140  1.00115.13           O  
HETATM10351  C1  NAG B 203      33.671 152.852 -39.456  0.50 90.22           C  
HETATM10639  C2  FUC B 402     -48.168 162.221 -22.404  0.50103.03           C 

I would like to split the file after each line starting with HETATM* but only if the next line starts with ATOM. I would like the new files to be called $basename_$column, where $basename is the base name of the input file and $column is the character at position 22-23 (either A or B, in the example). I am not able to figure out how to check both consecutive lines to determine the splitting point.


Here's an awk version

awk 'NR==1{n=$5} /^HETATM/{f=1} f && /^ATOM/{n=$5; f=0} {print > ("file" n ".txt")}' file

To base the output names on the input file's name, use awk's built-in FILENAME variable in place of the literal "file".


Here's a simple Python solution with no error checking. Should work in Python 2 or 3; change the first line to match your environment. Don't take this as an example of good coding style.

Edited for unique file names: a running counter is appended to each output name.

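#!/usr/bin/env python2.4

import os.path
import sys

fname = sys.argv[1]
bname = os.path.basename(fname)

fin = open(fname)

fout = None
flag = False  # True once a HETATM line has been seen in the current block
ct = 0        # running counter that keeps output names unique

for line in fin:
    if line[:6] == 'HETATM':
        flag = True
    # Open a new output file for the first line, and for the first ATOM
    # line that follows a HETATM line.
    if (not fout) or (flag and line[:4] == 'ATOM'):
        if fout:
            fout.close()
        ct += 1
        # line[21:22] is column 22, the chain identifier (A or B here)
        fout = open(bname + '_' + line[21:22] + str(ct), 'w')
        flag = False
    fout.write(line)

fout.close()
fin.close()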
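
Both scripts above rely on a flag that stays set from a HETATM line until the next ATOM line, so they would also split if other record types sat in between. As a minimal sketch of the literal test in the question (split only when an ATOM line directly follows a HETATM line), one could compare each line with the one before it. The helper names split_blocks, chain_id and write_blocks are made up for illustration. The sketch assumes Python 2.6+ or 3, and that each chain ID occurs in only one block, so that the names can be $basename_$column without a counter.

```python
import os.path


def split_blocks(lines):
    """Group lines into blocks, starting a new block at every ATOM line
    that directly follows a HETATM line."""
    blocks = []
    prev = ""
    for line in lines:
        starts_new = prev.startswith("HETATM") and line.startswith("ATOM")
        if not blocks or starts_new:
            blocks.append([])
        blocks[-1].append(line)
        prev = line
    return blocks


def chain_id(line):
    # Column 22 of an ATOM/HETATM record (index 21) holds the chain ID.
    return line[21:22]


def write_blocks(path):
    """Write each block to <basename>_<chain> in the current directory."""
    base = os.path.basename(path)
    with open(path) as fin:
        blocks = split_blocks(fin)
    for block in blocks:
        # Assumption: each chain forms a single block; a repeated chain ID
        # would overwrite the file written for the earlier block.
        with open(base + "_" + chain_id(block[0]), "w") as fout:
            fout.writelines(block)
```

With the sample data saved under a hypothetical name such as input.pdb, calling write_blocks("input.pdb") should produce input.pdb_A and input.pdb_B.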