Reading and extracting data from a file using Python
I am new to Python, and I want to extract the data from this format:
<seq id> <alignment start> <alignment end> <envelope start> <envelope end> <hmm acc> <hmm name> <type> <hmm start> <hmm end> <hmm length> <bit score> <E-value> <significance> <clan>
**FBpp0143497** **5 151** 5 157 PF00339.22 **Arrestin_N** Domain 1 135 149 83.4 **1.1e-23** 1 CL0135
**FBpp0143497** **183 323** 183 324 PF02752.15 Arrestin_C Domain 1 137 138 58.5 **6e-16** 1 CL0135
FBpp0131987 60 280 51 280 PF00089.19 Trypsin Domain 14 219 219 127.7 3.7e-37 1 CL0124
to this format:
>FBpp0143497
5 151 Arrestin_N 1.1e-23
>FBpp0143497
183 323 Arrestin_C 6e-16
You could parse the file with the 'csv' module, using a space as the delimiter. See the documentation for csv.reader.
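A minimal sketch of the csv approach, assuming the fields are separated by one or more spaces, that the asterisks in the question are only highlighting and do not appear in the real file, and that header or comment lines start with "<" or "#". The function name convert_with_csv and the SAMPLE text are made up for illustration; the column positions (0, 1, 2, 6, 12) are counted from the header shown in the question.

```python
import csv
import io

# Two rows copied from the question, without the highlighting asterisks.
SAMPLE = """\
FBpp0143497 5 151 5 157 PF00339.22 Arrestin_N Domain 1 135 149 83.4 1.1e-23 1 CL0135
FBpp0143497 183 323 183 324 PF02752.15 Arrestin_C Domain 1 137 138 58.5 6e-16 1 CL0135
"""


def convert_with_csv(handle):
    """Return the output lines (">id", then "start end name evalue") for each row."""
    # skipinitialspace=True lets runs of several spaces act as one delimiter.
    reader = csv.reader(handle, delimiter=" ", skipinitialspace=True)
    lines = []
    for row in reader:
        # Skip blank, short, header, or comment rows (assumed markers: "#" and "<").
        if len(row) < 13 or row[0].startswith(("#", "<")):
            continue
        seq_id, start, end = row[0], row[1], row[2]
        hmm_name, e_value = row[6], row[12]
        lines.append(f">{seq_id}")
        lines.append(f"{start} {end} {hmm_name} {e_value}")
    return lines


if __name__ == "__main__":
    print("\n".join(convert_with_csv(io.StringIO(SAMPLE))))
```

For a real file, pass an open file object instead of the io.StringIO wrapper.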
As this is proteomic data, you could probably also find a dedicated parser in the Biopython package.
You can use split() to separate the items at spaces and then print out the values you want from the returned list.
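The split() suggestion could look roughly like this. It is a sketch under the same assumptions as the question's sample: whitespace-separated columns in the order given by the header, no literal asterisks in the file, and header or comment lines starting with "<" or "#". The names format_hit and convert_file, and any paths passed to them, are hypothetical.

```python
def format_hit(line):
    """Turn one input line into the two-line output record, or None to skip it."""
    # split() with no argument splits on any run of whitespace.
    fields = line.split()
    # Skip blank, short, header, or comment lines (assumed markers: "#" and "<").
    if len(fields) < 13 or fields[0].startswith(("#", "<")):
        return None
    seq_id, start, end = fields[0], fields[1], fields[2]
    hmm_name, e_value = fields[6], fields[12]
    return f">{seq_id}\n{start} {end} {hmm_name} {e_value}"


def convert_file(in_path, out_path):
    """Read in_path line by line and write the reformatted records to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            record = format_hit(line)
            if record is not None:
                dst.write(record + "\n")


if __name__ == "__main__":
    row = ("FBpp0143497 5 151 5 157 PF00339.22 Arrestin_N "
           "Domain 1 135 149 83.4 1.1e-23 1 CL0135")
    print(format_hit(row))
```

Indexing into the list keeps the code short; if the column order ever changes, only the five index numbers need updating.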