BioPython: extracting sequence IDs from a Blast output file

2022-12-10 11:51

I have a BLAST output file in XML format. It contains 22 query sequences, with 50 hits reported for each. I want to extract all 50 x 22 hits. This is the code I currently have, but it only extracts the 50 hits from the first query.

from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)
blast_record = next(blast_records)

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        save_file.write('>%s\n' % (alignment.title,))
save_file.close()

Does anybody have suggestions on how to extract all the hits? I guess I have to use something other than alignments. I hope this is clear. Thanks!

Jon

This should get all the records. The change from the original is the

for blast_record in blast_records:

which is the Python idiom for iterating over the items of a "list-like" object such as blast_records (the NCBIXML module documentation confirms that parse() returns an iterator).

from Bio.Blast import NCBIXML
blast_records = NCBIXML.parse(result_handle)

save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')

for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            save_file.write('>%s\n' % (alignment.title,))
    # anything to be written between blast_records could go here
save_file.close()
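
To see why the original code stopped after the first query, here is a small sketch that uses a plain Python generator in place of NCBIXML.parse(). The fake_parse function and its hit names are made up for illustration and are not part of Biopython; only the iteration behaviour is the point.

```python
def fake_parse(n_queries, hits_per_query):
    """Hypothetical stand-in for NCBIXML.parse(): yields one record per query."""
    for q in range(1, n_queries + 1):
        yield ["query%d_hit%d" % (q, h) for h in range(1, hits_per_query + 1)]


# Calling next() once consumes only the first record from the iterator.
records = fake_parse(22, 50)
first_only = next(records)

# A for loop (here inside a comprehension) walks every record it produces.
all_hits = [hit for record in fake_parse(22, 50) for hit in record]

print(len(first_only))  # 50
print(len(all_hits))    # 1100
```

The same reasoning applies to the real parser: one call to next() gives one query's record, while looping over the iterator gives all 22.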

I used this code to extract all the results:

from Bio.Blast import NCBIXML
for record in NCBIXML.parse(open("rpoD.xml")):
    print("QUERY: %s" % record.query)
    for align in record.alignments:
        print(" MATCH: %s..." % align.title[:60])
        for hsp in align.hsps:
            print(" HSP, e=%f, from position %i to %i"
                  % (hsp.expect, hsp.query_start, hsp.query_end))
            # Print short alignments in full, truncate longer ones
            if hsp.align_length < 60:
                print("  Query: %s" % hsp.query)
                print("  Match: %s" % hsp.match)
                print("  Sbjct: %s" % hsp.sbjct)
            else:
                print("  Query: %s..." % hsp.query[:57])
                print("  Match: %s..." % hsp.match[:57])
                print("  Sbjct: %s..." % hsp.sbjct[:57])

print("Done")

Or, for less detail:

from Bio.Blast import NCBIXML
for record in NCBIXML.parse(open("NC_003197.xml")):
    # We want to ignore any queries with no search results:
    if record.alignments:
        print("QUERY: %s..." % record.query[:60])
        for align in record.alignments:
            for hsp in align.hsps:
                print(" %s HSP, e=%f, from position %i to %i"
                      % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end))
print("Done")
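
If only the sequence IDs are wanted, the loops above can be reduced to collecting align.hit_id once per alignment. The following is a rough sketch, not a definitive implementation: the function names hit_ids_by_query and hit_ids_from_file are my own, and the demo records are made-up stand-ins that only mimic the query, alignments and hit_id attributes used above, so the example runs without a BLAST file.

```python
from types import SimpleNamespace


def hit_ids_by_query(blast_records):
    """Map each query name to the list of its hit IDs, one entry per hit."""
    result = {}
    for record in blast_records:
        # Take the ID once per alignment rather than once per HSP, so a hit
        # with several HSPs is listed only once. Note that two queries with
        # the same name would overwrite each other in this simple sketch.
        result[record.query] = [align.hit_id for align in record.alignments]
    return result


def hit_ids_from_file(xml_path):
    """Convenience wrapper for a real BLAST XML file; needs Biopython."""
    from Bio.Blast import NCBIXML
    with open(xml_path) as handle:
        return hit_ids_by_query(NCBIXML.parse(handle))


# Made-up stand-in records, only for demonstrating the helper.
demo_records = [
    SimpleNamespace(query="seq1", alignments=[
        SimpleNamespace(hit_id="hitA", hsps=["hsp1", "hsp2"]),
        SimpleNamespace(hit_id="hitB", hsps=["hsp1"]),
    ]),
    SimpleNamespace(query="seq2", alignments=[]),
]

demo = hit_ids_by_query(demo_records)
print(demo["seq1"])  # ['hitA', 'hitB']
```

With a real file, something like hit_ids_from_file("NC_003197.xml") should return a dictionary with one entry per query, including empty lists for queries that had no hits.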

I used this site

http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/rpsblast/
