Assessing the significance of a BLASTn score?
I am running standalone command line blast to align many query sequences against a large database sequence of nucleotides. I can modify the command line parameters of the blastn program to change various parameters such as the match/mismatch scores.
I am wondering - for the 'bit score' that blastn outputs, does it make sense to compare the bit scores for alignments with identical query and database sequences but different match/mismatch parameters? I am trying to assess how well blast is performing with various parameter values, but I want to make sure that everything is being compared on even grounds. Thanks.
It's not clear to me why you think that comparing bit scores will give you an insight as to how well BLAST is performing. The usual method for doing
Unfortunately, much of the work on BLAST and other alignment programs is based on the theory of local, ungapped alignments, with that theory then extended empirically to gapped alignments. In particular, the bit scores are calculated like this:
S' = ( lambda * S - ln(K) ) / ln(2)
In the formula above, K and lambda are constants for your substitution matrix, S is the score (sum of substitution and gap scores), and S' is the bit score. This means that your bit scores will certainly change as a result of varying the gap open/gap extend parameters, which means that your comparison is invalid. This is an unfortunate result of the fact that there is little theory about gapped alignments, so the optimal gap scores for a given system have to be measured empirically.
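To make the formula concrete, here is a minimal Python sketch of the conversion. The function name `bit_score` and the two (lambda, K) pairs are assumptions made for illustration only; they are placeholder numbers, not values taken from BLAST's parameter tables. It shows that the same raw score S maps to different bit scores once lambda and K change.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score S to a bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Hypothetical (lambda, K) pairs standing in for two scoring systems.
# These are placeholders, not values from BLAST's own tables.
params = {
    "scoring_a": (1.37, 0.71),
    "scoring_b": (0.63, 0.41),
}

raw = 40  # the same raw score S under both scoring systems
for name, (lam, k) in params.items():
    print(f"{name}: raw={raw} bits={bit_score(raw, lam, k):.1f}")
```

The real lambda and K for a given run depend on the scoring system in use, so substitute the values reported for your own parameters before drawing any conclusions.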
Because bit scores aren't comparable, I suggest you do your assessment based on an alternate set of data that doesn't involve the alignment scores. For example, if I'm interested in the optimal gap opening/gap extension parameters for comparing protein sequences, I can look at proteins of known structure and assess each parameter set based on its ability to make alignments that make structural sense. This avoids comparing the alignment scores entirely, which is good because comparing bit scores on their own isn't obviously useful.
I'm not sure you can do that. Do you really need to vary the match/mismatch parameters? What is your aim?
It's not necessarily true that bit scores aren't comparable. From the BLAST documentation on NCBI's web site:
"Bit scores are normalized, which means that the bit scores from different alignments can be compared, even if different scoring matrices have been used."
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook&part=ch16