Calculate percentage divergence between two genetic sequences in R
I haven't been able to find this in existing questions or in an R package; hopefully it is straightforward.
Take two hypothetical genetic sequences:
Sequence A: ATG CGC AAC GTG GAG CAT
Sequence B: ATG GGC TAC GTG GAT CAA
I want R code that computes the percentage difference in single nucleotides between the two sequences (e.g. 15%).
Any thoughts? Thanks in advance.
If I understand your question correctly, then you just need to do a simple string comparison. For example,
R> seq1 = c("A", "T", "G", "C", "G", "C",
"A", "A", "C", "G", "T", "G",
"G", "A", "G", "C", "A", "T")
R> seq2 = c("A", "T", "G", "G", "G", "C",
"T", "A", "C", "G", "T", "G",
"G", "A", "T", "C", "A", "A")
R> seq1 != seq2
[1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE TRUE FALSE FALSE TRUE
R> sum(seq1 != seq2)/length(seq1)*100
[1] 22.22
To get your data in the above format, have a look at the strsplit
function.
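As a minimal sketch of that strsplit step, the following starts from the two sequences exactly as written in the question. The helper name percent_divergence is hypothetical (it is not from any package), and the sketch assumes the sequences are already aligned and of equal length, so it counts substitutions only and does not handle insertions or deletions.

```r
# Hypothetical helper (not from a package): percentage of positions at
# which two aligned, equal-length sequences differ.
percent_divergence <- function(a, b) {
  # Drop the spaces between codons, then split into single characters
  a <- strsplit(gsub(" ", "", a, fixed = TRUE), "")[[1]]
  b <- strsplit(gsub(" ", "", b, fixed = TRUE), "")[[1]]
  # A position-by-position comparison needs equal lengths
  stopifnot(length(a) == length(b))
  # mean() of a logical vector is the fraction of TRUE values
  mean(a != b) * 100
}

seq_a <- "ATG CGC AAC GTG GAG CAT"
seq_b <- "ATG GGC TAC GTG GAT CAA"
percent_divergence(seq_a, seq_b)
```

For these two strings, 4 of the 18 positions differ (positions 4, 7, 15 and 18), so the result is about 22.2%. Using mean() on the logical vector is equivalent to dividing sum() by length().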