
Convert FASTQ ASCII to decimal and hexadecimal in R [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I have a FASTQ quality string, which is encoded as a series of ASCII characters. In this case, ASCII characters 64 to 126 (most likely) represent scores of 0 to 62, presuming the data is Illumina. The quality string looks like this:

feffefdfbefdfffcfdeTddaYddffbfcI``S_KKX_]]MR[D_TY[VTVXQ]`Q_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

How do I extract the numeric ASCII code of each character?

Thank you, San

EDIT: This string gives the quality of a biological sequence made up of bases (one character each: A, T, G or C, as in the base pairs of nucleic acids). A base quality is the Phred-scaled probability of a base-calling error, which equals -10 log10 Pr{base is wrong}.


Well, as Marek said, you might find a function to convert Illumina quality scores in Bioconductor. You can also ask at biostar.stackexchange.com.

Using base functions, you can use charToRaw():

> x <- "feeffdbefc`\\KKX]_BBBB"
> charToRaw(x)
 [1] 66 65 65 66 66 64 62 65 66 63 60 5c 4b 4b 58 5d 5f 42 42 42 42
> as.numeric(charToRaw(x))
 [1] 102 101 101 102 102 100  98 101 102  99  96  92  75  75  88  93  95  66  66  66  66
> as.character(charToRaw(x))
 [1] "66" "65" "65" "66" "66" "64" "62" "65" "66" "63" "60" "5c" "4b" "4b" "58" "5d" "5f" "42" "42" "42" "42"

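Mind you, in a string literal typed into R you have to escape the backslash (written \\ in the example above), or you'll get into trouble. Whether you need to do this depends on how you read in your data: lines read from a file with readLines(), for example, need no escaping.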
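To go from the ASCII codes to the actual quality scores, something along these lines might work. This is only a sketch: the helper name decode_quality() is made up for illustration, and the offset of 64 is an assumption taken from the question (Illumina-style encoding). Other FASTQ variants use an offset of 33, so check which one applies to your data. The error probability column just inverts the formula from the question, Pr = 10^(-Q/10).

```r
# Hypothetical helper (not from any package): decode one FASTQ quality string.
# offset = 64 is an ASSUMPTION (Illumina-style Phred+64); use 33 for Phred+33 data.
decode_quality <- function(qual, offset = 64L) {
  codes <- utf8ToInt(qual)                 # decimal ASCII code of each character
  phred <- codes - offset                  # quality score Q
  data.frame(
    char  = strsplit(qual, "")[[1]],       # the individual characters
    dec   = codes,                         # decimal code
    hex   = sprintf("%02x", codes),        # hexadecimal code, as charToRaw() prints it
    phred = phred,
    p_err = 10^(-phred / 10),              # Pr{base is wrong} = 10^(-Q/10)
    stringsAsFactors = FALSE
  )
}

q <- decode_quality("feTB@")
q$dec    # 102 101 84 66 64
q$phred  # 38 37 20 2 0
```

utf8ToInt() gives the same numbers as as.numeric(charToRaw(x)) for plain ASCII input, so either can be used here.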
