Regular Expression for String representing DNA code
Hello, I am trying to use regular expressions in a Java program. I would like the regex to identify a string of unknown length whose characters are only 'C', 'A', 'G' or 'T'. Thanks for your help.
Easy, just use a character class:
[CAGT]+
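In Java, an unanchored class like this is most useful with Matcher.find(), which reports each run of C/A/G/T characters inside a longer string. The sketch below assumes that use case; the class name DnaRuns and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DnaRuns {
    // Unanchored: matches each maximal run of the characters C, A, G, T
    private static final Pattern DNA_RUN = Pattern.compile("[CAGT]+");

    // Collects every run found anywhere in the input (hypothetical helper)
    static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = DNA_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("xxGATTACA--CGz")); // prints [GATTACA, CG]
    }
}
```

Note that find() succeeds on "xxGATTACA--CGz" even though the string as a whole is not DNA, which is why the anchored form is needed for validation.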
Or, if the entire string must consist of the characters C, A, G and T for it to match:
^[CAGT]+$
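A minimal Java sketch of the whole-string check, assuming validation is the goal; the class name DnaCheck is hypothetical. Matcher.matches() (and String.matches()) already require the entire input to match, so the ^ and $ anchors are redundant there but harmless.

```java
import java.util.regex.Pattern;

public class DnaCheck {
    // ^ and $ mirror the answer's pattern; matches() already demands a
    // full-input match, so they do not change the result here
    private static final Pattern DNA = Pattern.compile("^[CAGT]+$");

    static boolean isDna(String s) {
        return DNA.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDna("GATTACA"));   // true
        System.out.println(isDna("GATTACAX"));  // false
        System.out.println(isDna(""));          // false: + needs at least one character
        System.out.println(isDna("GATTACA\n")); // false: matches() must consume the newline too
    }
}
```

Using matches() rather than find() with ^...$ avoids a subtle case: without MULTILINE, Java's $ can also match just before a final line terminator, so find() would accept "GATTACA\n".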
Adding to the above:
^[CAGTcagt]+$
This also accepts both lowercase and uppercase characters.
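In Java the same case-insensitive check can be written either by listing both cases, as above, or by compiling the uppercase class with the Pattern.CASE_INSENSITIVE flag. This sketch compares the two; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class DnaCaseInsensitive {
    // Both cases listed explicitly, as in the answer above
    private static final Pattern EXPLICIT = Pattern.compile("^[CAGTcagt]+$");
    // Same effect: uppercase class plus the CASE_INSENSITIVE flag
    private static final Pattern FLAGGED =
            Pattern.compile("^[CAGT]+$", Pattern.CASE_INSENSITIVE);

    static boolean isDnaExplicit(String s) { return EXPLICIT.matcher(s).matches(); }

    static boolean isDnaFlagged(String s) { return FLAGGED.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"GATTACA", "gattaca", "GaTtAcA", "GATTACU"}) {
            // Expected: true true for the first three, false false for GATTACU
            System.out.println(s + " " + isDnaExplicit(s) + " " + isDnaFlagged(s));
        }
    }
}
```

The inline form "(?i)[CAGT]+" passed to String.matches() is a third equivalent option.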
I disagree with the most-upvoted answer. With [ACGT]+, a very large string may lead to a lot of memory usage, so I would use a negated character class instead and check that the string does not contain any character outside [ACGT]:
str !~ /[^ACGTacgt]/
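Since the question is about Java, here is a sketch of the same negated check using Matcher.find(); the class name DnaNegated is hypothetical. The search stops at the first offending character.

```java
import java.util.regex.Pattern;

public class DnaNegated {
    // Matches any single character that is NOT A, C, G or T (either case)
    private static final Pattern NON_DNA = Pattern.compile("[^ACGTacgt]");

    // Valid if no offending character is found anywhere in the string
    static boolean isDna(String s) {
        return !NON_DNA.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isDna("GATTACA")); // true
        System.out.println(isDna("gattaca")); // true
        System.out.println(isDna("GATTNCA")); // false
        System.out.println(isDna(""));        // true: nothing to reject
    }
}
```

Unlike the + versions, this accepts the empty string; add a !s.isEmpty() check if at least one base is required.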