Swap letters in a string

I need to swap letters in a string with the following rules:

  • A is replaced by T
  • T is replaced by A
  • C is replaced by G
  • G is replaced by C

For example: ACGTA should become TGCAT

What would be the best way to resolve this?


Searching for java "A to T, T to A" found this suggestion:

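String sequence = "AATTTCTCGGTTTCAAT";
// Assumes an all-uppercase input: each letter is first replaced by the
// lowercase form of its partner, so the later replace calls can't undo
// earlier swaps; toUpperCase() then restores the case.
sequence = sequence.replace("A", "t")
                   .replace("T", "a")
                   .replace("C", "g")
                   .replace("G", "c")
                   .toUpperCase();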
System.out.println(sequence);

This is a simple and concise solution that works for your specific situation (it assumes the input is all uppercase, because lowercase letters serve as temporary markers) and will have acceptable performance if your DNA strings are relatively short, even though every replace call creates a new String. For a more general solution that handles large amounts of data, iterate over the characters one by one and build a new string. Or, as polygenelubricants pointed out, consider a storage format that uses only 2 bits per base instead of the 16 bits of a Java char.
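A minimal sketch of the 2-bit idea, under stated assumptions: the class name PackedDna is hypothetical, only A, C, G and T are supported, and the encoding A=0, C=1, G=2, T=3 is chosen so that the complement of a base is its code XOR 3. With that choice, complementing 32 bases at once is a single bitwise NOT on a long.

```java
// Hypothetical helper: stores A/C/G/T at 2 bits per base (32 bases per long).
// Assumed encoding: A=0, C=1, G=2, T=3, so complement(code) == code ^ 3.
final class PackedDna {
    private static final String BASES = "ACGT";

    private final long[] words;
    private final int length;

    PackedDna(String sequence) {
        length = sequence.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            int code = BASES.indexOf(sequence.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("Not A/C/G/T at index " + i);
            }
            words[i / 32] |= (long) code << (2 * (i % 32));
        }
    }

    private PackedDna(long[] words, int length) {
        this.words = words;
        this.length = length;
    }

    // ~w flips every bit, i.e. XORs each 2-bit field with 0b11: A<->T, C<->G.
    PackedDna complement() {
        long[] out = new long[words.length];
        for (int w = 0; w < words.length; w++) {
            out[w] = ~words[w];
        }
        // Clear the unused high bits of the last word so they stay zero.
        int used = length % 32;
        if (used != 0) {
            out[out.length - 1] &= (1L << (2 * used)) - 1;
        }
        return new PackedDna(out, length);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int code = (int) (words[i / 32] >>> (2 * (i % 32))) & 3;
            sb.append(BASES.charAt(code));
        }
        return sb.toString();
    }
}
```

Usage: new PackedDna("ACGTA").complement().toString() gives "TGCAT". Each base takes 2 bits instead of 16, at the cost of bit shifting whenever individual bases are read or written.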


I would go for a more general solution like this:

public String tr(String original, String trFrom, String trTo) {
  StringBuilder sb = new StringBuilder(original.length());

  for (int i = 0; i < original.length(); ++i) {
    // Position of this character in trFrom, or -1 if it isn't translated
    int charIndex = trFrom.indexOf(original.charAt(i));
    if (charIndex >= 0) {
      sb.append(trTo.charAt(charIndex));
    } else {
      // Characters not listed in trFrom are copied unchanged
      sb.append(original.charAt(i));
    }
  }

  return sb.toString();
}

Calling the function like this would give the result you need:

tr("ACGTA", "ATCG", "TAGC")

The function works much like the Unix tr utility:

echo ACGTA | tr ATCG TAGC
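tr() calls indexOf for every input character. If the same translation is applied to many strings, one option is to build the mapping once and reuse it. This is a sketch, not part of the answer above: the name Tr.translator is hypothetical, it adds a length check on the two arguments, and it trades a 65,536-entry char table (about 128 KB) for constant-time lookups.

```java
import java.util.function.UnaryOperator;

// Hypothetical variant of tr(): builds the translation once and returns a
// reusable function instead of calling indexOf for every input character.
final class Tr {
    static UnaryOperator<String> translator(String trFrom, String trTo) {
        if (trFrom.length() != trTo.length()) {
            throw new IllegalArgumentException("trFrom and trTo must have the same length");
        }
        // Identity mapping over the whole 16-bit char range.
        char[] map = new char[Character.MAX_VALUE + 1];
        for (int c = 0; c < map.length; c++) {
            map[c] = (char) c;
        }
        // Iterate backwards so the first occurrence in trFrom wins,
        // matching the indexOf lookup in tr().
        for (int i = trFrom.length() - 1; i >= 0; i--) {
            map[trFrom.charAt(i)] = trTo.charAt(i);
        }
        return s -> {
            char[] out = s.toCharArray();
            for (int i = 0; i < out.length; i++) {
                out[i] = map[out[i]];
            }
            return new String(out);
        };
    }
}
```

Passing both cases, for example Tr.translator("ATCGatcg", "TAGCtagc"), keeps lowercase input lowercase.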


As I explained yesterday, Strings are immutable: you can't change a String, so you have to create a new one and replace the old one.

You can solve your problem like this:

String s = "ACGTA";
StringBuilder sb = new StringBuilder();
for (char c : s.toCharArray()) {
  switch (c) {
    case 'A': sb.append('T'); break;
    case 'T': sb.append('A'); break;
    case 'C': sb.append('G'); break;
    case 'G': sb.append('C'); break;
    default: // handle error here -> invalid char in String
  }
}
s = sb.toString();

The advantage of this solution is that it avoids creating many intermediate String objects: every replace call creates a new String, which can hurt performance if you have to complement a lot of DNA sequences.
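The default branch above leaves error handling open. One option, sketched here as an assumption rather than the answer's own code, is to throw an IllegalArgumentException that names the offending character. Presizing the StringBuilder to the input length also avoids internal buffer growth. The class name SwitchComplement is hypothetical.

```java
// Hypothetical wrapper around the switch-based approach: presized builder,
// and invalid characters are rejected instead of silently skipped.
final class SwitchComplement {
    static String complement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:
                    throw new IllegalArgumentException(
                        "Invalid base '" + c + "' at index " + i);
            }
        }
        return sb.toString();
    }
}
```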


Here is a more performant version based on the very helpful comments from polygenelubricants and rsp:

String s = "ACGTA";
// Holds the complement (not a reversal), one char per input base
char[] complement = new char[s.length()];
for (int i = 0; i < complement.length; i++) {
  switch (s.charAt(i)) {
    case 'A': complement[i] = 'T'; break;
    case 'T': complement[i] = 'A'; break;
    case 'C': complement[i] = 'G'; break;
    case 'G': complement[i] = 'C'; break;
    default: // handle error here -> invalid char in String
  }
}
s = new String(complement);
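In that version an unexpected character leaves '\0' in the array, because the default branch assigns nothing. If Java 14 or later is available, a switch expression makes that impossible, since every path must yield a value or throw. The class name below is hypothetical and the exception choice is an assumption.

```java
// Sketch using a switch expression (standard since Java 14): each case
// yields the complement, and invalid input throws instead of leaving '\0'.
final class SwitchExpressionComplement {
    static String complement(String s) {
        char[] result = new char[s.length()];
        for (int i = 0; i < result.length; i++) {
            char c = s.charAt(i);
            result[i] = switch (c) {
                case 'A' -> 'T';
                case 'T' -> 'A';
                case 'C' -> 'G';
                case 'G' -> 'C';
                default -> throw new IllegalArgumentException("Invalid base: " + c);
            };
        }
        return new String(result);
    }
}
```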


DNA has a small alphabet, so you can use a lookup table, replacing the conditional statements with simple array indexing.

This approach:

  • Traverses the sequence only once.
  • Eliminates the conditional statements.
  • Can preserve letter case, which is sometimes used to convey information in DNA sequences.
  • Can handle IUPAC ambiguity codes.
  • Can handle gaps.
  • Can easily provide a reverse complement.

First, you need a lookup table.

import java.nio.charset.StandardCharsets;

private static final String COMPLEMENT_TABLE
  // 0123456789ABCDEF0123456789ABCDEF
  = "                                " // 0-31
  + "             -                  " // 32-63
  + " TVGH  CD  M KN   YSAABWXR      " // 64-95
  + " tvgh  cd  m kn   ysaabwxr      "; // 96-127
  //  ABCDEFGHIJKLMNOPQRSTUVWXYZ

private static final byte[] COMPLEMENT_TABLE_BYTES 
  = COMPLEMENT_TABLE.getBytes( StandardCharsets.US_ASCII );
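A hand-aligned 128-character string is easy to get off by one column. As an alternative, here is a sketch that builds a table from complement pairs; the class name ComplementTable is hypothetical, and the pairs are read off COMPLEMENT_TABLE (A/T, C/G, R/Y, K/M, B/V, D/H, self-complementary S, W, N, X, one-way U to A, and '-' for gaps), with unmapped characters set to a space as in the string version.

```java
import java.util.Arrays;

// Hypothetical alternative to the hand-aligned COMPLEMENT_TABLE string:
// builds a 128-entry ASCII table from a list of complement pairs.
final class ComplementTable {
    static final byte[] TABLE = build();

    private static byte[] build() {
        byte[] table = new byte[128];
        Arrays.fill(table, (byte) ' ');   // unmapped characters become spaces
        // Each pair "XY" maps X -> Y and Y -> X.
        String[] pairs = {"AT", "CG", "RY", "KM", "BV", "DH", "SS", "WW", "NN", "XX"};
        for (String p : pairs) {
            map(table, p.charAt(0), p.charAt(1));
            map(table, p.charAt(1), p.charAt(0));
        }
        map(table, 'U', 'A');   // one-way: uracil -> adenine, while A still maps to T
        table['-'] = '-';       // gaps stay gaps
        return table;
    }

    // Sets both the uppercase and lowercase entries so letter case is preserved.
    private static void map(byte[] table, char from, char to) {
        table[from] = (byte) to;
        table[Character.toLowerCase(from)] = (byte) Character.toLowerCase(to);
    }
}
```

ComplementTable.TABLE can then be used wherever COMPLEMENT_TABLE_BYTES is used below.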

Then each base's complement is a simple table lookup.

public static byte[] complement( byte[] sequence ) {
    int length = sequence.length;
    byte[] result = new byte[ length ];

    for ( int i = 0; i < length; ++i ) {
        // The ASCII code of each base is used directly as the table index
        result[i] = COMPLEMENT_TABLE_BYTES[ sequence[i] ];
    }

    return result;
}
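Two details of the byte[] version are worth noting: Java bytes are signed, so a byte of 0x80 or above is negative and makes the lookup throw ArrayIndexOutOfBoundsException, and a new array is allocated per call. The sketch below complements in place and reports unmappable bytes explicitly. It is an assumption-laden illustration: the class name is hypothetical and the table is reduced to A/C/G/T in both cases to keep it short.

```java
// Hypothetical in-place variant: overwrites the input array instead of
// allocating a result, and rejects bytes that have no complement.
final class InPlaceComplement {
    // Reduced table for this sketch (A/C/G/T, both cases); 0 means "no complement".
    private static final byte[] TABLE = new byte[128];
    static {
        String from = "ACGTacgt", to = "TGCAtgca";
        for (int i = 0; i < from.length(); i++) {
            TABLE[from.charAt(i)] = (byte) to.charAt(i);
        }
    }

    // Note: on error, bytes before index i have already been overwritten.
    static void complementInPlace(byte[] sequence) {
        for (int i = 0; i < sequence.length; i++) {
            int b = sequence[i] & 0xFF;   // treat the byte as unsigned 0..255
            byte c = b < TABLE.length ? TABLE[b] : 0;
            if (c == 0) {
                throw new IllegalArgumentException("No complement for byte " + b + " at index " + i);
            }
            sequence[i] = c;
        }
    }
}
```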

For convenience with small sequences, you can also provide an overload that accepts and returns a String.

public static String complement( String sequence ) {
    byte[] complementBytes = complement( 
      sequence.getBytes( StandardCharsets.US_ASCII ));
    return new String( complementBytes, StandardCharsets.US_ASCII );
}

The reverse complement can be computed in the same single pass by writing each complemented base to the mirrored position.

public static byte[] reverseComplement( byte[] sequence ) {
    int length = sequence.length;
    byte[] result = new byte[ length ];

    for ( int i = 0; i < length; ++i ) {
        result[ (length - i) - 1] = COMPLEMENT_TABLE_BYTES[ sequence[i] ];
    }

    return result;
}

public static String reverseComplement( String sequence ) {
    byte[] complementBytes = reverseComplement( 
      sequence.getBytes( StandardCharsets.US_ASCII ));
    return new String( complementBytes, StandardCharsets.US_ASCII );
}
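If the input array may be overwritten, the reverse complement can also be done in place by walking inward from both ends, complementing and swapping each pair. This is a sketch under assumptions: the class name is hypothetical, the table is reduced to A/C/G/T (other ASCII characters pass through unchanged), and input must be ASCII for the same signed-byte reason as the methods above.

```java
// Hypothetical in-place reverse complement using two indices moving inward.
final class InPlaceReverseComplement {
    private static final byte[] TABLE = new byte[128];
    static {
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = (byte) i;          // default: leave the character unchanged
        }
        String from = "ACGTacgt", to = "TGCAtgca";
        for (int i = 0; i < from.length(); i++) {
            TABLE[from.charAt(i)] = (byte) to.charAt(i);
        }
    }

    // Assumes ASCII input: a byte >= 0x80 is negative and would throw
    // ArrayIndexOutOfBoundsException.
    static void reverseComplementInPlace(byte[] seq) {
        int i = 0, j = seq.length - 1;
        while (i < j) {
            byte left = TABLE[seq[i]];
            seq[i] = TABLE[seq[j]];
            seq[j] = left;
            i++;
            j--;
        }
        if (i == j) {
            seq[i] = TABLE[seq[i]];       // odd length: complement the middle base
        }
    }
}
```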

Using your example sequence:

public static void main(String[] args) {
    String sequence = "ACGTA";

    String complementSequence = complement( sequence );
    System.out.println( String.format( 
       "complement(%s) = %s", sequence, complementSequence ));

    String reverseComplementSequence = reverseComplement( sequence );
    System.out.println( String.format( 
      "reverseComplement(%s) = %s", sequence, reverseComplementSequence ));
}

We get this output:

complement(ACGTA) = TGCAT
reverseComplement(ACGTA) = TACGT