How do I find next bit to change in a Gray code in constant time?
I have a small 8-bit processor which has an N-to-M decoder on some output lines: e.g., in the 5-to-32 case, writing 00101 makes bit 5 change state. The only interface to the output is change-state; there is no read-back.
The device counts rapidly (but randomly) occurring events, and should provide this count as a 'single bit changes' code to another device. The output pins are read in parallel by another device, and may be read as rapidly or as sparingly as that device decides, so the single-bit-change property is necessary.
I do NOT need to use the standard Binary Reflective Gray code - I can use any single-bit changing code.
However, I want to be able to track the next bit to change efficiently.
I do not have a "LowestBitSet" instruction, and finding lowest bit set across four 8-bit registers is cycle-consuming - so I cannot use this "common" approach:
Keep binary counter A
Find B as A XOR (A+1)
Bit to change is LowestBitSet in B
I wish to calculate this in as little memory and registers as possible, and memory is definitely too restricted for any large lookup table. Cycle time is the more important factor.
Any suggestions on algorithms?
"Algorithm L" on page 10 of Knuth, Donald E. "Generating all n-tuples." The Art of Computer Programming, Volume 4A: Enumeration and Backtracking, pre-fascicle 2a, October 15, 2004 seems ideal. Step L4 would be "change_state(j)" for your device.
You don't need to calculate the Gray codes and xor them, you can just use the counter itself, and then use a 256-element lookup table to count the number of trailing zeros. Like this:
unsigned char bit_change(unsigned char counter[4]) {
    static const unsigned char ones[] = {
        0,0,0,1,0,1,1,2,0,1,1,2,1,2,2,3,0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,
        0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
        0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
        1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
        0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
        1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
        1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
        2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
    };
    unsigned char i;
    for (i = 0; i < 4; i++) {
        unsigned char x = counter[i];
        if (x) {
            x ^= x - 1;
            return 8 * i + ones[x];
        }
    }
    return 31; /* all four bytes zero: counter rolled over, top bit changed */
}
If you unroll the loop, this is at most 2 adds, 1 xor, and 5 loads (but almost always fewer). If you don't have 256 bytes for the table, you could use the same strategy on nibbles.
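One way the nibble variant might look (a sketch, not the answerer's code: a 16-entry trailing-zero table, probed low nibble first; `tz4` and `bit_change4` are names I made up):

```c
#include <assert.h>

/* trailing-zero count for a nonzero nibble; tz4[0] is unused */
static const unsigned char tz4[16] = {0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0};

/* bit index of the lowest set bit across four counter bytes,
   using only 16 bytes of table instead of 256 */
unsigned char bit_change4(const unsigned char counter[4]) {
    for (unsigned char i = 0; i < 4; i++) {
        unsigned char x = counter[i];
        if (x & 0x0F) return 8 * i + tz4[x & 0x0F];      /* low nibble  */
        if (x)        return 8 * i + 4 + tz4[x >> 4];    /* high nibble */
    }
    return 31;  /* counter rolled over to zero: top bit changed */
}
```

This trades one extra test per byte for 240 bytes of table space.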
LowestBitSet(A ^ (A+1)) is always 0, unless you work for IBM. I think you mean HighestBitSet(), which is roughly the same as log_2().
The bit-twiddling hack immediately preceding the one linked by AShelly will be much more feasible on an 8-bit micro. This should make your original algorithm fairly practical, generating { 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, ... }.
As for the possibility of changing to a different sequence which would also generate a Gray code, for the purpose of making it easier to compute, that's very interesting but I haven't come up with anything.
For Binary Reflective Gray Code, see this answer for an efficient way to calculate the Nth code.
XOR with the previous code to get a value where only the bit to change is set.
Then you can use this Bit Twiddling Hack (the case where "v is a power of 2") to find the bit index with only 3 operations and a 32-entry table.
The pseudo-code is something like this:

n = lastCode = 0
increment:
  n += 1
  newCode = GrayCode(n)
  delta = newCode XOR lastCode
  bitToToggle = BitIndex(delta)
  lastCode = newCode
  GOTO increment
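The linked hack for the "v is a power of 2" case is the de Bruijn multiply: since `delta` has exactly one bit set, multiplying by the de Bruijn constant and shifting yields a unique 5-bit index into a 32-entry table. A sketch (the table and constant are the standard ones from the Bit Twiddling Hacks page):

```c
#include <assert.h>
#include <stdint.h>

/* bit index of v, valid only when v is a power of two:
   one multiply, one shift, one table load */
static const uint8_t debruijn_pos[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

uint8_t bit_index(uint32_t v) {
    return debruijn_pos[(uint32_t)(v * 0x077CB531u) >> 27];
}
```

Note the 32-bit multiply is cheap on a desktop CPU but may itself be many cycles on an 8-bit micro, which is why the byte-at-a-time table approaches elsewhere in this thread can win there.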
The algorithm posited by the OP does not generate any Gray code.
The algorithm in this answer, https://stackoverflow.com/a/4657711/7728918, is not constant time: the conditional test if (x) can execute from 1 to 4 times depending on the value of counter[i]. This changes the amount of time required to calculate the bit number, so any single calculation has 4 different possible execution times.
See the reference (cargo cult coders excepted) for the rationale of the following, which meets this constant-time requirement (no lights, no car, not a single luxury ... not even a "table"):
byte log2toN(byte b){
    return 7 - (byte)( 0x10310200A0018380uLL >> ( (byte)(0x1D*b) >> 2 ) );
}
byte BRGC(byte n){ return n ^ n>>1; }
byte bit2flip(byte n){ return log2toN( BRGC(n) ^ BRGC(n+1) ); }
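A quick harness showing that this BRGC-difference construction does emit the ruler sequence 0 1 0 2 0 1 0 3 ... (using a plain portable loop in place of the magic-constant log2, so the check doesn't depend on the 64-bit literal; `bit_pos` and `bit_to_flip` are my names):

```c
#include <assert.h>

typedef unsigned char byte;

byte brgc(byte n) { return (byte)(n ^ (n >> 1)); }

/* portable stand-in for log2toN: position of the single set bit in b */
byte bit_pos(byte b) { byte p = 0; while (b >>= 1) p++; return p; }

/* bit that flips between Gray code n and Gray code n+1 */
byte bit_to_flip(byte n) { return bit_pos(brgc(n) ^ brgc((byte)(n + 1))); }
```

For n = 0..7 this yields 0 1 0 2 0 1 0 3, matching the sequence quoted earlier in the thread.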
However, there is a much better, more succint and expedient method to meet the OP's criteria.
For cargo cult coding purposes the following conveniently satisfies the OP's conditions minimally (maximally? ;).
The bit number to change is found each time with only two operations: an increment and a modulus (which, if done modulo 2^n, can be as simple as a bitwise & with the constant 2^n - 1).
The actual Johnson Gray code (JGC) sequence is generated incrementally by XORing the previous code with the desired bit, selected by a left shift of 1 to the bit number position. This calculation is NOT needed as per the OP's requirements.
The Johnson Code
-------------------------
The actual Gray coding is irrelevant, so using a Johnson counter's Gray code is exceptionally trivial.
Note that the Johnson Gray code (JGC) density is linear and not logarithmic like base 2 or binary reflected Gray code (BRGC).
With 32 bits in 4 bytes, the sequence can count from 0 to 63 before resetting.
/./ / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /.
byte bitCtr = -1;                 // for 4 x 8 bits use 32 instead of 5
byte JohnsonCode(){ static byte GCbits = 0;
    return GCbits ^= 1u << ( bitCtr = (bitCtr + 1) % 5 ); }
/./ / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /.
Test output:
Johnson counter Bit
Gray code: Flipped:
....1 0
...11 1
..111 2
.1111 3
11111 4
1111. 0
111.. 1
11... 2
1.... 3
..... 4
....1 0
...11 1 etc.
partially generated with this code:
void testJohnson(){
Serial.println("\n\tJohnson counter\t Bit\n\t Gray code:\t Flipped:");
for( int intifiny=31; --intifiny; )
Serial.println( "\t " + cut( 5, "....." +
// s.replace(...) returns zip,
// so VVV use lambda function invoke via VVV
// _____________________ V ____________________ ______________ V _____________
[](String s){ s.replace("0","."); return s; } ( String( JohnsonCode(), BIN ) )
) + "\t " + bitCtr
);
}
/*
Unfortunately, this answer does not explain "How do you (I, did I, ...) find ...".
For details on the methodology of finding such solutions and using a BRGC similarly ...
see previous ref.: https://stackoverflow.com/a/42846062/7728918
I have been trying to understand Algorithm L; to that end, I think I have found some potentially useful intuition I would like to share.
It starts with noticing that the pattern of which bit to flip is recursive and symmetric.
0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
Now it makes sense to think of them as a tree.
3
2 2
1 1 1 1
0 0 0 0 0 0 0 0
and therefore generated by the following algorithm:
def gen(arg):
    if arg == 0:
        print(arg)
    else:
        gen(arg - 1)
        print(arg)
        gen(arg - 1)
The tree above can be interpreted as the tree of activation frames of this algorithm.
If we were printing a non-zero number, the next number is obvious, it has to be 0. Therefore the problem of predicting the next element reduces to only predicting what will happen after 0.
Here is the interesting observation: the next thing after 0 must be the closest ancestor in the tree that is on the right of the current position. This suggests the following algorithm, which propagates the right parent and therefore predicts the next element:
def gen(arg, right_parent):
    if arg == 0:
        print("%s %s" % (0, right_parent))
    else:
        gen(arg - 1, arg)           # The right parent of my left child is me
        print("%s %s" % (arg, 0))
        gen(arg - 1, right_parent)  # The right parent of my right child is my right parent
Here is an annotated tree with right parents written in brackets:
3(4)
2(3) 2(4)
1(2) 1(3) 1(2) 1(4)
0(1) 0(2) 0(1) 0(3) 0(1) 0(2) 0(1) 0(4)
The problem with this approach is that when we execute it, the code might go through multiple steps of calling or returning so that the time spent between successive prints is not constant. We could argue that the time is amortized constant, after all, each pair of push and pop is associated with printing exactly one number.
Here is another idea. By the time we print the number, we know the stack frame is going away before the next time we print the same number, is it possible for us to front-load the work of returning and calling the same frame?
By the time the first 0 is printed, we knew its right parent is 1, so it will pass its own right_parent
when it makes the recursive call again.
We summarize this observation in this rule:
- If the right_parent value of a frame is exactly 1 larger than the current frame, then the right_parent value for the next call will be the right_parent value of the right parent frame.
By the time the second 0 is printed, we know its right parent is 2, so the next call will be reached through multiple steps from the second recursive call of the right parent. Any such multi-step call means the frame is a left child, and a left child's right parent is always 1 larger than the current frame!
We summarize this observation in this rule:
- If the right_parent value of a frame is larger than the current frame by more than 1, then the right_parent value for the next call will be exactly one larger than the current frame's value.
With those two rules, I came up with this algorithm:
def gen():
    right_parent = [1,2,3,4]
    cur = 0
    for i in range(0, 15):
        print(cur)
        j = right_parent[cur]
        if j == cur + 1:
            if j != 4:  # Avoid index outside of the list
                right_parent[cur] = right_parent[j]
        else:
            right_parent[cur] = cur + 1
        if cur == 0:
            cur = j
        else:
            cur = 0
This is O(1), but it is not Algorithm L, which involves no comparisons. To explore further, these comments will probably shed some light:
def gen():
    right_parent = [1,2,3,4]
    cur = 0
    for i in range(0, 15):
        print(cur)
        next = right_parent[cur]
        if next == cur + 1:
            if next != 4:
                right_parent[cur] = right_parent[cur + 1]  # f[j] = f[j + 1]
        else:
            right_parent[cur] = cur + 1                    # f[j + 1] = j + 1
        if cur == 0:
            cur = next                                     # j = f[0]
        else:
            cur = 0                                        # f[0] = 0
gen()
It feels like Algorithm L somehow deals with both a to-be-left and a to-be-right in the same iteration. It might have something to do with the "to-be-active" and "to-be-passive" notion as in Knuth's presentation, but I decide to stop here. I think it is good enough for an intuition into how the algorithm might be developed.
/*
As previously posted in this answer, https://stackoverflow.com/questions/4648716#42865348
using a Johnson counter Gray code, is very simple:
Number_of_Bit_To_Flip = (Number_of_Bit_To_Flip + 1) % Total_Number_of_Counter_Bits
which is executed on every event occurrence.
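A self-contained sketch of that rule for an 8-bit Johnson counter (the OP's 32-bit case just swaps 8 for 32; `johnson_step` and the state variables are my names, not from the answer):

```c
#include <assert.h>

typedef unsigned char byte;

static byte jbit  = 0;    /* next bit number to flip   */
static byte jcode = 0;    /* current Johnson Gray code */

/* on each event: flip the current bit, advance bit number modulo the width */
byte johnson_step(void) {
    jcode ^= (byte)(1u << jbit);
    byte flipped = jbit;
    jbit = (byte)((jbit + 1) % 8);    /* % 8 is just "& 7" here */
    return flipped;
}
```

After 8 steps the code is 0xFF, after 16 it is back to 0x00, and each step changes exactly one output bit, which is all the OP's decoder interface needs.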
Otherwise, using a Binary Reflected Gray Code and a 4 byte base 2 counter n
, ...
Method 1 - using a table */
static const byte twos[ ] = { // note pattern V V V V V V
0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 6,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 7,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 6,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 8,
};
// operation count worst case 3 logic 4 index 1 addition
// for 7/8 of calls 2 3 1
byte bit_change(byte ctr[4]) {
return
(byte[]){
(byte[]){ 16 + twos[ctr[2]] ,
(byte[]){24 + twos[ctr[3]] ,
31 } [ !ctr[3] ] } [ !ctr[2] ] ,
(byte[]){ 0 + twos[ctr[0]] ,
8 + twos[ctr[1]] } [ !ctr[0] ] }
[ctr[0] || ctr[1]];
// -------- NB. orphaned, included for pedagogic purposes --------
return (byte[]){ 0 + twos[ctr[0]] , // this IS totally time constant
8 + twos[ctr[1]] , // NB. 31 + time "constantator"
16 + twos[ctr[2]] , // case ctr[i]==0 for all i
24 + twos[ctr[3]] ,
31 + twos[ctr[0]] } [ !ctr[0]*( 1+
!ctr[1]*( 1+
!ctr[2]*( 1+
!ctr[3] ) ) ) ];
}
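A way to sanity-check a table like twos[] above: it is just a trailing-zero-count table for a byte (with twos[0] left as 0), so it can be regenerated and compared against a naive loop. A sketch (helper names are mine):

```c
#include <assert.h>

typedef unsigned char byte;

/* naive trailing-zero count of a nonzero byte */
static byte ctz8(byte x) {
    byte n = 0;
    while (!(x & 1)) { x >>= 1; n++; }
    return n;
}

/* regenerate the 256-entry twos[] table: twos[x] = ctz(x), twos[0] = 0 */
void build_twos(byte twos[256]) {
    twos[0] = 0;
    for (int x = 1; x < 256; x++) twos[x] = ctz8((byte)x);
}
```

Spot values match the rows above: twos[1]=0, twos[4]=2, twos[32]=5, twos[128]=7.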
/* Method 2 - no tables */
byte bin2toN(byte b){
return
(byte []) {(byte []) {(byte []) {7,6} [b < 128 ] ,
(byte []) {5,4} [b < 32 ] } [b < 64 ] ,
(byte []) {(byte []) {3,2} [b < 8 ] ,
(byte []) {1,0} [b < 2 ] } [b < 4 ] } [b < 16 ] ;
}
byte flip_bit(byte n[4]){
return
(byte []){
(byte []){ 16 + bin2toN( n[2] & -n[2] ) ,
(byte []){ 24 + bin2toN( n[3] & -n[3] ),
31 } [ !n[3] ] } [ !n[2] ] ,
(byte []){ 0 + bin2toN( n[0] & -n[0] ) ,
8 + bin2toN( n[1] & -n[1] ) } [ !n[0] ] }
[ n[0] || n[1] ] ;
// - - - - - - - - - - - - ORPHANED, fails on zero - - - - - - - - - - - -
return // included for pedagogic purposes
(byte []) {
(byte []) { bin2toN( n[2] & -n[2] ) + 16 ,
bin2toN( n[3] & -n[3] ) + 24 } [ !n[2] ] ,
(byte []) { bin2toN( n[0] & -n[0] ) + 0 ,
bin2toN( n[1] & -n[1] ) + 8 } [ !n[0] ] } [ n[0] || n[1] ] ;
}
/*
Bit Bunnies and Cargo Cult Coders have no need to read further.
The efficiency of execution of the above code depends on the fact that n[0], n[1], ...
are computed at compile time as fixed addresses which is quite conventional.
Also, using a call-by-need optimizing compiler will expedite the array contents
so only one indexed value needs to be calculated.
This compiler sophistication is likely missing but it is easy to manually assemble
the raw machine code to do it (basically a switch, computed goto, etc. ).
Analysis of the above algorithms, using the orphaned code, shows that every function call will execute exactly the same instruction sequence, optimized or not.
In both methods, the non-orphaned returns require handling the case when there
is counter roll over on 0, consequently using extra index and logical
(!
) operations. This extra happens for 1/2 of 1/2 of 1/2 or 1/8 of the total counts,
and for one count in this 1/8 there is nothing to do but return 31.
The first method requires 2 logical operations (! ||
), 1 addition and 3 index
calculations for 7/8 of the total counts. On a single count of zero, 3 logical and
3 index operations are needed and the rest of the other 1/8 need 3 logical, 1 addition
and 4 index operations.
The final code on execution of method 2 (compiled optimally), for 7/8 of the
calculations, uses 7 logical operations (|| & ! < -
the last is two's complement),
1 arithmetic (+
) and 5 calculated index operations. The other 1/8, but one
instance, need 8 logical, 1 addition and 6 calculated index operations.
Unfortunately, no flash of divine inspiration manifested any cargo code. This is an abridged tale of how this composition's authorship happened.
How this was done involved a crucial preliminary investigation as documented:
https://stackoverflow.com/a/42846062.
Then code was derived using a successive refinement process commencing with an assessment of the algorithms in this post.
Specifically, this answer: https://stackoverflow.com/a/4657711 .
This algorithm's time-variant execution from the loop overhead will be poignantly and prominently accentuated by reducing the return calculation to a single addition and two index operations.
*/
byte bit_change(byte ctr[4]) {
static byte ones[256]; // this sparse RA is precomputed once
for (byte i = 255; --i; ones[i]=0) ;
ones[ 0] = ones[ 1] = 0; ones[ 3] = 1; ones[ 7] = 2;
ones[15] = 3; ones[31] = 4; ones[63] = 5; ones[127] = 6; ones[255] = 7;
// { ie. this very sparse array is completely adequate for original code
// 0,0, ,1, , , ,2, , , , , , , ,3, , , , , , , , , , , , , , , ,4,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,5,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,6,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
// , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,7, }
// 1/2 of count uses 2 index 2 conditionals 0 increment 1 logic 2 +/- 1 x
// 1/4 3 4 1 1 2 1
// 1/8 4 6 2 1 2 1
// 1/16 5 8 3 1 2 1
// average 14 = 3.5 5 1.5 1 2 1
    unsigned char i;                          // original code
    for (i = 0; i < 4; i++) {                 // "works" but
        unsigned char x = ctr[i];             // still fails on
        if (x) {                              // count 0 rollover
            x ^= x - 1;
            return 8 * i + ones[x];
        }
    }
    // ............................. refinement .............................
    byte x;
    for (byte i = 0; i < 4; i++)
        if ( (x = ctr[i]) )
            return (i << 3) + ones[x ^ (x - 1)];
}
// --------------------------------------------------------------------------
// for (byte i = 255; --i; twos[i] == ones[i ^ i-1] ) ;
// ones[ ] uses only 9 of 1/4K inefficient, twos[ ] uses all 1/4K
static const byte twos[ ] = {
0,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 6,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 7,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 6,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,5,
0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, 8,
};
// fix count 0 rollover failure, make time absolutely constant as per OP
byte f0(byte ctr[4]) {
    byte ansr = 31;
    byte i;
    for (i = 0; i < 4; i++)
        if (ctr[i]) {
            ansr = (byte[]){0,8,16,24}[i] + twos[ctr[i]];  // i<<3 faster?
            break;
        }
    for (; i < 4; i++) if (ctr[i]) ;  // time "constantator"
    return ansr;
}
//..................................................
// loop ops (average): 1.5 ++ 2.5 [] 5 if
// result calculation: 1 + 2 [] significant loop overhead
byte f1(byte counter[4]) {
for (byte i = 0; i < 4; i++)
if (counter[i])
return (byte[]){ 0 + twos[counter[0]],
8 + twos[counter[1]],
16 + twos[counter[2]],
24 + twos[counter[3]] } [i];
return 31;
}
//..................................................
// 5 +/++ 6 [] 10 if
byte f2(byte counter[4]){
byte i, ansr=31;
for (i = 0; i < 4; i++) { // definite loop overhead
if (counter[i]) {
ansr= (byte[]){ 0 + twos[counter[0]],
8 + twos[counter[1]],
16 + twos[counter[2]],
24 + twos[counter[3]] } [i];
break;
} }
for (; i < 4; i++) if (counter[i]); // time "constantator"
return ansr;
}
//..................................................
// 4 + 4 ! 3 x 1 [] 1 computed goto/switch
byte f3(byte counter[4]){ // default: time "constantator"
switch (!counter[0]*( 1 + // counter[0]==0 !!
!counter[1]*( 1 +
!counter[2]*( 1 +
!counter[3] ) ) ) ){
case 0: return 0 + twos[ counter[0] ] ;
case 1: return 8 + twos[ counter[1] ] ;
case 2: return 16 + twos[ counter[2] ] ;
case 3: return 24 + twos[ counter[3] ] ;
default: return 31 + twos[ counter[0] ] ;
} }
/*
There is a comparable chronology for method 2.
This sequence has been radically attenuated and abbreviated to an intermediate example:
Inadvertently, the code posted in https://stackoverflow.com/a/42865348 was not the exclusively byte sized one as intended. The intended code is in this post. */
byte log2toN(byte b){ return ( b & 0xF0 ? 4:0 ) + // 4444....
( b & 0xCC ? 2:0 ) + // 22..22..
( b & 0xAA ? 1:0 ) ; // 1.1.1.1.
}
// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
byte BRGC(byte n){ return n ^ n>>1; }
byte bitNbyte(byte n){ return log2toN( BRGC(n) ^ BRGC(n+1) ); }
byte bit2flip(byte n[4]){
boolean n3=n[3]<255, n2=n[2]<255, n1=n[1]<255, n0=n[0]<255;
return n0*( 0 + bitNbyte( n[0] ) ) + !n0*(
n1*( 8 + bitNbyte( n[1] ) ) + !n1*(
n2*(16 + bitNbyte( n[2] ) ) + !n2*(
n3*(24 + bitNbyte( n[3] ) ) + !n3*( 0 ) ) ) );
}
// . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
byte bit_flip(byte n[4]){
//switch( ( !!n[0] << 3 ) | ( !!n[1] << 2 ) | ( !!n[2] << 1 ) | !!n[3] )
switch( 15 ^ ( !n[0] << 3 ) ^ ( !n[1] << 2 ) ^ ( !n[2] << 1 ) ^ !n[3] ) {
case 15: case 14: case 13: case 12:
case 11: case 10: case 9: case 8: return 0 + log2toN( n[0] & -n[0] );
case 7: case 6: case 5: case 4: return 8 + log2toN( n[1] & -n[1] );
case 3: case 2: return 16 + log2toN( n[2] & -n[2] );
case 1: return 24 + log2toN( n[3] & -n[3] );
default: return 31 + log2toN( n[0] & -n[0] );
} }
/*
Rhetorically, the answer to How do I find ... can only be answered explicitly
in the personal sense (see this answer: https://stackoverflow.com/a/42846062) as
it is not possible to speak for other individuals' cognitive abilities.
The content of https://stackoverflow.com/a/42846062 is crucial for background information and reflects the very personal pedantic mechanism required for the mental gymnastics to solve this problem. Undoubtedly, the milieu and plethora of material is daunting but this is exactly the personal approach taken in garnering sufficient insight, repertoire, anecdotal precedents, etc. to extrapolate and interpolate an answer to answer specifically, what program meets the criteria, exactly. Further, it is this very "chase" that excites and motivates the (perhaps pathological) psyche to invest time and effort to satiate curiosity with an inquisitive quest. */
void setup() { }
void loop() { }
/*
Can not edit previous answer, as per commentary, so posting rewrite:
Too impatient?
For immediate gratification and minimal edification, cut to the chase and chase this link where
only the final result has been posted:
C code for generating next bit to flip in a gray code
REFs:
C code for generating next bit to flip in a gray code
How do I find next bit to change in a Gray code in constant time?
Deriving nth Gray code from the (n-1)th Gray Code
Gray code increment function
Efficient way to iterate over Gray code change positions
Generating gray codes.
https://en.wikipedia.org/wiki/The_C_Programming_Language
https://en.wikipedia.org/wiki/The_Elements_of_Programming_Style
WARNING:
For purposes of coding expediency and demonstrable functional execution,
some non-trivial programming techniques have been used. However, this is
hopefully mitigated for the exposition of the concept rudiments
by presenting the essence as trivially and minimally as possible with
highlighting by / / / /. Careful reading, study and experiment are encouraged to
avoid cargo cult coding, oversights, and perpetrating mistakes.
This answer is manifested in the Arduino IDE ESP8266 core coding environment.
The algorithm as posited by the OP is not exactly correct (as if this;).
The Johnson Code
-------------------------
Since the actual Gray coding is irrelevant, using a Johnson counter's Gray code is an exceptionally easy and poignant way to cognitively and computationally count both the bit to change and the next code.
Note that the Johnson counter Gray code density is linear and not logarithmic.
With 32 bits in 4 bytes, the sequence can count from 0 to 63 before resetting.
It is necessary to verify carefully the functional suitability of the code that follows, modifying it as appropriate.
HINT: Verification is a MUST, especially for the "binary reflected" Gray code (BRGC)!
/./ / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /./
byte bitCtr = -1;                 // for 4 x 8 bits use 32 instead of 5
byte JohnsonCode(){ static byte GCbits = 0;
    return GCbits ^= 1u << ( bitCtr = (bitCtr + 1) % 5 ); }
/./ / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /./
void testJohnson(){
Serial.println("\n\tJohnson counter\t Bit\n\t Gray code:\t Flipped:");
for( int intifiny=31; --intifiny; )
Serial.println( "\t " + cut( 5, "....." +
// s.replace(...) returns zip,
// so VVV use lambda function invoke via VVV
// _____________________ V ____________________ ______________ V _____________
[](String s){ s.replace("0","."); return s; } ( String( JohnsonCode(), BIN ) )
) + "\t " + bitCtr
);
}
/*
Outputs:
Johnson counter Bit
Gray code: Flipped:
....1 0
...11 1
..111 2
.1111 3
11111 4
1111. 0
111.. 1
11... 2
1.... 3
..... 4
....1 0
...11 1 etc.
Some background material on the Binary Reflected Gray Code (BRGC)
-----------------------------------------------------------------------------------------------
CONVERSIONS:
---------------------
REF: Code Golf: Gray Code
// These javascript scURIples may be run by copying and pasting into the browser URL bar.
// convert base 2 to BRGC: n^=n>>1
// get base 2 from BRGC: n^=n>>1 n^=n>>2 n^=n>>4 ...
javascript: n=16; s="<pre>";
function f(i,n){ return i?f(i>>1,n^n>>i):n}
while(n--) s += f(4,n^n>>1) .toString(2)+"\n";
javascript: n=16; s="<pre>"; while(n--) s += (n^n>>1) .toString(2)+"\n";
javascript: c=0; n=16; s="<pre>"; while(n--) s +=(c^(c=n^n>>1)).toString(2)+"\n";
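The same two conversions in C, for reference (binary to BRGC is the single XOR-shift; BRGC back to binary folds the shifts down; function names are mine):

```c
#include <assert.h>
#include <stdint.h>

/* base 2 -> binary reflected Gray code */
uint32_t to_gray(uint32_t n) { return n ^ (n >> 1); }

/* BRGC -> base 2: n ^= n>>1; n ^= n>>2; n ^= n>>4; ... */
uint32_t from_gray(uint32_t g) {
    g ^= g >> 1;
    g ^= g >> 2;
    g ^= g >> 4;
    g ^= g >> 8;
    g ^= g >> 16;
    return g;
}
```

Round-tripping any n through both functions returns n, and consecutive outputs of to_gray differ in exactly one bit.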
COUNTING:
-----------------
The following (as per ref. above) arbitrarily gets both the preceding and following BRGC's for a code.
NB! The order of n1 and n2 is parity-determined and non-corresponding otherwise. The ordering might be n1, gray, n2 OR it could be n2, gray, n1, so eo (parity) discriminates.
unsigned n1 = gray ^ 1;
unsigned n2 = gray ^ ((gray & -gray) << 1);
gray = eo=!eo ? n1 : n2; // eo (or parity) gets Every Other
ie.
bitToFlip = eo=!eo ? 1 : (gray & -gray) << 1; gray ^= bitToFlip;
hence
gray ^= eo=!eo ? 1 : (gray & -gray) << 1;
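Wrapping that parity rule in a function and checking it against n ^ (n>>1) (a sketch; `gray_next` is my name, and the parity flag is passed in explicitly rather than kept static):

```c
#include <assert.h>
#include <stdint.h>

/* advance a BRGC value using only the parity flag:
   odd steps flip bit 0, even steps flip the bit left of the lowest set bit */
uint32_t gray_next(uint32_t gray, int *eo) {
    *eo = !*eo;
    return gray ^ (*eo ? 1u : (gray & (0u - gray)) << 1);
}
```

Starting from gray = 0 with eo = 0, successive calls produce 1, 3, 2, 6, 7, 5, 4, 12, ... which is exactly n ^ (n>>1) for n = 1, 2, 3, ...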
/./ / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /./
byte tooGray( byte (*f)(byte) ){
    static byte BRGC=0, base2=0;
    static boolean eo=false;
    byte low  = (*f)( BRGC ^= (eo=!eo) ? (BRGC & -BRGC) <<1 : 1 ) & 0x3 ;
    // count   ^---------------------------------------^  in raw BRGC
    byte high = (*f)( base2 ^ base2 >> 1 ) & 0xC ;  base2++ ;
    // count in base 2  ^-------------^  and convert to BRGC
    return low | high ; }
/./ / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /.
REF: The neighbors in Gray code
http://www.graphics.stanford.edu/~seander/bithacks.html
http://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html
https://en.wikipedia.org/wiki/Ring_counter#Johnson_counter
Oh yeah, ... count set bits in A ^ A+1
which will have a bit pattern like 000...0111..1 Prove.
How to get the bit position for a power of 2 - the n parameter must have a single bit set.
Method 1
*/
byte naive1(byte n){ return bitNumber(n-1); }
byte bitNumber(byte m){ // can use A ^ A+1 ... BUT >> 1 first OR -1 after
return ( m & 1 ?1:0 ) + ( m & 2 ?1:0 ) + ( m & 4 ?1:0 ) + ( m & 8 ?1:0 ) +
( m & 16 ?1:0 ) + ( m & 32 ?1:0 ) + ( m & 64 ?1:0 ) + ( m & 128 ?1:0 );}
// 256 512 1024 2048 ...
/*
Method 2
*/
byte naively2(byte n){
return ( n & 1 ?0:0 ) + ( n & 2 ?1:0 ) + ( n & 4 ?2:0 ) + ( n & 8 ?3:0 ) +
( n & 16 ?4:0 ) + ( n & 32 ?5:0 ) + ( n & 64 ?6:0 ) + ( n & 128 ?7:0 );}
/*
Method 3
*/
byte powerOf2(byte n){ return ( n & 0xF0 ? 4:0 ) + // 4444....
( n & 0xCC ? 2:0 ) + // 22..22..
( n & 0xAA ? 1:0 ) ; } // 1.1.1.1.
/*
Method 4
*/
// and the penultimate,
// http://www.graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
byte fastNof2up(byte b){ return (byte []){0,1,6,2,7,5,4,3}[(byte)(b*0x1Du)>>5];}
// 7,0,5,1,6,4,3,2 0x3Au
// b==2^N 0,1,2,4,7,3,6,5 0x17u
// 7,0,1,3,6,2,5,4 0x2Eu
// |------| |------|
// .....00011101 ........1101....
// ......0011101. .........101.....
// .......011101.. ..........01......
// ........11101... ...........1.......
// |------| |------|
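The 0x1D variant can be checked exhaustively in a couple of lines (the table {0,1,6,2,7,5,4,3} is the one from Method 4 above; `pos8` and `n_of_2up` are my names):

```c
#include <assert.h>

typedef unsigned char byte;

static const byte pos8[8] = {0,1,6,2,7,5,4,3};

/* bit number of a one-bit byte b via the 8-bit de Bruijn-style multiplier */
byte n_of_2up(byte b) { return pos8[(byte)(b * 0x1Du) >> 5]; }
```

All eight powers of two map to distinct top-3-bit values after the multiply, which is what makes 0x1D (and the other constants listed) usable.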
/*
Method 5
Details are at the end.
Can a judicious choice of constants reduce this to only 2 operations?
*/
byte log2toN(byte b){
    return 7 - (byte)( 0x10310200A0018380uLL >> ( (byte)(0x1D*b) >> 2 ) );
}
/*
Testing Environment
*/
void setup() {Serial.begin(115200); testJohnson(); test(); fastLog2upN(0); }
void loop() { delay(250); // return ;
[](byte GrayX2){ Serial.print( GrayX2 ^ 0x0F ? "" : baq(GrayX2)+"\n");
analogWrite( D5, (int []) { 0, 1200, 0, 300 } [ GrayX2 & 0x3 ] );
analogWrite( D6, (int []) { 0, 0, 1200, 300 } [ GrayX2 & 0x3 ] );
analogWrite( D7, (int []) { 0, 1200, 0, 300 } [ GrayX2 >> 2 & 0x3 ] );
analogWrite( D8, (int []) { 0, 0, 1200, 300 } [ GrayX2 >> 2 & 0x3 ] ); }
// ( tooGray( b ) );
( tooGray( [](byte n) { return n; } ) );
}
/*=====================================================================
Caveat:
-----------
The OP's algorithm is not effective.
Keep binary counter A
Find B as A XOR (A+1)
Bit to change is LowestBitSet in B
as seen when coded as:
*/
void test(){
static byte C=0, bits=0;
Serial.println((String)"\n "+(3^2+1)+" "+(3^(2+1))+" "+((3^2)+1) );
Serial.println(
"\n manifested by an actual "
"\n obstinate perverse bit to "
"\n psyche flip check"
"\n A A+1 A ^ A+1 B^=A^A+1 (A^A+1)+1>>1 ");
for(int intifiny=32, A=0; intifiny--; A=++A%15) // sort a go infinite ... an epiphany!
Serial.println( (String) pad( b( bits ^= b( b(A) ^ b(A+1) ) ) ) +" "+
baq( (A^A+1)+1>>1 ) +" "+ baq( C^=(A^A+1)+1>>1 ) +" "
// "| "+ pad(A)+" "+ pad(bits)
);
Serial.println(
" | "
"\n BITS: "
"\n Bit Isn't This Silly "
"\n Bit Is Totally Set (A ^ A+1) & -(A ^ A+1) == 1 ALWAYS "
"\n\n non-binary Gray codes? "
"\n {-1,0,1} balanced ternary, factoroid (factoradic), {0,-1} negated binary \n");
} // https://en.wikipedia.org/wiki/Steinhaus%E2%80%93Johnson%E2%80%93Trotter_algorithm
// some service routines ...
String cut(byte sz, String str) { return str.substring(str.length()-sz) ; }
String pad(byte n ) { return cut( 4, " " + String(n,DEC) ); }
String baq(byte n ) { return cut( 9, "........." + String(n,BIN) ); }
void q ( ) { /* PDQ QED PQ's */ }
void p ( String str) { Serial.print( " " + str + " " ) ; }
byte d (byte n ) { p(pad(n)); return n ; }
byte b (byte n ) { p(baq(n)); return n ; }
/*
Sample output:
flip bit
"correctly" confirm?
A A+1 A ^ A+1 B^=A^A+1 (A^A+1)+1>>1
........0 ........1 ........1 ........1 1 ........1 ........1 | 0 1
........1 .......10 .......11 .......10 2 .......10 .......11 | 1 2
.......10 .......11 ........1 .......11 3 ........1 .......10 | 2 3
.......11 ......100 ......111 ......100 4 ......100 ......110 | 3 4
......100 ......101 ........1 ......101 5 ........1 ......111 | 4 5
......101 ......110 .......11 ......110 6 .......10 ......101 | 5 6
......110 ......111 ........1 ......111 7 ........1 ......100 | 6 7
......111 .....1000 .....1111 .....1000 8 .....1000 .....1100 | 7 8
.....1000 .....1001 ........1 .....1001 9 ........1 .....1101 | 8 9
.....1001 .....1010 .......11 .....1010 10 .......10 .....1111 | 9 10
.....1010 .....1011 ........1 .....1011 11 ........1 .....1110 | 10 11
etc. |
BITS:
Bit Isn't This Silly Houston; We have a (an-other) problem
Bit Is Totally Set
(A ^ A+1) & -(A ^ A+1) == 1 ALWAYS
Curious?
-----------
The following code is a very, very crude method that can expedite a hunt for suitable bit-packed counting candidates to compute log 2^n (in base 2, i.e. n).
*/
byte fastLog2upN(byte b){ // b==2^N
for(long long int i=8, b=1; --i; )
Serial.println((int)(0x0706050403020100uLL / (b*=0x100)),HEX) ;
for( int i=9, b=1; --i;b*=2) // 3A = 1D*2
Serial.println(
(String) ( (b>>4 | b>>2 | b>>1) & 7 ) + " \t" +
( (0xB8*b) >>8 & 7 ) + " \t" +
( (0xB8*b) >>7 & 7 ) + " \t" +
( (0x1D*b) >>4 & 7 ) + " \t" +
( (0x0D*b) >>3 & 7 ) + " |\t" +
( ((byte)(0x9E*b) >>2 ) ) + " \t" +
(byte) ( 0x07070301C0038007uLL >> ((byte)(0x9E*b) >>2 ) ) + " \t" +
( ((byte)(0x1E*b) >>2 ) ) + " \t" +
(byte) ( 0x7070001C0038307uLL >> ((byte)(0x1E*b) >>2 ) ) + " \t" +
( ((byte)(0x5E*b) >>2 ) ) + " \t" +
(byte) ( 0x703800183838307uLL >> ((byte)(0x5E*b) >>2 ) ) + " \t| " +
( ((byte)(0x3A*b))>>5 ) + " \t" +
( ((byte)(0x3A*b))>>4 ) + " \t" +
( ((byte)(0x3A*b))>>3 ) + " \t" +
( ((byte)(0x3A*b))>>2 ) + " \t" +
( ((byte)(0x3A*b))>>1 ) + " \t" +
( ((byte)(0x3A*b))>>0 ) + " \t| " +
(byte) ( 0x0203040601050007uLL >> 8*((byte)(0x3A*b) >>5 ) ) + " \t" +
String((byte) ( 0x0706050403020100uLL >> ((byte)(0x3A*b) >>4 ) ),HEX ) + "\t" +
String((byte) ( 0x0020010307uLL >> ((byte)(0x3A*b) >>3 ) ),HEX ) + "\t" +
String((byte) ( 0x10300200A0018007uLL >> ((byte)(0x3A*b) >>2 ) ),HEX ) + "\t|" +
( ((byte)(0x1D*b))>>2 ) + " \t" +
(byte) ( 0x10710700E0018380uLL >> ((byte)(0x1D*b) >>2 ) ) + " \t" +
(byte) ( 0x10310200A0018380uLL >> ((byte)(0x1D*b) >>2 ) ) + " \t| " +
"") ;
}
/*
javascript: x=6; y=4; n=511; ra=[]; s="<pre>x";
while(n--)for(i=5;i;i=i==3?2:--i){
j=0;
for(b=1;b<129;b*=2) ra[j++]=((n*b)&0xFF)>>i;
ra.sort(function(a, b){return a-b});
if ( tf=ra[--j] < 64 && ra[1]>ra[0]+x )
while( --j && ( tf = (ra[j]>ra[j-1]+x) || ( ra[j]<ra[j-1]+y && ra[j+1]>ra[j]+x) ) );
if(tf) s+="\n "+n.toString(16)+" >> "+i+" \t"+ra.join(" ");
};
s;
// many >>2 's to do: sledge hammer creation of acceptable bit pattern solutions
// only want btf's - Bits That Fit (in 8 bytes): (tf=ra[j-1] < 64)
// program proximity control so no BS (Brain Strain!): tf = (ra[j]<ra[j-1]+x) || (ra[j]<ra[j-2]+y)
// for debug: s+="\n "+n.toString(16)+" >> "+i;
// for debug: s+="\n" +tf+"\t"+ra[j]+"\t"+ra[j-1]+"\t"+ra[j-2];
*/