delete previous character in a byte array
Given a byte array, in which characters can be 1 byte or 2 bytes long For 1-byte characters, the Most significant bit must be 0. For a 2- byte character, the mos开发者_StackOverflowt significant bit of the most significant byte must be one and the most significant bit of the least significant byte is don’t care (X). You are given the index, I of a character in the byte array. Note that I-1 or I+1 can lead you to either a character or the middle of a character. Give a logic (no need of code) to delete the previous character of the one to which I points.
Well then, let's try. I'm assuming the array starts at index = 0 and that we know its size. If not, change the while loop in the code below to count down and jiggle the logic around a bit.
The real question is not how to remove the previous element. It's figuring out if the given index i is the start of a char (be it 1- or 2-byte) or the middle of a 2-byte one. Once we know that, removing the previous element is trivial, since it's trivial to figure out if the previous element is 1- or 2-byte.
So, I believe this should work to figure out what index i is. Pseudo-code:
if MSB_i == 0
{
if MSB_(i+1) == 1
-> Start of 2-byte char
else
-> Start of 1-byte char
} else
{
if MSB_(i+1) == 0
-> Middle of 2-byte char
else
{
j = i + 1
while (MSB_j == 1) AND (j != size)
j++
j = j - i
if j modulo 2 == 1
-> Start of 2-byte char
else
-> Middle of 2-byte char
}
Obviously, I have not implemented and check this, but it seems like all options are covered. Feel free to correct me or poke me if you want a written explanation as to why this should work.
Looks to me like you all are a bit confused?
First, what is "the middle of a 2 byte character"? Is it perhaps the last 1/3 of the first character and the first 1/3 of the second one? No, its nonsense. A 2 byte character has a first and a second byte, but no "middle".
Second, because it is not specified in what order the most significant and least significant bytes are stored, the exercise is a bit underspecified.
But let us assume, that the LSByte is stored at the lower address then the task is easy. Just check the most significant bit of byte[i-1], if it is set, then the previous character is a double byte character, else it's single byte. (We know that i adresses a character, not just a byte.)
For any element array[i]
, you can determine if i
points to a single byte character, the start of a 2-byte character, or the middle of a 2-byte character using the following test:
Starting with array[i-1]
count the number of contiguous MSB==1.
If array[i]
is preceded by an ODD number of 1's, then array[i]
is the middle of a 2-byte array.
If array[i]
is preceded by an EVEN number of 1's, then if MSB(array[i])
is 0, array[i]
is a single byte character, otherwise array[i]
is the start of a 2-byte character.
Since we're trying to delete the character just before array[i]
, once you determine if array[i]
is the start or the middle of a character, then you have to run the same test for array[i-x]
, where x
is either 1 or 2 depending on if array[i]
is pointing to the start or middle of a character, respectively.
Edit (What happens when arr[0] is 1-byte, and arr[1] is 2-byte?):
Firstly, more detail about the search for contiguous 1's: When counting contiguous 1's, the loop stops if we reach array[0], or MSB(array[j])==0.
odd=0
j = i
while( j && MSB(arr[j-1]) )
j-=1
odd^=1 <<(binary XOR)
When the loop completes, odd will be 1 if there are an odd number of contiguous 1's and odd will be 0 if there are 0 or an even number of contiguous 1's.
If we have an array with a 1-byte character in arr[0] and a 2-byte character in arr[1], then, supposedly, i can only have the values 0, 1, or 2.
- i=0: The loop never runs because i==0. We consider there to be an EVEN number of preceding 1's because odd==0. The MSB of arr[i] is 0, so arr[i] is the start of a 1-byte character.
- i=1: The loop never runs because MSB(arr[i-1]) is 0. We consider there to be an EVEN number of contiguous 1's because odd==0. The MSB of arr[i] is 1, so arr[i] is the start of a 2-byte character.
- i=2: The loop runs one time. We find an ODD number of consecutive 1's. Because there is an odd number of preceding 1's, arr[i] is the middle of a 2-byte character.
精彩评论