Why does this EXC_BAD_ACCESS happen with long long and not with int?
I've run into an EXC_BAD_ACCESS with a piece of code that deals with data serialization. The code fails only on the device (iPhone), not on the simulator, and only for certain data types.
Here is test code that reproduces the problem:
template <typename T>
void test_alignment() {
// allocate memory and record the original address
unsigned char *origin;
unsigned char *tmp = (unsigned char*)malloc(sizeof(unsigned short) + sizeof(T));
origin = tmp;
// push data with size of 2 bytes
*((unsigned short*)tmp) = 1;
tmp += sizeof(unsigned short);
// attempt to push data of type T
*((T*)tmp) = (T)1;
// free the memory
free(origin);
}
static void test_alignments() {
test_alignment<bool>();
test_alignment<wchar_t>();
test_alignment<short>();
test_alignment<int>();
test_alignment<long>();
test_alignment<long long>(); // fails on iPhone device
test_alignment<float>();
test_alignment<double>(); // fails on iPhone device
test_alignment<long double>(); // fails on iPhone device
test_alignment<void*>();
}
Guessing that it must be a memory alignment issue, I decided to understand the problem thoroughly. From my (limited) understanding of memory alignment, when tmp is advanced by 2 bytes, it becomes misaligned for data types whose alignment is greater than 2 bytes:
tmp += sizeof(unsigned short);
But the test code runs fine for int and others! It fails only for long long, double and long double.
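The reasoning above can be checked directly. The following is a minimal sketch, assuming a C++11 compiler (alignof instead of the __alignof used in this question); the helper names is_aligned_for and aligned_after_short are made up for illustration. It only inspects the address and never stores a T:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: true if p is suitably aligned for an object of type T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Mirrors the test above: allocate, skip one unsigned short, then report
// whether the resulting address is aligned for T. No T is ever written.
template <typename T>
bool aligned_after_short() {
    unsigned char* origin = static_cast<unsigned char*>(
        std::malloc(sizeof(unsigned short) + sizeof(T)));
    assert(origin != nullptr);
    const bool ok = is_aligned_for<T>(origin + sizeof(unsigned short));
    std::free(origin);
    return ok;
}
```

On typical platforms, where malloc returns memory aligned for any fundamental type and unsigned short is 2 bytes, aligned_after_short<int>() returns false just as it does for long long. So int is misaligned too, yet it does not crash, which suggests that misalignment alone does not explain the failure.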
Examining the size and alignment of each data type revealed that the failing data types are the ones whose sizeof and __alignof values differ:
iPhone 4:
bool sizeof = 1 alignof = 1
wchar_t sizeof = 4 alignof = 4
short int sizeof = 2 alignof = 2
int sizeof = 4 alignof = 4
long int sizeof = 4 alignof = 4
long long int sizeof = 8 alignof = 4 // 8 <> 4
float sizeof = 4 alignof = 4
double sizeof = 8 alignof = 4 // 8 <> 4
long double sizeof = 8 alignof = 4 // 8 <> 4
void* sizeof = 4 alignof = 4
iPhone Simulator on Mac OS X 10.6:
bool sizeof = 1 alignof = 1
wchar_t sizeof = 4 alignof = 4
short int sizeof = 2 alignof = 2
int sizeof = 4 alignof = 4
long int sizeof = 4 alignof = 4
long long int sizeof = 8 alignof = 8
float sizeof = 4 alignof = 4
double sizeof = 8 alignof = 8
long double sizeof = 16 alignof = 16
void* sizeof = 4 alignof = 4
(These are the results of running the print function from "C++ data alignment and portability".)
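The print function from that article is not reproduced here. As a rough sketch of how such a table could be produced, assuming a C++11 compiler (alignof rather than __alignof) and with the names TypeLayout, layout_of and print_layout invented for illustration:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical record holding the two values shown in the tables above.
struct TypeLayout {
    std::size_t size;
    std::size_t alignment;
};

// Collects sizeof and alignof for T at compile time.
template <typename T>
constexpr TypeLayout layout_of() {
    return TypeLayout{sizeof(T), alignof(T)};
}

// Prints one table row, e.g. print_layout<long long>("long long int").
template <typename T>
void print_layout(const char* name) {
    const TypeLayout l = layout_of<T>();
    std::printf("%-14s sizeof = %zu alignof = %zu\n", name, l.size, l.alignment);
}
```

Calling print_layout<T>() for each type in test_alignments() on the device and on the simulator would give output comparable to the two listings; the actual numbers depend on the target ABI.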
Can someone enlighten me as to what's causing the error? Is the difference really the cause of the EXC_BAD_ACCESS? If so, by what mechanism?
That's actually very annoying, but not so unexpected for those of us brought up in a pre-x86 world :-)
The only reason that comes to mind (and this is pure speculation) is that the compiler is "fixing" your code to ensure that the data types are aligned correctly but the sizeof/alignof mismatches are causing problems. I seem to recall that the ARMv6 architecture relaxed some of the rules for some data types, but I never got a good look at it because the decision was made to go with a different CPU.
(Update: this is actually controlled by a register setting (hence probably by the software), so I guess even modern CPUs can still complain bitterly about misalignment.)
The first thing I would do would be to have a look at the generated assembly to see if the compiler is padding your short to align the next (actual) data type (that would be impressive) or (more likely) pre-padding the actual data type before writing it.
Secondly, find out what the actual alignment requirements are for the Cortex A8, which I think is the core used in the iPhone 4.
Two possible solutions:
1/ You may have to cast each type into a char array and transfer the characters one at a time - this should hopefully avoid the alignment issues but may have a performance impact. Use of memcpy would probably be best since it will no doubt be coded to take advantage of the underlying CPU already (such as transferring four-byte chunks where possible with one-byte chunks at the start and end).
2/ For those data types that don't want to be put immediately after a short, add enough padding after that short to ensure they align correctly. For example, something like:
tmp += sizeof(unsigned short);
tmp += (alignof(T) - (size_t)tmp % alignof(T)) % alignof(T); // pad up to the next multiple of alignof(T)
which should advance tmp to the next properly aligned location before attempting to store the value. The buffer has to be allocated with enough room for that padding.
You'll need to do the same when reading it back later (I'm assuming the short is indicative of the data being stored so you can tell what data type it is).
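As an illustration of the first option, here is a minimal sketch of memcpy-based writing and reading. The names put_unaligned and get_unaligned are invented for this example, and it assumes T is a trivially copyable type such as the built-in types in the question:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies value into dst byte-wise, so dst does not need
// to be aligned for T. Returns the position just past the written bytes.
template <typename T>
unsigned char* put_unaligned(unsigned char* dst, const T& value) {
    assert(dst != nullptr);
    std::memcpy(dst, &value, sizeof(T));
    return dst + sizeof(T);
}

// Hypothetical helper: reads a T back from a possibly unaligned position.
// Returns the position just past the bytes that were read.
template <typename T>
const unsigned char* get_unaligned(const unsigned char* src, T& out) {
    assert(src != nullptr);
    std::memcpy(&out, src, sizeof(T));
    return src + sizeof(T);
}
```

In the question's test, the two pointer-cast stores would become tmp = put_unaligned(tmp, (unsigned short)1); followed by put_unaligned(tmp, (T)1);. The cast *((T*)tmp) never appears, so the compiler has no reason to assume tmp is aligned for T.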
Putting the final solution from OP in the answer for completeness (so people don't have to check the comments):
First, the assembly (in Xcode, Run menu > Debugger Display > Source and Disassembly) shows that the STMIA instruction is used when handling 8 bytes of data (i.e., long long), instead of the STR instruction.
Next, section "A3.2.1 Unaligned data access" of the "ARM Architecture Reference Manual ARMv7-A" (the architecture corresponding to Cortex A8) states that STMIA does not support unaligned data access while STR does (depending on certain register settings).
So, the problem was the combination of the size of long long and the misalignment.
As for the solution, copying one char at a time works, as a start.
This is likely a memory alignment issue with ARM chips. ARM chips cannot handle unaligned data, and have unexpected behaviour when accessing data that isn't aligned to certain boundaries. I don't have the alignment rules for the iPhone's ARM chip off the top of my head, but the best way to solve this is to not poke data using pointer tricks.
Every ARM processor includes an instruction to load or store a single word at a particular address, as well as instructions that load or store multiple words at a time. Some processors can automatically convert a single unaligned load/store into a series of two or three operations, but such ability does not extend to the instructions that load/store multiple words at a time. I would expect that most operations on an int will only use the single-word load/store instruction [in some rare cases, a compiler might e.g. realize that two int variables which are stored consecutively could be loaded into registers using a single instruction, but I wouldn't particularly expect such optimizations]. Operations on a long long, however, would routinely load a pair of registers from consecutive memory locations, and would thus benefit from using a single instruction. I've not profiled the latest ARM chips, but on something like an ARM7TDMI, two consecutive LDR instructions will take three cycles each; an LDM which loads two registers would take four cycles. Even if the LDM needed to be preceded by an ADD to compute the address (LDR has more addressing modes than LDM), two instructions taking five cycles would still be better than two instructions taking six.