Buffer size in C
When provided with a buffer size in C, how do I know how much is left and when do I need to stop using the memory?
For example, if the function I am writing is this:
void ascii_morse (lookuptable *table, char* morse, char* ascii, int morse_size) {
}
In this application I will be passed a string (ascii) and I will convert it to morse, using some other function to convert each ascii char to morse. The problem I'm facing is how to make sure I am not exceeding the buffer size. I don't even know when to use the buffer size or how to decrease it every time I use it.
Of course the output will go to morse (so I will be appending strings to morse, and I think I know how to do that; it is just the buffer size that is hard for me to understand).
If you need any more information to understand the problem please tell me, I tried my best to explain it.
It sounds like there's some confusion about the "buffer". There is no buffer. morse_size is telling you how much memory has been allocated to morse (technically, the chunk of memory that morse points to). If morse_size is 20 then you have 20 bytes. This is 19 bytes of usable space, because strings are terminated by a null byte. You can think of morse_size as "maximum length of the string plus one".
You need to check morse_size to make sure you're not writing more bytes into morse than it can hold. morse is nothing more than a number pointing to a single spot in memory. Not a range, but a single spot. What's been allocated to morse comes after that. If you put more than that into morse you risk overwriting someone else's memory. C will NOT check this for you; this is the price of maximum performance.
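As a minimal sketch of the check described above: the helper below (copy_if_fits is a hypothetical name, not part of the question's code) copies a string into a destination only when the string plus its terminating null byte fits in the stated size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits.
 * dst_size is the total capacity of dst in bytes, including the byte
 * reserved for the terminating '\0'.
 * Returns 1 on success, 0 if src is too long (dst is left untouched). */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus the terminator */
    if (needed > dst_size) {
        return 0;                      /* would write past the end of dst */
    }
    memcpy(dst, src, needed);          /* copies the '\0' as well */
    return 1;
}
```

With a 20-byte destination, a 19-character string is accepted and a 20-character string is rejected, which matches the "maximum length of the string plus one" rule.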
It's like if you went to a theater and the usher tells you, "you can have seat A3 and the next 5" and then leaves. You have to be polite and not take a seventh seat; somebody else was given A9.
Tools such as valgrind are invaluable to spot memory mistakes in C and keep your sanity.
Aren't strings in C a hoot? Welcome to the single largest root cause of bugs in the entire computing world.
void ascii-morse (lookuptable *table, char* morse, char* ascii, int morse-size)
You have the size of the output buffer already passed in, by the looks of that prototype above.
ascii will no doubt be a null-terminated string and morse will be the output buffer; morse_size (not morse-size as you have it, since that's not a valid identifier) will be how many characters you are allowed to write.
The pseudocode will be something like:
    set apointer to start of ascii, mpointer to start of morse.
    while apointer not at end of ascii:
        get translation from lookuptable, using the character at apointer.
        if length of translation is greater than morse_size:
            return an error.
        store translation to mpointer.
        add 1 to apointer.
        add length of translation to mpointer.
        subtract length of translation from morse_size.
    if morse_size is zero:
        return an error.
    store string terminator to mpointer.
You'll have to convert that to C and implement the lookup function but that should be a good start.
The pointers are used to extract from, and insert into, the relevant strings. For every character, you basically check whether there is enough room left in the output buffer for adding the morse code segment. And, at the end, you also need to check there's enough room for the string terminator character '\0'.
The way in which you check if there is enough room is by reducing the morse_size variable by the length of the string you're adding to morse each time through the loop. That way, morse_size will always be the size remaining in the buffer for your use.
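One way the pseudocode might translate to C is sketched below. The lookup_morse function is a hypothetical stand-in for the real lookup table (it only knows 's' and 'o'), and the table parameter from the original prototype is left out to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real lookup table: only 's' and 'o'
 * are known here; every other character maps to "?". */
static const char *lookup_morse(char c)
{
    switch (c) {
    case 's': return "...";
    case 'o': return "---";
    default:  return "?";
    }
}

/* Returns 0 on success, -1 if the output buffer is too small.
 * On error the contents of morse are unspecified (possibly a partial,
 * unterminated translation). */
int ascii_morse(char *morse, const char *ascii, int morse_size)
{
    const char *apointer = ascii;
    char *mpointer = morse;

    while (*apointer != '\0') {
        const char *translation = lookup_morse(*apointer);
        int len = (int)strlen(translation);
        if (len > morse_size) {
            return -1;                 /* translation does not fit */
        }
        memcpy(mpointer, translation, (size_t)len);
        apointer += 1;
        mpointer += len;
        morse_size -= len;             /* morse_size is now the space left */
    }
    if (morse_size <= 0) {
        return -1;                     /* no room for the terminator */
    }
    *mpointer = '\0';
    return 0;
}
```

Translating "sos" needs nine characters plus the terminator, so a 10-byte buffer succeeds and a 9-byte buffer fails at the final terminator check.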
You need to pass the buffer size along with the pointer.
    int
    ascii_to_morse(lookuptable *table,
                   char* morse, int morse_size,
                   char* ascii);
The buffer size is not necessarily the same as the current length of the string (which you can find using strlen).
The function as given above reads the ascii string (it doesn't need to know that buffer's size, so it is not passed) and writes into a buffer pointed to by morse, of size morse_size. It returns the number of bytes written (not counting the null).
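To illustrate the difference between buffer size and current string length mentioned above, here is a small sketch. The space_left helper is a hypothetical name; it assumes buf already holds a null-terminated string that fits within buf_size bytes.

```c
#include <stddef.h>
#include <string.h>

/* Number of characters that can still be appended to the
 * null-terminated string in buf, given that buf has room for
 * buf_size bytes in total. One byte stays reserved for the '\0'. */
size_t space_left(const char *buf, size_t buf_size)
{
    size_t used = strlen(buf) + 1;     /* current text plus terminator */
    return used >= buf_size ? 0 : buf_size - used;
}
```

For a buffer declared as char buf[10] = ".-", sizeof buf is 10, strlen(buf) is 2, and seven more characters can be appended.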
Edit: Here's an implementation of this function which, while it fails to use the right values for morse code, shows how to manage the buffer:
    #include <stdio.h>
    #include <string.h>

    typedef void lookuptable; // we ignore this parameter below anyway
                              // but using void lets us compile the code
    int
    ascii_to_morse(lookuptable *table,
                   char* morse, int morse_size,
                   char* ascii)
    {
        (void)table; // unused in this demonstration
        if (!ascii || !morse || morse_size < 1) { // check preconditions
            return 0; // and handle it as appropriate
            // you may wish to do something else if morse is null
            // such as calculate the needed size
        }
        int remaining_size = morse_size;
        while (*ascii) { // false when *ascii == '\0'
            const char* mc_for_letter = ".-"; // placeholder: NOT the real morse code for each letter
            ++ascii;
            int len = (int)strlen(mc_for_letter);
            if (remaining_size <= len) { // not enough room
                // 'or equal' because we must write a '\0' still
                break;
            }
            strcpy(morse, mc_for_letter);
            morse += len; // keep morse always pointing at the next location to write
            remaining_size -= len;
        }
        *morse = '\0';
        return morse_size - remaining_size; // bytes written, not counting the '\0'
    }

    // test the above function:
    int main(void) {
        char buf[10];
        printf("%d \"%s\"\n", ascii_to_morse(0, buf, sizeof buf, "aaa"), buf);
        printf("%d \"%s\"\n", ascii_to_morse(0, buf, sizeof buf, "a"), buf);
        printf("%d \"%s\"\n", ascii_to_morse(0, buf, sizeof buf, "aaaaa"), buf);
        return 0;
    }
The buffer size cannot be inferred from the pointer alone. It needs to either be passed as an argument, or be known some other way (for example from #define values or other constants), or be implicitly known. (This last, implicit approach is "dangerous": if the size is ever changed, the change may not be reflected in every place where the buffer is used.)
Alternatively, and more typically in the case of input buffers (buffers which the function will read from), the end of the buffer may be marked by a special character or a sequence of such characters.
One possible (slow) solution is to allow the function to accept a NULL buffer pointer and return the required buffer size. Then call it a second time with a buffer of the proper size.
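A rough sketch of that two-call pattern follows. The function name ascii_to_morse_sized and the lookup_morse stub are assumptions made for illustration; the convention that the function always returns the required size (including the terminator) is one possible design, not the only one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real lookup table. */
static const char *lookup_morse(char c)
{
    switch (c) {
    case 's': return "...";
    case 'o': return "---";
    default:  return "?";
    }
}

/* Always returns the number of bytes the full translation needs,
 * including the terminating '\0'. Output is written only when morse
 * is non-NULL and morse_size is at least that large. */
size_t ascii_to_morse_sized(char *morse, size_t morse_size, const char *ascii)
{
    size_t needed = 1;                        /* the terminating '\0' */
    for (const char *p = ascii; *p != '\0'; ++p) {
        needed += strlen(lookup_morse(*p));
    }
    if (morse == NULL || morse_size < needed) {
        return needed;                        /* sizing call, or buffer too small */
    }
    char *out = morse;
    for (const char *p = ascii; *p != '\0'; ++p) {
        const char *translation = lookup_morse(*p);
        size_t len = strlen(translation);
        memcpy(out, translation, len);
        out += len;
    }
    *out = '\0';
    return needed;
}
```

A caller would first call it with a NULL buffer to learn the size, allocate that many bytes, and then call it again. The input is scanned twice, which is where the "slow" comes from.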
Another solution is, instead of passing in a pre-allocated destination string to be written to, to have your function do the allocation and return a pointer to the result. This is a whole lot safer, as the caller doesn't have to guess how much memory your function will need.
char *ascii2morse(const char *ascii, lookuptable *table)
You still have to allocate enough memory for the Morse code. Since Morse code isn't fixed length, there are two strategies. The first is to simply figure out the maximum possible memory needed for the given length string (longest Morse sequence * number of characters in ascii, plus one for the terminator) and allocate that. This might seem like a waste, but it's what the caller would have to do for your original plan anyway.
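The first strategy might look roughly like this. MAX_MORSE_LEN and the lookup_morse stub are assumptions for the sake of the example (the table parameter is omitted), and the sketch does not guard against overflow in the size calculation for extremely long inputs.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: no entry in the (hypothetical) table is longer than this. */
#define MAX_MORSE_LEN 5

/* Hypothetical stand-in for the real lookup table. */
static const char *lookup_morse(char c)
{
    switch (c) {
    case 's': return "...";
    case 'o': return "---";
    default:  return "?";
    }
}

/* Returns a newly allocated translation, or NULL if allocation fails.
 * The caller owns the result and must free() it. */
char *ascii2morse(const char *ascii)
{
    /* Worst case: every character expands to MAX_MORSE_LEN, plus '\0'. */
    size_t capacity = strlen(ascii) * MAX_MORSE_LEN + 1;
    char *morse = malloc(capacity);
    if (morse == NULL) {
        return NULL;
    }
    char *out = morse;
    for (; *ascii != '\0'; ++ascii) {
        const char *translation = lookup_morse(*ascii);
        size_t len = strlen(translation);
        memcpy(out, translation, len);
        out += len;
    }
    *out = '\0';
    return morse;
}
```

The only failure left for the caller to handle is running out of memory.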
The alternative is to use realloc to continually grow the string as you need it. You figure out how many bytes you need to encode the next character, reallocate that much and append it to the string. This might be slower (though memory allocators are pretty sophisticated these days), but it will use exactly as much memory as you need.
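A minimal sketch of the realloc approach, again with a hypothetical lookup_morse stub and function name. It grows the string by exactly the length of each translation; a real implementation might grow in larger steps to reduce the number of reallocations.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real lookup table. */
static const char *lookup_morse(char c)
{
    switch (c) {
    case 's': return "...";
    case 'o': return "---";
    default:  return "?";
    }
}

/* Returns a newly allocated translation, or NULL if allocation fails.
 * The caller owns the result and must free() it. */
char *ascii2morse_grow(const char *ascii)
{
    size_t used = 0;                   /* characters stored, excluding '\0' */
    char *morse = malloc(1);
    if (morse == NULL) {
        return NULL;
    }
    morse[0] = '\0';

    for (; *ascii != '\0'; ++ascii) {
        const char *translation = lookup_morse(*ascii);
        size_t len = strlen(translation);
        /* Keep the old pointer until realloc succeeds, so it can be freed. */
        char *grown = realloc(morse, used + len + 1);
        if (grown == NULL) {
            free(morse);
            return NULL;
        }
        morse = grown;
        memcpy(morse + used, translation, len + 1);  /* includes the '\0' */
        used += len;
    }
    return morse;
}
```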
BOTH avoid the trap where the user has to preallocate an unknown amount of memory and BOTH eliminate the unnecessary "user didn't allocate enough memory" error condition.
If you really wanted to save memory, I'd store each dot/dash in the Morse code as 2 bits rather than 8 bits. You have three "words": short, long, and letter break. That needs a minimum of 2 bits of space per symbol.
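A sketch of how such 2-bit packing could work is below. The symbol values, the use of a space for the letter break, and the function names are all assumptions made for illustration; the fourth 2-bit value is used here as an "end/unused" marker.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: 2 bits per symbol, four symbols per byte. */
enum { SYM_END = 0, SYM_DOT = 1, SYM_DASH = 2, SYM_BREAK = 3 };

/* Packs text made of '.', '-' and ' ' (letter break) into out.
 * Returns the number of bytes used, or -1 if out is too small or the
 * text contains any other character. Unused slots in the last byte
 * are left as SYM_END. */
int pack_morse(const char *text, uint8_t *out, size_t out_size)
{
    size_t count = 0;                  /* symbols packed so far */
    for (; *text != '\0'; ++text) {
        unsigned sym;
        switch (*text) {
        case '.': sym = SYM_DOT;   break;
        case '-': sym = SYM_DASH;  break;
        case ' ': sym = SYM_BREAK; break;
        default:  return -1;
        }
        size_t byte = count / 4;
        if (byte >= out_size) {
            return -1;                 /* output buffer is full */
        }
        if (count % 4 == 0) {
            out[byte] = 0;             /* start a fresh byte */
        }
        out[byte] |= (uint8_t)(sym << (2 * (count % 4)));
        ++count;
    }
    return (int)((count + 3) / 4);
}

/* Reads back the symbol stored at position index. */
unsigned symbol_at(const uint8_t *packed, size_t index)
{
    return (packed[index / 4] >> (2 * (index % 4))) & 3u;
}
```

The same size discipline applies here: the packer checks the output capacity before touching each new byte, just as the text versions check before each write.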