NSString pointer: theoretical questions after study
I have questions about the NSString pointer. I wanted to get to the bottom of this and tried to build a theory for full understanding, based on information gathered from the net. Please believe me when I say that I have not been lazy: I read a lot, but I was still left with uncertainties and questions. Can you please confirm or deny whether each point is right or wrong? I've marked additional questions and doubts with (?).
Here we go: If I consider this very basic example:
NSString *sPointer = [[NSString alloc]initWithString:@"This is a pointer"];
[sPointer release];
My starting point was: The compiler reserves RAM memory for a pointer type and this memory (which also has an own address) contains the memory address (hexadecimal - binary) of the memory where another variable is stored (where it points to). The actual pointer would take up about 2 bytes:
1) First some general issue - not necessarily linked to objective C. The actual questions about the NSString pointer will come in point 2. A string is a "string of characters", where 1 character takes a fixed amount of memory space, let's say 2 bytes. This automatically means that the memory size occupied by a string variable is defined by the length of the character string. Then I read this on Wikipedia: "In modern byte-addressable computers, each address identifies a single byte of storage; data too large to be stored in a single byte may reside in multiple bytes occupying a sequence of consecutive addresses. " So in this case, the string value is actually contained by multiple addresses and not by a single 1 (this already differs from what I read everywhere) (?). How are these multiple addresses contained in 1 pointer in reality? Will the pointer be divided into multiple addresses as well? Do you know what component in a computer actually identifies and assigns the actual address "codes"?
And now my actual questions;
2) In my example, the code does 2 things:
- It creates a pointer to the address where the string variable is stored.
- It also actually saves the actual "String variable": @"This is a pointer", because otherwise there would be nothing to point to;
OK, my question; I wondered what really happens when you release the pointer [sPointer release]; are you actually releasing the pointer (containing the address), or are you also releasing the actual "string variable" from memory as well? I have learned that when you delete the reference, the memory where the actual variable is stored will just be overwritten at the time the compiler needs memory so it does not need to be cleared at that time. Is this wrong? If it is correct, why do they say that it's really important to release the NSString pointer for performance reasons, if you just release the pointer which will basically contain only a few bytes? Or am I wrong, and is the memory where the actual variable is stored in fact also cleared at once with the "release" message?
And finally also: primitive datatypes are not released, but they "do" take memory space at the moment of declaration (but not more than a common pointer). Why shouldn't we release them in fact? What prevents us from doing something like: int i = 5, followed by [i release]?;
I'm sorry - a lot of questions in 1 time! In practice, I never had problems with it, but in theory, I also really want to fully understand it - and I hope that I'm not the only one. Can we discuss the topic? Thank you and sorry for the bother!
Maybe I'm wrong but I just read yesterday that pointers usually take 4 bytes. That answers none of your questions but you seem really interested in this so I figured I would mention it.
I think the source of your confusion is that you are confusing primitives with Objective-C classes. Objective-C classes (or objects to be exact, instances of classes) can accept messages (similar to method invocations in other languages). retain is one such message. This is why an Objective-C NSString object can receive the retain message but not a primitive like an integer. I think that's another one of your confusions. retain and release etc. are not Objective-C language constructs, they're actual messages (think methods) you send to objects. That is why they apply to Objective-C objects but not to primitives like integers and floats.
Another similar confusion is that what you read about how strings are stored has more to do with C-style strings, like char *name = "john". However, when you create a pointer to an NSString, it points to an NSString instance, which itself decides how to store the actual string bytes/characters. That may or may not be the same way that C strings are stored.
data too large to be stored in a single byte may reside in multiple bytes occupying a sequence of consecutive addresses. " So in this case, the string value is actually contained by multiple addresses and not by a single 1 (this already differs from what I read everywhere) (?). How are these multiple addresses contained in 1 pointer in reality?
In C for example, the pointer would point to the address of the first character in the string.
OK, my question; I wondered what really happens when you release the pointer [sPointer release]; are you actually releasing the pointer (containing the address), or are you also releasing the actual "string variable" from memory as well?
You are sending the release message to the NSString instance/object. This is important to note to avoid further confusion. You are not acting upon the pointer itself, but upon what the pointer is pointing to, which is the NSString object. So you are not releasing the pointer itself. After you send the object the release message, if its reference count has reached 0, it will handle deallocating itself by deallocating everything it stores, which I imagine includes the actual character string.
If it is correct, why do they say that it's really important to release the NSString pointer for performance reasons, if you just release the pointer which will basically contain only a few bytes?
So yeah, you're actually sending the release message to the string instance and it handles how to deallocate itself if it has to. If you were to simply erase the pointer so that it no longer points at the string instance, then you simply will no longer know where/how to access the data stored at that location, but it won't make it magically disappear, the program won't automatically know that it can use that memory. What you're hinting at is garbage collection, in which, simply put, unused memory will automatically be freed for subsequent use. Objective-C 2.0 does have garbage collection but as far as I know it's not enabled on iOS devices yet. Instead, the new version of iOS will support a feature known as Automatic Reference Counting in which the compiler itself takes care of reference counting.
Sorry if I didn't answer all of your questions, you asked a ton :P If any of my information is wrong please let me know! I tried to limit my answer to what I felt I did know.
"Or am I wrong, and is the memory where the actual variable is stored in fact also cleared at once with the "release" message?" The memory IS NOT CLEARED, but goes into the free memory pool, so that it does in fact reduce the memory footprint of the program. If you did not release the object you would continue to "hog" memory until you consume all the virtual memory available and crash not only your program but potentially the system as well.
How are these multiple addresses contained in 1 pointer in reality? Will the pointer be divided into multiple addresses as well? Do you know what component in a computer actually identifies and assigns the actual address "codes"?
From the perspective of the programmer (take note of this), the pointer by itself is usually a 4-byte number that represents an offset from the start of memory (on 32-bit systems; on 64-bit systems addresses take 8 bytes). The point is that this pointer points to the start of whatever is being stored, and that's it.
In C for example, the original strings are NUL-terminated (\0) so that code can tell where a string ends (try calling printf() in C on a string that lacks the terminating zero, and it will typically keep printing whatever is in memory until it finds a zero). This is of course pretty dangerous, so one should use functions like strncpy (notice the "n"), which take the maximum number of chars to copy as an explicit argument.
A way to circumvent this is to store the length alongside the string data, something like
struct
{
int size;
char *string;
}string;
That stores the size to prevent such issues. Objective-C and many other higher-level languages handle memory in their own way. NSString is too abstract a class to tell from the outside what happens behind the scenes; it probably inherits all its memory management from NSObject.
The whole point I'm trying to make is that a pointer contains the starting address, and you can step byte by byte from there (or in jumps of a certain size), keeping in mind the total length of whatever you're storing to avoid doing nasty things like overflowing the stack (hence the name of this site).
Now, how the computer assigns these addresses is entirely up to the operating system, and the logical memory address you use in your programs is quite different from what the underlying implementation uses (the physical memory address). Typically, physical memory is divided into fixed-size units called "frames" and the logical address space into "pages" of the same size; the mapping between logical and physical addresses is kept in a "page table".
As you can see, the software handles pretty much everything, but that doesn't mean there isn't hardware to support this, like the TLB, a CPU-level cache that holds recent address translations for quick access.
Also, please take my answer with a grain of salt, it's been a while since I studied these subjects.
OK, my question; I wondered what really happens when you release the pointer [sPointer release]; are you actually releasing the pointer (containing the address), or are you also releasing the actual "string variable" from memory as well? I have learned that when you delete the reference, the memory where the actual variable is stored will just be overwritten at the time the compiler needs memory so it does not need to be cleared at that time. Is this wrong? If it is correct, why do they say that it's really important to release the NSString pointer for performance reasons, if you just release the pointer which will basically contain only a few bytes? Or am I wrong, and is the memory where the actual variable is stored in fact also cleared at once with the "release" message?
When you release, you're just decreasing the retain count of the object. What you mean is what happens when it's deallocated (when the count reaches zero).
When you dealloc something, you're basically saying that the space that was reserved for it is now free to be reused by anything else requesting memory (through alloc). The variable may still point to the freed space, and this causes problems (read about dangling pointers and leaks).
The memory might be cleared, but there are no guarantees.
I hope this clears up all the doubts, as all of them stem from your confusion about how memory is freed.
And finally also: primitive datatypes are not released, but they "do" take memory space at the moment of declaration (but not more than a common pointer). Why shouldn't we release them in fact? What prevents us from doing something like: int i = 5, followed by [i release]?;
The thing is that C has two main areas of memory (actually, a lot more): the heap, which holds memory that has been requested with alloc (or malloc in C) and has to be freed explicitly, and the stack, which holds local variables that die when the function/block ends (the stack pops the function call).
In your example, the variable i has been declared locally within its scope, and it's contained in the stack. Trying to perform a dealloc/free on it won't work, as it is not the type of memory that needs to be freed (also, the variable i won't respond to release or dealloc, as it's not an object).
I suggest going back to C before trying to tackle what Objective-C does, because it's hard to get a clear idea of how imperative programming works underneath all the nice abstractions like release and dealloc.
For the sake of the forum, I will make a brief and simplified summary of your answers as a conclusion. Thanks to all of you for this extended clarification, the mist disappeared! Don't hesitate to react in case you want to add or rectify something:
Rectification: a pointer on the Mac takes 4 bytes of memory space in a 32-bit executable (8 bytes in a 64-bit one), and not 2.
The pointer *sPointer points to an instance of the NSString class and NOT directly to the memory where the chars are saved. The NSString instance consists of a set of iVars in which there is a pointer iVar that points to memory allocated where the char variables that make up the string are stored (defined when using the initWithString: instance method).
[sPointer release]; The release message is not sent to the pointer itself, but to the instance of the NSString Object. You are not acting on the pointer itself, but on what the pointer is pointing to (!).
When sending the alloc message, a new NSString instance is created with a retain count of 1. Sending a "release" message doesn't mean that the concerned memory is literally being emptied; it decreases the retain count by 1. When the retain count reaches zero, the object is deallocated and the memory manager marks the previously allocated memory as available for re-use.
The way memory addresses are assigned is decided by the operating system. The logical memory address used in programs is different from what the underlying implementation actually uses (physical memory address).
LOCAL variables (not necessarily primitive variables) are stored in stack memory (unlike object instances, which are stored in heap memory). This means that they are destroyed automatically at the end of a function (they are automatically removed from the stack). More info on the stack and the heap can be found in several threads that clarify their use and differences in their own way, e.g. What and where are the stack and heap? / http://ee.hawaii.edu/~tep/EE160/Book/chap14/subsection2.1.1.8.html
Before I answer the points: you start with a false premise. A pointer takes up more than 2 bytes on any system that is not 16-bit. On the Mac, it takes up 4 bytes in a 32-bit executable and 8 bytes in a 64-bit executable.
Let me note that the following is not entirely accurate (for optimization and some other reasons, there are several kinds of internal representations of strings, and the initXXX functions decide which one is instantiated), but for the sake of using and understanding strings, the explanation is good enough.
An NSString is a class (and a rather complicated one as well). A string, i.e. an instance of that class, contains some administrative ivars and one that is another pointer, which points to a piece of allocated memory big enough to hold at least the bytes/code points that make up the string. Your code (the alloc method, to be precise) reserves enough memory to contain all the ivars of the object (including the pointer to the buffer) and returns a pointer to that memory. That is what you store in your pointer (if initWithString: doesn't change it -- but I won't go into that here, let's assume it doesn't). If necessary, initWithString: allocates a buffer large enough to hold the text of the string and stores its address in that pointer ivar, inside the NSString instance. So it is like this:
sPointer                            NSString instance         buffer
+---------------------------+       +-----------------+       +------+
| addr of NSString instance | ----> | ivar            |   +-> | char |
+---------------------------+       | ivar            |   |   | char |
                                    | ivar (pointer)  | --+   | char |
                                    | ivar            |       | char |
                                    | etc...          |       | char |
                                    +-----------------+       | char |
                                                              | etc. |
                                                              +------+
In the case of a hard-coded, literal string like @"Hello", the internal pointer only points to that string, which is already stored in the program, in read-only memory. No memory has to be allocated for it, and the memory can't be freed either.
But let's assume you have a string with allocated contents. release (either coded manually, or invoked by an autorelease pool) will decrement the reference count of the string object (the so called retainCount). If that count reaches zero, your instance of the NSString class will be dealloced, and in the dealloc method of the string, the buffer holding the string text will be released. That memory is not cleared in any way, it is only marked as free by a memory manager, which means it can be re-used again for some other purpose.