Python string interning and substrings

Does Python create a completely new string (copying the contents) when you do a substring operation like:

new_string = my_old_string[foo:bar]

Or does it use interning to point to the old data?

As a clarification, I'm curious whether the underlying character buffer is shared, as it was in Java's String.substring before Java 7 update 6. I realize that strings are immutable, so a slice will always appear to be a completely new string, and it would have to be an entirely new string object.


Examining the source reveals:

When the slice indexes match the start and end of the original string, then the original string is returned.

Otherwise, you get the result of the function PyString_FromStringAndSize, which is passed a pointer into the existing string's character data together with the slice length. For a result of length 0 or 1, this function returns an interned, shared string object; otherwise it copies the substring into a new string object.
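The source quoted above is Python 2 (PyString). In CPython 3, str slicing goes through PyUnicode_Substring instead, but the same cases can be observed with the identity operator (is). A minimal sketch; it relies on CPython implementation details (the full-slice shortcut, the per-character cache for Latin-1 characters, the shared empty string), so other interpreters such as PyPy may behave differently. It also assumes s has at least 6 characters and that s[1] is an ASCII or Latin-1 character.

```python
def slice_identity_report(s):
    """Which slices of s reuse an existing object? (CPython implementation detail.)"""
    n = len(s)
    return {
        # Full-length slice: the original object comes back unchanged.
        "full_slice_is_original": s[0:n] is s,
        # Length-1 slice of a Latin-1 character is served from a per-character
        # cache, so it is the same object that indexing s[1] returns.
        "one_char_is_cached": s[1:2] is s[1],
        # Empty slices all share the single empty-string object.
        "empty_is_shared": s[2:2] is s[5:5],
        # A longer slice is freshly copied: two identical slices are distinct objects.
        "longer_slice_is_new": s[3:6] is not s[3:6],
    }

print(slice_identity_report("foobarbaz"))
```

On CPython 3 every entry should come out True; only the last case corresponds to an actual copy of the characters.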


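You may also be interested in itertools.islice, which walks over part of the original string lazily instead of copying it up front. It keeps a reference to the original string, which is why the reference count goes up by one; joining its output still builds a new string: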

>>> from sys import getrefcount
>>> from itertools import islice
>>> h="foobarbaz"
>>> getrefcount(h)
2
>>> g=islice(h,3,6)
>>> getrefcount(h)
3
>>> "".join(g)
'bar'
>>> 
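CPython has no zero-copy slice type for str. For bytes-like data, though, the built-in memoryview does give a Java-style view that shares the underlying buffer. A minimal sketch, using a bytearray so the sharing becomes visible after a mutation (the function name view_vs_copy is just for illustration):

```python
def view_vs_copy():
    buf = bytearray(b"foobarbaz")
    view = memoryview(buf)[3:6]  # slicing a memoryview shares buf's memory
    copy = bytes(buf[3:6])       # slicing the bytearray itself makes a copy
    buf[3:6] = b"BAR"            # same-length write, so buf is not resized
    return bytes(view), copy

print(view_vs_copy())  # (b'BAR', b'bar')
```

The view reflects the write because it points into buf; the copy does not.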


It's a completely new string, so the old, bigger one can be freed when possible, rather than staying alive just because some tiny string has been sliced from it and is still being kept around.
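The getrefcount trick from the islice session can show this from the other side: in CPython, taking an ordinary slice adds no reference to the parent string, so the parent's lifetime is not tied to the slice. A small sketch; sys.getrefcount is CPython-specific, and the 1,000,000-character size and the helper name are arbitrary choices for illustration.

```python
import sys

def parent_refcount_around_slice(big_len=1_000_000):
    big = "x" * big_len           # built at run time, so a fresh heap object
    before = sys.getrefcount(big)
    small = big[10:20]            # a 10-character copy
    after = sys.getrefcount(big)  # unchanged: small holds no reference to big
    return before, after, small

before, after, small = parent_refcount_around_slice()
print(before == after, small)  # True xxxxxxxxxx
```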

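intern is a different thing, though: it makes equal strings share a single object, which has nothing to do with a slice sharing its parent's buffer.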
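A quick illustration of interning with sys.intern (the Python 3 spelling of Python 2's builtin intern). The two strings are built at run time so the compiler cannot merge them as constants:

```python
import sys

def intern_demo():
    a = "".join(["foo", "bar"])
    b = "".join(["foo", "bar"])
    before = a is b                      # equal text, two separate objects
    a, b = sys.intern(a), sys.intern(b)
    after = a is b                       # both names now point at one interned object
    return before, after

print(intern_demo())  # (False, True)
```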


Looks like I can answer my own question: I opened up the source, and guess what I found:

    static PyObject *
    string_slice(register PyStringObject *a, register Py_ssize_t i,
         register Py_ssize_t j)

    ... snip ...

    return PyString_FromStringAndSize(a->ob_sval + i, j-i);

...and no reference to interning. PyString_FromStringAndSize() only explicitly interns strings of size 0 and 1, so it seems clear that for any longer slice you'll always get a totally new object that shares no buffer with the original.


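In Python, strings are immutable. That means that any slice, concatenation, or other operation gives you a new string instead of modifying the original, and in CPython that new string is a copy (except in special cases such as a full-length slice, which returns the original object).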
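A short illustration of what immutability means in practice: string methods hand back new objects, and in-place item assignment is rejected outright.

```python
def immutability_demo():
    s = "foobarbaz"
    t = s.replace("bar", "BAR")  # "changing" a string returns a new object
    try:
        s[3] = "B"               # str objects do not support item assignment
        assigned = True
    except TypeError:
        assigned = False
    return s, t, assigned

print(immutability_demo())  # ('foobarbaz', 'fooBARbaz', False)
```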

http://effbot.org/pyfaq/why-are-python-strings-immutable.htm is a nice explanation for some of the reasons behind immutable strings.
