Uniqueness of global Python objects void in sub-interpreters?

2023-01-24 23:44 问答作者：

I have a question about inner-workings of Python sub-interpreter initialization (from Python/C API) and Python id() function. More precisely, about handling of global module objects in a WSGI Python containers (like uWSGI used with nginx and mod_wsgi on Apa开发者_如何学运维che).

The following code works as expected (isolated) in both of the mentioned environments, but I can not explain to my self why the id() function always returns the same value per variable, regardless of the process/sub-interpreter in which it is executed.

from __future__ import print_function
import os, sys

def log(*msg):
    print(">>>", *msg, file=sys.stderr)

class A:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return self.x
    def set(self, x):
        self.x = x

a = A("one")
log("class instantiated.")

def application(environ, start_response):

    output = "pid = %d\n" % os.getpid()
    output += "id(A) = %d\n" % id(A)
    output += "id(a) = %d\n" % id(a)
    output += "str(a) = %s\n\n" % a

    a.set("two")

    status = "200 OK"
    response_headers = [
        ('Content-type', 'text/plain'), ('Content-Length', str(len(output)))
    ]
    start_response(status, response_headers)

    return [output]

I have tested this code in uWSGI with one master process and 2 workers; and in mod_wsgi using a deamon mode with two processes and one thread per process. The typical output is:

pid = 15278
id(A) = 139748093678128
id(a) = 139748093962360
str(a) = one

on first load, then:

pid = 15282
id(A) = 139748093678128
id(a) = 139748093962360
str(a) = one

on second, and then

pid = 15278 | pid = 15282
id(A) = 139748093678128
id(a) = 139748093962360
str(a) = two

on every other. As you can see, id() (memory location) of both the class and the class instance remains the same in both processes (first/second load above), while at the same time class instances live in a separate context (otherwise the second request would show "two" instead of "one")!

I suspect the answer might be hinted by Python docs:

id(object):

Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

But if that indeed is the reason, I'm troubled by the next statement that claims the id() value is object's address!

While I appreciate the fact this could very well be just a Python/C API "clever" feature that solves (or rather fixes) a problem of caching object references (pointers) in 3rd party extension modules, I still find this behavior to be inconsistent with... well, common sense. Could someone please explain this?

I've also noticed mod_wsgi imports the module in each process (i.e. twice), while uWSGI is importing the module only once for both processes. Since the uWSGI master process does the importing, I suppose it seeds the children with copies of that context. Both workers work independently afterwards (deep copy?), while at the same time using the same object addresses, seemingly. (Also, a worker gets reinitialized to the original context upon reload.)

I apologize for such a long post, but I wanted to give enough details. Thank you!

It's not entirely clear what you're asking; I'd give a more concise answer if the question was more specific.

First, the id of an object is, in fact--at least in CPython--its address in memory. That's perfectly normal: two objects in the same process at the same time can't share an address, and an object's address never changes in CPython, so the address works neatly as an id. I don't know how this violates common sense.

Next, note that a backend process may be spawned in two very distinct ways:

A generic WSGI backend handler will fork processes, and then each of the processes will start a backend. This is simple and language-agnostic, but wastes a lot of memory and wastes time loading the backend code repeatedly.
A more advanced backend will load the Python code once, and then fork copies of the server after it's loaded. This causes the code to be loaded only once, which is much faster and reduces memory waste significantly. This is how production-quality WSGI servers work.

However, the end result in both of these cases is identical: separate, forked processes.

So, why are you ending up with the same IDs? That depends on which of the above methods is in use.

With a generic WSGI handler, it's happening simply because each process is doing essentially the same thing. So long as processes are doing the same thing, they'll tend to end up with the same IDs; at some point they'll diverge and this will no longer happen.
With a pre-loading backend, it's happening because this initial code happens only once, before the server forks, so it's guaranteed to have the same ID.

However, either way, once the fork happens they're separate objects, in separate contexts. There's no significance to objects in separate processes having the same ID.

This is simple to explain by way of a demonstration. You see, when uwsgi creates a new process, it forks the interpreter. Now, forks have interesting memory properties:

import os, time

if os.fork() == 0:
    print "child first " + str(hex(id(os)))
    time.sleep(2)
    os.attr = 'test'
    print "child second " + str(hex(id(os)))
else:
    time.sleep(1)
    print "parent first " + str(hex(id(os)))
    time.sleep(2)
    print "parent second " + str(hex(id(os)))
    print os.attr

Output:

child first 0xb782414cL
parent first 0xb782414cL
child second 0xb782414cL
parent second 0xb782414cL
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    print os.attr
AttributeError: 'module' object has no attribute 'attr'

Although the objects seem to reside at the same memory addr, they are different objects, but this is not python, but the os.

edit: I suspect the reason that mod_wsgi imports twice is that it creates further processes via calling python rather than forking. uwsgi's approach is better because it can use less memory. fork's page sharing is COW (copy on write).

继续阅读：mod-wsgi python

Uniqueness of global Python objects void in sub-interpreters?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？