
I can't understand polling/select in Python

I'm doing a threaded asynchronous networking experiment in Python, using UDP.

I'd like to understand polling and Python's select module; I've never used them in C/C++.

What are they for? I sort of understand select, but does it block while watching a resource? What is the purpose of polling?


Okay, one question at a time.

What are those for?

Here is a simple socket server skeleton:

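import socket

HOST, PORT = "127.0.0.1", 8000   # example values; the original skeleton omits the address

s_sock = socket.socket()
s_sock.bind((HOST, PORT))
s_sock.listen()

while True:
    c_sock, c_addr = s_sock.accept()      # blocks until a client connects
    process_client_sock(c_sock, c_addr)   # blocks until this client is done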

The server loops, accepting a connection from a client and then calling its process function to communicate over the client socket. There is a problem here: process_client_sock might take a long time, or even contain a loop (which is often the case).

def process_client_sock(c_sock, c_addr):
    while True:
        receive_or_send_data(c_sock)

In that case, the server is unable to accept any more connections.

A simple solution is to use multiple processes or threads: create a new thread to handle each request while the main loop keeps listening for new connections.

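import socket
from threading import Thread

HOST, PORT = "127.0.0.1", 8000   # example values; the original skeleton omits the address

s_sock = socket.socket()
s_sock.bind((HOST, PORT))
s_sock.listen()

while True:
    c_sock, c_addr = s_sock.accept()
    # hand the client to its own thread so the loop can accept again right away
    thread = Thread(target=process_client_sock, args=(c_sock, c_addr))
    thread.start()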

This works, of course, but not well enough where performance matters. Each new process or thread costs extra CPU and memory, which is not ideal for servers that might handle thousands of connections.

The select and poll system calls try to solve this problem. You give select a set of file descriptors and ask it to notify you when any of them is ready to read, ready to write, or has an exceptional condition.

Does it (select) block while watching a resource?

It depends on the parameters you pass to it.

As the select man page says, it takes a struct timeval parameter:

int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

struct timeval {
    long    tv_sec;         /* seconds */
    long    tv_usec;        /* microseconds */
};

There are three cases:

  1. timeout.tv_sec == 0 and timeout.tv_usec == 0

    Non-blocking: select returns immediately.

  2. timeout == NULL

    Blocks indefinitely until a file descriptor is ready.

  3. timeout is a positive value

    Waits for up to that long; if no file descriptor becomes ready in that time, select times out and returns.
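
In Python the same three cases map onto the fourth argument of select.select: 0, None (or omitted), or a positive number of seconds. Here is a minimal sketch of all three; the socketpair and the variable names are just for illustration and are not from the question.

```python
import select
import socket
import time

# socketpair() gives two connected sockets without needing a network address.
a, b = socket.socketpair()

# Case 1: timeout of 0 -> check once and return immediately.
immediate, _, _ = select.select([a], [], [], 0)

# Case 3: a positive timeout -> wait at most that many seconds.
start = time.monotonic()
timed, _, _ = select.select([a], [], [], 0.2)
waited = time.monotonic() - start   # about 0.2 s, because nothing arrived

# Case 2: timeout=None -> block until something is ready.
# Send first so that this call has something to wake up for.
b.sendall(b"ping")
ready, _, _ = select.select([a], [], [], None)
data = a.recv(16)                   # does not block: select said `a` is readable

a.close()
b.close()
```

The first two calls return empty lists because nothing has been sent yet; the last one returns as soon as the data from b is available on a.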

What is the purpose of polling ?

Put simply: polling frees the CPU for other work while waiting for I/O.

This is based on two simple facts:

  1. The CPU is far faster than I/O.
  2. Waiting for I/O is a waste of time, because the CPU would sit idle most of the time.
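
To make that concrete, here is a rough sketch using select.poll, which is available on Unix-like systems but not on Windows. The helper wait_for_data is a made-up name for illustration. While poll() is waiting, the process sleeps in the kernel instead of spinning in a loop, so the CPU is free for other work.

```python
import select
import socket

def wait_for_data(sock, timeout_ms):
    """Return True if `sock` becomes readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)   # only interested in "data to read"
    events = poller.poll(timeout_ms)       # sleeps in the kernel, no busy loop
    return any(flags & select.POLLIN for _fd, flags in events)

a, b = socket.socketpair()
before = wait_for_data(a, 100)   # nothing sent yet, so this times out
b.sendall(b"x")
after = wait_for_data(a, 100)    # data is waiting now
a.close()
b.close()
```

Note that poll takes its timeout in milliseconds, whereas select.select takes seconds.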

Hope it helps.


If you call read or recv directly, you are waiting on only one connection. If you have multiple connections, you would have to create multiple processes or threads, which wastes system resources.

With select or poll or epoll, you can monitor multiple connections with only one thread, and get notified when any of them has data available, and then you call read or recv on the corresponding connection.
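
As a rough sketch of that single-threaded pattern, the function below runs one round of a select loop: it accepts new clients and echoes back whatever existing clients send. The name serve_once and the echo behaviour are assumptions made for the example, not something from the question.

```python
import select
import socket

def serve_once(listener, clients, timeout=1.0):
    """Run one round of a select loop: accept new clients, echo received data."""
    readable, _, _ = select.select([listener] + clients, [], [], timeout)
    for sock in readable:
        if sock is listener:
            # A connection is pending, so accept() returns right away.
            conn, _addr = listener.accept()
            clients.append(conn)
        else:
            # Data (or end-of-stream) is waiting, so recv() returns right away.
            data = sock.recv(4096)
            if data:
                sock.sendall(data)     # echo back; fine for small messages
            else:
                clients.remove(sock)   # empty bytes means the peer closed
                sock.close()
```

A server would create a listening socket, start with an empty clients list, and call serve_once(listener, clients) in a while True loop. All connections are then handled by that one thread.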

select may block indefinitely, block for a given time, or not block at all, depending on the arguments.


select() takes in 3 lists of sockets to check for three conditions (read, write, error), then returns (usually shorter, often empty) lists of sockets that actually are ready to be processed for those conditions.

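import select
import socket

Local_IP, Port1, Port2 = "127.0.0.1", 5001, 5002   # example values; not given originally
timeout = 5.0                                       # seconds; example value

s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind((Local_IP, Port1))
s1.listen(5)

s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s2.bind((Local_IP, Port2))
s2.listen(5)

sockets_that_might_be_ready_to_read = [s1, s2]
sockets_that_might_be_ready_to_write_to = [s1, s2]
sockets_that_might_have_errors = [s1, s2]

# Pass the lists themselves (not wrapped in another list) and unpack the
# three result lists directly.
ready_to_read, ready_to_write, has_errors = select.select(
    sockets_that_might_be_ready_to_read,
    sockets_that_might_be_ready_to_write_to,
    sockets_that_might_have_errors,
    timeout)

for sock in ready_to_read:
    c, a = sock.accept()   # sock is a listening socket with a pending connection
    data = c.recv(128)     # read from the accepted connection, not the listener;
                           # this may wait until the client actually sends
    ...
for sock in ready_to_write:
    # process writes
    ...
for sock in has_errors:
    # process errors
    ...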
So if no socket has an attempted connection after waiting timeout seconds, the list ready_to_read will be empty. At that point it doesn't matter whether accept() and recv() would block, because they won't get called for an empty list.

If a socket is ready to read, then it has a pending connection (for a listening socket) or data waiting (for a connected one), so the call won't block then, either.
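
Since the question mentions UDP, here is a small sketch of the same idea with datagram sockets. There is no accept() step with UDP: a socket that select reports as readable has a datagram waiting, so recvfrom() returns without blocking. The helper recv_ready and the loopback addresses are made up for the example.

```python
import select
import socket

def recv_ready(socks, timeout):
    """Return (data, sender) for every UDP socket that has a datagram waiting."""
    readable, _, _ = select.select(socks, [], [], timeout)
    return [sock.recvfrom(2048) for sock in readable]

u1 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
u2 = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
u1.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
u2.bind(("127.0.0.1", 0))

nothing = recv_ready([u1, u2], 0.1)   # no traffic yet, so select times out
u1.sendto(b"hi", u2.getsockname())
got = recv_ready([u1, u2], 1.0)       # u2 now has one datagram waiting

u1.close()
u2.close()
```

One thread can watch any number of UDP sockets this way, and only the sockets that actually have something to read get touched.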
