branching vs multi-threading
There's a fast path in the code that looks like this:
while (1) {
    select(fd_set...);              // pseudocode: wait for a ready fd
    if (fd_isUserspace) {
        process_user_packet(fd);    // RPC involved
    } else {                        // kernel
        process_kernel_packet(fd);  // RPC involved
    }
} // while (1)
Basically it reads an active fd from a set of fds and processes it. Currently this is done in an if-else branch, and the loop only continues once processing completes. I think I can improve this by dispatching to a thread pool (poolSize >= 2) inside the if-else, so that the processing call returns immediately and the while loop can go back to waiting for future fds.
Presumably process_*_packet does some RPC work as part of processing.
I'm aware that dispatching the processing job to a thread has some overhead (pthread_cond_signal/locking, etc.), but since process_*_packet probably takes time that is an order of magnitude larger (due to the RPC), it seems worthwhile.
Would like to get some thoughts (maybe even a better idea); I think this can be a very general question about how to design for better performance.
-Thanks
I wrote a thread pool in Java recently (required for my parallel computing class; I know there's a built-in one), and if you write it properly, it's actually quite fast.
The one huge advantage here if you use multiple threads: requests no longer block each other. You'll get better response times because you can handle multiple requests simultaneously.
If one takes a rather long time to process, send, or receive, then that packet doesn't need to clog up your tubes.
With some thread pool, you'd just do:
while (1) {
    select(fd_set...);
    if (fd_isUserspace) {
        submit_job(process_user_packet, fd);
    } else { // kernel
        submit_job(process_kernel_packet, fd);
    }
} // while (1)
Where we assume submit_job has the signature
void submit_job(void (*func)(void *), void *args);
So that each thread in the thread pool can simply grab the function and arguments that it needs to work on, and call func(args);
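As a sketch of what such a pool might look like: a fixed-size POSIX-threads pool with a bounded circular job queue, matching the submit_job signature above. POOL_SIZE, QUEUE_CAP, pool_start, and the queue internals are all made-up names for illustration, not anything from the question.

```c
/* Minimal thread-pool sketch: submit_job() enqueues work,
   worker threads dequeue and call func(args). */
#include <pthread.h>
#include <stdlib.h>

#define POOL_SIZE 4
#define QUEUE_CAP 64

typedef struct {
    void (*func)(void *);
    void *args;
} job_t;

static job_t queue[QUEUE_CAP];
static int q_head, q_tail, q_len;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t q_nonfull = PTHREAD_COND_INITIALIZER;

void submit_job(void (*func)(void *), void *args)
{
    pthread_mutex_lock(&q_lock);
    while (q_len == QUEUE_CAP)              /* block while the queue is full */
        pthread_cond_wait(&q_nonfull, &q_lock);
    queue[q_tail] = (job_t){ func, args };
    q_tail = (q_tail + 1) % QUEUE_CAP;
    q_len++;
    pthread_cond_signal(&q_nonempty);       /* wake one idle worker */
    pthread_mutex_unlock(&q_lock);
}

static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_len == 0)
            pthread_cond_wait(&q_nonempty, &q_lock);
        job_t job = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_len--;
        pthread_cond_signal(&q_nonfull);
        pthread_mutex_unlock(&q_lock);
        job.func(job.args);                 /* run the job outside the lock */
    }
    return NULL;
}

void pool_start(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_detach(t);
    }
}
```

The key design point is that the job runs outside the queue lock, so the lock is held only for the few instructions it takes to enqueue or dequeue, never for the (RPC-length) duration of the job itself.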
I wouldn't worry about the cost of dispatching the job at all. If processing takes any more than a millisecond (the dispatch cost is probably even lower on really good implementations), you'll be golden.
Just an idea, but what if instead you throw out select and just use one thread per file descriptor? The only major disadvantage is context-switching overhead if too many requests show up at once, but that may be preferable to the latency anyway. The advantage is fewer context switches in the non-overloaded case: the kernel directly wakes up the thread waiting for a file descriptor as soon as it's unblocked, rather than first waking up the select thread, which then has to wake up a thread to process the request. And of course the simplicity is something of an advantage in itself...
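A minimal sketch of that per-fd approach, under the assumption that each connection's packets can be handled independently. handle_packet here is a hypothetical stand-in for the question's process_*_packet; each worker simply blocks in read() and the kernel wakes it directly when data arrives.

```c
/* Sketch: one thread per file descriptor, blocking reads instead of select(). */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-packet handler, standing in for process_*_packet. */
extern void handle_packet(int fd, const char *buf, ssize_t n);

static void *fd_worker(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[4096];
    ssize_t n;
    /* Block directly on this fd; no intermediate select() thread. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        handle_packet(fd, buf, n);
    close(fd);                      /* EOF or error: clean up and exit */
    return NULL;
}

void spawn_fd_thread(int fd)
{
    pthread_t t;
    pthread_create(&t, NULL, fd_worker, (void *)(intptr_t)fd);
    pthread_detach(t);
}
```

You'd call spawn_fd_thread(fd) once when the descriptor is opened, instead of adding it to the fd_set; the blocking read replaces select as the wait point.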