Why do I get the error "EADDRINUSE, Address already in use" when stress-testing Node.js with cradle and CouchDB?
I am trying to measure the throughput of a simple Node.js program with a CouchDB backend using cradle as the DB driver. When I put load against the program I get the following error within 30 seconds:
EADDRINUSE, Address already in use
Here is my program:
var http = require('http'),
    url = require('url'),
    cradle = require('cradle'),
    c = new(cradle.Connection)('127.0.0.1', 5984, {cache: false, raw: false}),
    db = c.database('testdb'),
    port = 8081;

http.createServer(function(req, res) {
    var id = url.parse(req.url).pathname.substring(1);
    db.get(id, function(err, doc) {
        res.writeHead(200, {'Content-Type': 'application/json'});
        res.write(JSON.stringify(doc));
        res.end();
    });
}).listen(port);
console.log("Server listening on port " + port);
I am using a JMeter script with 50 concurrent users. The average response time is 120 ms, and the average size of the returned document is 3 KB.
As you can see, I set Cradle's caching to false. To investigate, I looked at the number of waiting sockets (netstat | grep WAIT | wc -l): it increases to about 4000, at which point the program crashes.
To test other options I set the caching to true. In this case the program doesn't crash, but the number of waiting sockets increases to almost 10000 over time.
I also wrote the same program (sans the asynchronous part) as a Java Servlet, and it runs fine without the number of waiting sockets increasing much beyond 20.
My question is: Why do I get the 'EADDRINUSE, Address already in use' error? Why is the number of waiting sockets so high?
P.S.: This is a snippet from the output of netstat|grep WAIT:
tcp4 0 0 localhost.5984 localhost.58926 TIME_WAIT
tcp4 0 0 localhost.5984 localhost.58925 TIME_WAIT
tcp4 0 0 localhost.58924 localhost.5984 TIME_WAIT
tcp4 0 0 localhost.58922 localhost.5984 TIME_WAIT
tcp4 0 0 localhost.5984 localhost.58923 TIME_WAIT
Are you sure you don't have a zombie process still listening on port 8081?
ps aux | grep node
might help.
I also wrote an article to help people get started with Node and CouchDB; if you are interested, you can check it out at http://writings.nunojob.com/2011/09/getting-started-with-nodejs-and-couchdb.html
Upgrade to Cradle 0.5.6. It does not have the problem.
Speculation about the problem
The waiting sockets are probably in the CLOSE_WAIT state. (There are other states that would match your grep, such as TIME_WAIT. Can you confirm that it is CLOSE_WAIT and not anything else?)
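One way to check is to tally the states instead of grepping for WAIT. The following is only a rough sketch: countTcpStates is a name made up for this example, and it assumes the state is the last column of each tcp row, as in the netstat snippet from the question.

```javascript
// Tally TCP connection states from netstat-style text.
// Assumption: rows start with "tcp" and the state is the last column.
function countTcpStates(netstatOutput) {
  const counts = {};
  for (const line of netstatOutput.split('\n')) {
    const fields = line.trim().split(/\s+/);
    if (!/^tcp/.test(fields[0])) continue; // skip headers and non-TCP rows
    const state = fields[fields.length - 1];
    counts[state] = (counts[state] || 0) + 1;
  }
  return counts;
}

// Two rows copied from the question's netstat output.
const sample = [
  'tcp4 0 0 localhost.5984 localhost.58926 TIME_WAIT',
  'tcp4 0 0 localhost.58922 localhost.5984 TIME_WAIT'
].join('\n');

console.log(countTcpStates(sample)); // { TIME_WAIT: 2 }
```

Feeding it the full netstat output would show at a glance whether the sockets are mostly CLOSE_WAIT or TIME_WAIT.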
The linked post has a helpful quote:
RFC 793 says CLOSE_WAIT is the TCP/IP stack waiting for the local application to release the socket. So, it hangs because it has received the information that the remote host has initiated a disconnection and is closing its socket, upon what the local application did not close its own side.
So maybe the solution consists in finding a bug fix for your application...
Indeed. In your case, there are two connections per query: one from JMeter to Node, and another from Node to CouchDB. Either JMeter (older, more mature software) is not closing the connection properly, or Cradle (newer, less mature software) is not closing the connection properly. Obviously, Cradle is the more likely to have the bug. (Perhaps it is NodeJS's HTTP library itself, but Cradle seems like the first place to check.)
I do not have a complete answer, but hopefully these will be helpful clues. I think the address-in-use error occurs because there are no more local source ports left to make an "outgoing" connection (even to 127.0.0.1). But I am so far unsure why the CLOSE_WAIT count is different in each trial. (Perhaps it is fluctuating heavily as entire connection pools are closed.)
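To make the port-exhaustion idea concrete, here is a back-of-the-envelope sketch. It only applies if each finished connection keeps its local port for a fixed time (as with TIME_WAIT), and the numbers are hypothetical placeholders, not measurements from your machine; the function names are made up for this example.

```javascript
// Sockets waiting at steady state: arrival rate x hold time (Little's law).
function socketsInWait(connectionsPerSecond, holdSeconds) {
  return connectionsPerSecond * holdSeconds;
}

// Highest rate of new outgoing connections before the port range runs out.
function maxConnectionRate(ephemeralPorts, holdSeconds) {
  return Math.floor(ephemeralPorts / holdSeconds);
}

// Hypothetical values; check your own OS settings before relying on them.
const ports = 16384;     // assumed size of the ephemeral port range
const holdSeconds = 30;  // assumed time a closed socket keeps its port

console.log(maxConnectionRate(ports, holdSeconds)); // 546
console.log(socketsInWait(400, holdSeconds));       // 12000
```

If every request opens a fresh connection to CouchDB, a few hundred requests per second could plausibly be enough to run out of ports under assumptions like these.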
To gain more information, perhaps try an alternative CouchDB client library such as request or nano and compare the results.
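Another baseline for comparison is to skip the client library entirely and talk to CouchDB over plain HTTP with a shared keep-alive agent, so that connections are reused rather than opened per request. This is only a sketch: it assumes a modern Node.js where http.Agent accepts the keepAlive option, getDoc is a made-up helper name, and the host, port and database name are copied from your program.

```javascript
const http = require('http');

// One shared agent: reuse sockets and cap how many are opened to CouchDB.
// The limit of 50 is an arbitrary choice matching the 50 JMeter users.
const couchAgent = new http.Agent({ keepAlive: true, maxSockets: 50 });

// Fetch one document by id from CouchDB's HTTP API.
function getDoc(id, callback) {
  const req = http.get({
    host: '127.0.0.1',
    port: 5984,
    path: '/testdb/' + encodeURIComponent(id),
    agent: couchAgent
  }, function (res) {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      if (res.statusCode !== 200) {
        return callback(new Error('CouchDB returned ' + res.statusCode));
      }
      let doc;
      try { doc = JSON.parse(body); } catch (e) { return callback(e); }
      callback(null, doc);
    });
  });
  req.on('error', callback);
}
```

Calling getDoc(id, function (err, doc) { ... }) in place of db.get(...) in the request handler and rerunning the JMeter script would show whether the socket count still climbs without Cradle in the picture.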
Please let us know what you find, because it would be great to identify and close this potential Cradle bug (or bug somewhere at least!). Thanks.