Why would connect() give intermittent EINVAL after porting to FreeBSD?
My C++ application started failing after I ported it from 32-bit Linux to 32-bit FreeBSD 8.1. A TCP socket fails to connect: the call to connect() returns an error with errno == EINVAL, which the man page for connect() does not cover.
What does this error mean? Which argument is invalid? The message just says "Invalid argument".
Here are some details of the connection:
family: AF_INET
len: 16
port: 2357
addr: 10.34.49.13
It doesn't always fail though. The FreeBSD version only fails after letting the machine sit idle for several hours. But after failing once, it works reliably until you let it sit idle again for a prolonged period.
Here is some of the code:
void setSocketOptions(const int skt);
void buildAddr(sockaddr_in &addr, const std::string &ip,
const ushort port);
void deepBind(const int skt, const sockaddr_in &addr);
void
test(const std::string &localHost, const std::string &remoteHost,
const ushort localPort, const ushort remotePort,
sockaddr_in &localTCPAddr, sockaddr_in &remoteTCPAddr)
{
const int skt = socket(AF_INET, SOCK_STREAM, 0);
if (0 > skt) {
clog << "Failed to create socket: (errno " << errno
<< ") " << strerror(errno) << endl;
throw;
}
setSocketOptions(skt);
// Build the localIp address and bind it to the feedback socket. Although
// it's not traditional for a client to bind the sending socket to the
// local address, we do it to prevent connect() from using an ephemeral port
// (which our site's firewall may block). Also build the remoteIp address.
buildAddr(localTCPAddr, localHost, localPort);
deepBind(skt, localTCPAddr);
buildAddr(remoteTCPAddr, remoteHost, remotePort);
clog << "Info: Command connect family: "
<< (remoteTCPAddr.sin_family == AF_INET ? "AF_INET" : "<unknown>")
<< " len: " << int(remoteTCPAddr.sin_len)
<< " port: " << ntohs(remoteTCPAddr.sin_port)
<< " addr: " << inet_ntoa(remoteTCPAddr.sin_addr) << endl;
if (0 > ::connect(skt, (sockaddr*)&remoteTCPAddr, sizeof(sockaddr_in))) {
switch (errno) {
case EINVAL: {
int value = -1;
socklen_t len = sizeof(value);
getsockopt(skt, SOL_SOCKET, SO_ERROR, &value, &len);
cerr << "Error: Command connect failed on local port "
<< getLocFbPort()
<< " and remote port " << remotePort
<< " to remote host '" << remoteHost
<< "' family: "
<< (remoteTCPAddr.sin_family == AF_INET ? "AF_INET" : "<unknown>")
<< " len: " << int(remoteTCPAddr.sin_len)
<< " port: " << ntohs(remoteTCPAddr.sin_port)
<< " addr: " << inet_ntoa(remoteTCPAddr.sin_addr)
<< ": Invalid argument." << endl;
cerr << "\tgetsockopt => "
<< ((value != 0) ? strerror(value): "success") << endl;
throw;
}
default: {
cerr << "Error: Command connect failed on local port "
<< localPort << " and remote port " << remotePort
<< ": (errno " << errno << ") " << strerror(errno) << endl;
throw;
}
}
}
}
void
setSocketOptions(int skt)
{
// See page 192 of UNIX Network Programming: The Sockets Networking API
// Volume 1, Third Edition by W. Richard Stevens et al. for info on using
// ::setsockopt().
// According to "Linux Socket Programming by Example" p. 319, we must call
// setsockopt w/ SO_REUSEADDR option BEFORE calling bind.
int so_reuseaddr = 1; // Enabled.
int reuseAddrResult
= ::setsockopt(skt, SOL_SOCKET, SO_REUSEADDR, &so_reuseaddr,
sizeof(so_reuseaddr));
if (reuseAddrResult != 0) {
cerr << "Failed to set reuse addr on socket.";
throw;
}
// For every two hours of inactivity, a keepalive occurs.
int so_keepalive = 1; // Enabled. See page 200 for info on SO_KEEPALIVE.
int keepAliveResult =
::setsockopt(skt, SOL_SOCKET, SO_KEEPALIVE, &so_keepalive,
sizeof(so_keepalive));
if (keepAliveResult != 0) {
cerr << "Failed to set keep alive on socket.";
throw;
}
struct linger so_linger;
so_linger.l_onoff = 1; // Turn linger option on.
so_linger.l_linger = 5; // Linger time in seconds. (See page 202)
int lingerResult
= ::setsockopt(skt, SOL_SOCKET, SO_LINGER, &so_linger,
sizeof(so_linger));
if (lingerResult != 0) {
cerr << "Failed to set linger on socket.";
throw;
}
// Disable the Nagle algorithm on the command channel. SOL_TCP is not
// defined on FreeBSD.
#ifndef SOL_TCP
#define SOL_TCP (::getprotobyname("TCP")->p_proto)
#endif
unsigned int tcpNoDelay = 1;
int noDelayResult
= ::setsockopt(skt, SOL_TCP, TCP_NODELAY, &tcpNoDelay,
sizeof(tcpNoDelay));
if (noDelayResult != 0) {
cerr << "Failed to set tcp no delay on socket.";
throw;
}
}
void
buildAddr(sockaddr_in &addr, const std::string &ip, const ushort port)
{
memset(&addr, 0, sizeof(sockaddr_in)); // Clear all fields.
addr.sin_len = sizeof(sockaddr_in);
addr.sin_family = AF_INET; // Set the address family
addr.sin_port = htons(port); // Set the port.
if (0 == inet_aton(ip.c_str(), &addr.sin_addr)) {
cerr << "BuildAddr IP.";
throw;
}
};
void
deepBind(const int skt, const sockaddr_in &addr)
{
// Bind the requested port.
if (0 <= ::bind(skt, (sockaddr *)&addr, sizeof(addr))) {
return;
}
// If the port is already in use, wait up to 100 seconds.
int count = 0;
ushort port = ntohs(addr.sin_port);
while ((errno == EADDRINUSE) && (count < 10)) {
clog << "Waiting for port " << port << " to become available..."
<< endl;
::sleep(10);
++count;
if (0 <= ::bind(skt, (sockaddr*)&addr, sizeof(addr))) {
return;
}
}
cerr << "Error: failed to bind port.";
throw;
}
Here is example output when connect() fails with EINVAL (it doesn't always fail here; sometimes it succeeds and then fails when the first packet sent over the socket gets scrambled):
Info: Command connect family: AF_INET len: 16 port: 2357 addr: 10.34.49.13
Error: Command connect failed on local port 2355 and remote port 2357 to remote host '10.34.49.13' family: AF_INET len: 16 port: 2357 addr: 10.34.49.13: Invalid argument.
getsockopt => success
I figured out what the issue was. I was first getting ECONNREFUSED. On Linux I can just retry the connect() after a short pause and all is well, but on FreeBSD the following retry of connect() on the same socket fails with EINVAL.
The solution is, on ECONNREFUSED, to back up further and retry from the beginning of the test() definition above, so that each attempt starts with a new socket. With this change, the code now works properly.
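For anyone hitting the same thing, here is a minimal sketch of that retry pattern. The helper name connectWithFreshSocket and its parameters are made up for illustration and are not from my real code. It leaves out the setsockopt() and bind() steps from the question, which would go right after socket() inside the loop.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original post): returns a connected
// socket descriptor, or -1 if the address is bad or every attempt failed.
int connectWithFreshSocket(const std::string &ip, unsigned short port,
                           int maxAttempts, unsigned int pauseSeconds)
{
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_aton(ip.c_str(), &addr.sin_addr) == 0) {
        errno = EINVAL;  // Not a dotted-quad IPv4 address.
        return -1;
    }

    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        // Create a new socket for every attempt instead of reusing one
        // whose connect() has already failed.
        const int skt = ::socket(AF_INET, SOCK_STREAM, 0);
        if (skt < 0) {
            return -1;
        }
        // setSocketOptions(skt) and deepBind(skt, ...) would go here.

        if (::connect(skt, reinterpret_cast<sockaddr *>(&addr),
                      sizeof(addr)) == 0) {
            return skt;  // Connected; the caller owns the descriptor.
        }

        const int savedErrno = errno;
        ::close(skt);  // Discard the failed socket before retrying.
        errno = savedErrno;
        if (savedErrno != ECONNREFUSED) {
            return -1;  // Only a refused connection is retried.
        }
        if (attempt + 1 < maxAttempts) {
            ::sleep(pauseSeconds);
        }
    }
    return -1;
}
```

The point is only that socket() sits inside the loop, so a retry never calls connect() a second time on a descriptor that has already failed once.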
It's interesting that the FreeBSD connect() manpage doesn't list EINVAL. A different BSD manpage states:
[EINVAL] An invalid argument was detected (e.g., address_len is
not valid for the address family, the specified
address family is invalid).
Based on the disparate documentation from the different BSD flavours floating around, I would venture that there may be undocumented return code possibilities in FreeBSD, see here for example.
My advice is to print out your address length and the sizeof and contents of your socket address structure before calling connect - this will hopefully assist you to find out what's wrong.
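As a rough sketch of what I mean, something like the following could be called just before connect. The function name describeAddr is made up for illustration, and it only handles struct sockaddr_in.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <sstream>
#include <string>

// Hypothetical helper: formats the fields connect() will see, given the
// address structure and the length argument about to be passed alongside it.
std::string describeAddr(const sockaddr_in &addr, socklen_t len)
{
    char ip[INET_ADDRSTRLEN] = "<unprintable>";
    inet_ntop(AF_INET, &addr.sin_addr, ip, sizeof(ip));

    std::ostringstream out;
    out << "family: "
        << (addr.sin_family == AF_INET ? "AF_INET" : "<unknown>")
        << " len: " << static_cast<unsigned long>(len)
        << " sizeof: " << sizeof(sockaddr_in)
        << " port: " << ntohs(addr.sin_port)
        << " addr: " << ip;
    return out.str();
}
```

If the printed len and sizeof values differ, or the family is not what you expect, that would point at the argument the kernel is rejecting.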
Beyond that, it's probably best if you show us the code you use to set up the connection. This includes the type used for the socket address (struct sockaddr, struct sockaddr_in, etc), the code which initialises it, and the actual call to connect. That'll make it a lot easier to assist.
What’s the local address? You’re silently ignoring errors from bind(2), which seems like not only bad form, but could be causing this issue to begin with!