Why do I not see MSG_EOR for SOCK_SEQPACKET on Linux?

I have two processes which are communicating over a pair of sockets created with socketpair() and SOCK_SEQPACKET. Like this:

int ipc_sockets[2];
socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, ipc_sockets);

As I understand it, I should see MSG_EOR in the msg_flags member of "struct msghdr" when receiving a SOCK_SEQPACKET record. I am setting MSG_EOR in sendmsg() to be certain that the record is marked MSG_EOR, but I do not see it when receiving in recvmsg(). I've even tried to set MSG_EOR in the msg_flags field before sending the record, but that made no difference at all.

I think I should see MSG_EOR unless the record was cut short by, e.g. a signal, but I do not. Why is that?

I've pasted my sending and receiving code in below.

Thanks, jules

int
send_fd(int fd,
        void *data,
        const uint32_t len,
        int fd_to_send,
        uint32_t * const bytes_sent)
{
    ssize_t n;
    struct msghdr msg;
    struct iovec iov;

    memset(&msg, 0, sizeof(struct msghdr));
    memset(&iov, 0, sizeof(struct iovec));

#ifdef HAVE_MSGHDR_MSG_CONTROL
    union {
        struct cmsghdr cm;
        char control[CMSG_SPACE_SIZEOF_INT];
    } control_un;
    struct cmsghdr *cmptr;

    msg.msg_control = control_un.control;
    msg.msg_controllen = sizeof(control_un.control);
    memset(msg.msg_control, 0, sizeof(control_un.control));

    cmptr = CMSG_FIRSTHDR(&msg);
    cmptr->cmsg_len = CMSG_LEN(sizeof(int));
    cmptr->cmsg_level = SOL_SOCKET;
    cmptr->cmsg_type = SCM_RIGHTS;
    *((int *) CMSG_DATA(cmptr)) = fd_to_send;
#else
    msg.msg_accrights = (caddr_t) &fd_to_send;
    msg.msg_accrightslen = sizeof(int);
#endif
    msg.msg_name = NULL;
    msg.msg_namelen = 0;

    iov.iov_base = data;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

#ifdef __linux__
    msg.msg_flags = MSG_EOR;
    n = sendmsg(fd, &msg, MSG_EOR);
#elif defined __APPLE__
    /* MSG_EOR is not supported on Mac OS X due to lack of
     * SOCK_SEQPACKET support on socketpair() */
    n = sendmsg(fd, &msg, 0);
#endif
    if (n == -1) {
        /* sendmsg() returns -1 on failure and reports the cause in errno
         * (needs <errno.h>); it never returns EMSGSIZE itself */
        return (errno == EMSGSIZE) ? EMSGSIZE : 1;
    }
    *bytes_sent = n;

    return 0;
}

int
recv_fd(int fd,
        void *buf,
        const uint32_t len,
        int *recvfd,
        uint32_t * const bytes_recv)
{
    struct msghdr msg;
    struct iovec iov;
    ssize_t n = 0;
#ifndef HAVE_MSGHDR_MSG_CONTROL
    int newfd;
#endif
    memset(&msg, 0, sizeof(struct msghdr));
    memset(&iov, 0, sizeof(struct iovec));

#ifdef HAVE_MSGHDR_MSG_CONTROL
    union {
        struct cmsghdr  cm;
        char control[CMSG_SPACE_SIZEOF_INT];
    } control_un;
    struct cmsghdr *cmptr;

    msg.msg_control = control_un.control;
    msg.msg_controllen = sizeof(control_un.control);
    memset(msg.msg_control, 0, sizeof(control_un.control));
#else
    msg.msg_accrights = (caddr_t) &newfd;
    msg.msg_accrightslen = sizeof(int);
#endif
    msg.msg_name = NULL;
    msg.msg_namelen = 0;

    iov.iov_base = buf;
    iov.iov_len = len;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    if (recvfd)
        *recvfd = -1;

    n = recvmsg(fd, &msg, 0);
    if (msg.msg_flags) { // <== I should see MSG_EOR here if the entire record was received
        return 1;
    }
    if (n == -1)
        return 1;
    /* bytes_recv may be NULL, so only store through it when it is set */
    if (bytes_recv)
        *bytes_recv = n;
    if (n == 0)
        return 0;

#ifdef HAVE_MSGHDR_MSG_CONTROL
    if ((NULL != (cmptr = CMSG_FIRSTHDR(&msg))) 
        && cmptr->cmsg_len == CMSG_LEN(sizeof(int))) {
        if (SOL_SOCKET != cmptr->cmsg_level) {
            return 0;
        }
        if (SCM_RIGHTS != cmptr->cmsg_type) {
            return 0;
        }
        if (recvfd)
            *recvfd = *((int *) CMSG_DATA(cmptr));
    }
#else
    if (recvfd && (sizeof(int) == msg.msg_accrightslen))
        *recvfd = newfd;
#endif
    return 0;
}


With SOCK_SEQPACKET unix domain sockets the only way for the message to be cut short is if the buffer you give to recvmsg() isn't big enough (and in that case you'll get MSG_TRUNC).

POSIX says that SOCK_SEQPACKET sockets must set MSG_EOR at the end of a record, but Linux unix domain sockets don't.

(Refs: POSIX 2008 2.10.10 says SOCK_SEQPACKET must support records, and 2.10.6 says record boundaries are visible to the receiver via the MSG_EOR flag.)

What a 'record' means for a given protocol is up to the implementation to define.

If Linux did implement MSG_EOR for unix domain sockets, I think the only sensible way would be to say that each packet was a record in itself, and so always set MSG_EOR (or maybe always set it when not setting MSG_TRUNC), so it wouldn't be informative anyway.
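A minimal sketch of how one might check this on a Linux machine. The helper name seqpacket_recv_flags and the 256-byte scratch buffers are made up for illustration; it sends one record with MSG_EOR set over a fresh socketpair and returns whatever recvmsg() reports in msg_flags.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends one msglen-byte record (with MSG_EOR) over a
 * fresh SOCK_SEQPACKET socketpair, receives it into a bufsize-byte buffer,
 * and returns the msg_flags reported by recvmsg(), or -1 on error. */
int seqpacket_recv_flags(size_t msglen, size_t bufsize)
{
    int sv[2];
    char out[256], in[256];
    struct msghdr msg;
    struct iovec iov;
    int flags = -1;

    if (msglen == 0 || msglen > sizeof(out) ||
        bufsize == 0 || bufsize > sizeof(in))
        return -1;
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
        return -1;

    memset(out, 'x', msglen);
    if (send(sv[0], out, msglen, MSG_EOR) == (ssize_t) msglen) {
        memset(&msg, 0, sizeof(msg));
        iov.iov_base = in;
        iov.iov_len = bufsize;   /* may be smaller than the record */
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        if (recvmsg(sv[1], &msg, 0) != -1)
            flags = msg.msg_flags;
    }
    close(sv[0]);
    close(sv[1]);
    return flags;
}
```

Called as seqpacket_recv_flags(16, 4), the result should have MSG_TRUNC set. With a large enough buffer, e.g. seqpacket_recv_flags(16, 64), MSG_TRUNC should be clear, and you can test the result against MSG_EOR to see what your own kernel reports.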


That's not what MSG_EOR is for.

Remember that the sockets API is an abstraction over a number of different protocols, including UNIX filesystem sockets, socketpairs, TCP, UDP, and many many different network protocols, including X.25 and some entirely forgotten ones.

MSG_EOR is to signal end of record where that makes sense for the underlying protocol. I.e. it is to pass a message to the next layer down that "this completes a record". This may affect for example, buffering, causing the flushing of a buffer. But if the protocol itself doesn't have a concept of a "record" there is no reason to expect the flag to be propagated.

Secondly, if using SEQPACKET you must read the entire message at once. If you do not, the remainder will be discarded. That's documented. In particular, MSG_EOR is not a flag to tell you that this is the last part of the packet.
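To illustrate the discard behaviour, here is a small sketch for a Unix domain socketpair on Linux. The helper name read_after_short_read and the payloads are invented for the example: it sends two packets, reads the first with a buffer that is too small, and then reads again.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: sends "AAAAAAAA" then "BB" as two packets, reads the
 * first with a 4-byte buffer, then reads again into out. Returns the length
 * of the second read, or -1 on error. */
ssize_t read_after_short_read(char *out, size_t outlen)
{
    int sv[2];
    char small[4];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
        return -1;

    if (send(sv[0], "AAAAAAAA", 8, 0) == 8 &&
        send(sv[0], "BB", 2, 0) == 2 &&
        recv(sv[1], small, sizeof(small), 0) == 4) {
        /* The unread tail of the first packet has been discarded, so this
         * read starts at the second packet. */
        n = recv(sv[1], out, outlen, 0);
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With a 16-byte out buffer the second read should return 2 bytes, "BB", rather than the remaining four 'A's of the first packet.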

Advice: You are obviously writing a non-SEQPACKET version for use on MacOS. I suggest you dump the SEQPACKET version as it is only going to double the maintenance and coding burden. SOCK_STREAM is fine for all platforms.
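Note that SOCK_STREAM does not preserve message boundaries, so a switch like this means adding your own framing. A rough sketch, assuming a 4-byte big-endian length prefix (send_record, recv_record, write_all and read_all are hypothetical names, and EINTR handling, SIGPIPE and file-descriptor passing are left out for brevity):

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/* Write exactly len bytes, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Read exactly len bytes. Returns 0, or -1 on error or early EOF. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Assumed framing: a 4-byte big-endian length, then the payload. */
int send_record(int fd, const void *data, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (write_all(fd, &hdr, sizeof(hdr)) == -1)
        return -1;
    return write_all(fd, data, len);
}

/* Returns the record length, or -1 on error or if it exceeds cap. */
long recv_record(int fd, void *buf, uint32_t cap)
{
    uint32_t hdr;
    if (read_all(fd, &hdr, sizeof(hdr)) == -1)
        return -1;
    hdr = ntohl(hdr);
    if (hdr > cap)
        return -1;
    if (read_all(fd, buf, hdr) == -1)
        return -1;
    return (long) hdr;
}
```

Two send_record() calls on one end of a socketpair(AF_UNIX, SOCK_STREAM, ...) then come back as two separate records from recv_record() on the other end, even though the stream itself has no boundaries.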


When you read the docs, SOCK_SEQPACKET differs from SOCK_STREAM in two distinct ways. Firstly -

Sequenced, reliable, two-way connection-based data transmission path for datagrams of fixed maximum length; a consumer is required to read an entire packet with each input system call.

-- socket(2) from Linux manpages project

aka

For message-based sockets, such as SOCK_DGRAM and SOCK_SEQPACKET, the entire message shall be read in a single operation. If a message is too long to fit in the supplied buffers, and MSG_PEEK is not set in the flags argument, the excess bytes shall be discarded, and MSG_TRUNC shall be set in the msg_flags member of the msghdr structure.

-- recvmsg() in POSIX standard.

In this sense it is similar to SOCK_DGRAM.

Secondly each "datagram" (Linux) / "message" (POSIX) carries a flag called MSG_EOR.

However Linux SOCK_SEQPACKET for AF_UNIX does not implement MSG_EOR. The current docs do not match reality :-)


Allegedly some SOCK_SEQPACKET implementations do the other one (MSG_EOR, but no whole-packet reads). And some implement both. So that covers all the possible different combinations :-)

[1] Packet oriented protocols generally use packet level reads with truncation / discard semantics and no MSG_EOR. X.25, Bluetooth, IRDA, and Unix domain sockets use SOCK_SEQPACKET this way.

[2] Record oriented protocols generally use byte stream reads and MSG_EOR - no packet level visibility, no truncation / discard. DECNet and ISO TP use SOCK_SEQPACKET that way.

[3] Packet / record hybrids generally use SOCK_SEQPACKET with truncation / discard semantics on the packet level, and record terminating packets marked with MSG_EOR. SPX and XNS SPP use SOCK_SEQPACKET this way.

https://mailarchive.ietf.org/arch/msg/tsvwg/9pDzBOG1KQDzQ2wAul5vnAjrRkA

You've shown an example of paragraph 1.

Paragraph 2 also applies to SOCK_SEQPACKET as defined for SCTP, although by default it sets MSG_EOR on every sendmsg(). The option to disable this is called SCTP_EXPLICIT_EOR.

Paragraph 3, the one most consistent with the docs, seems to be the most obscure case.

And even the docs are not properly consistent with themselves.

The SOCK_SEQPACKET socket type is similar to the SOCK_STREAM type, and is also connection-oriented. The only difference between these types is that record boundaries are maintained using the SOCK_SEQPACKET type. A record can be sent using one or more output operations and received using one or more input operations, but a single operation never transfers parts of more than one record. Record boundaries are visible to the receiver via the MSG_EOR flag in the received message flags returned by the recvmsg() function.

-- POSIX standard
