
Why is separate getaddrinfo-like() + connect() not refactored into a (theoretical) connect_by_name()?

Most of the applications I've seen that use TCP do roughly the following to connect to a remote host:

  1. get the hostname (or address) from the configuration/user input (textual)
  2. either resolve the hostname into an address and add the port, or use getaddrinfo()
  3. from the above, fill in the sockaddr_* structure with one of the remote addresses
  4. use connect() to get the socket connected to the remote host
  5. if that fails, possibly go to (3) and retry, or just complain about the error
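
For reference, steps 2 to 5 in their simplest, fully blocking form might look like the sketch below. The name connect_by_name is the hypothetical call this question is about, not an existing library function, and the retry policy (try every address in the order getaddrinfo() returns them, give up when the list is exhausted) is only an assumption.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of any standard library: resolve
 * hostname/servicename and return a connected stream socket, or -1. */
int connect_by_name(const char *hostname, const char *servicename)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6, whatever resolves */
    hints.ai_socktype = SOCK_STREAM; /* stream transport (TCP) */

    /* Step 2: name and service lookup; this call blocks. */
    if (getaddrinfo(hostname, servicename, &hints, &res) != 0)
        return -1;

    /* Steps 3 to 5: try each returned address until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                   /* connected, keep this socket */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

A caller would then write something like connect_by_name("example.org", "http") and check the result for -1. Note that both the lookup and every connection attempt block here.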

(2) is blocking in the stock library implementation, whereas (4) is most often done non-blocking. That leaves room for a lot of similar yet subtly different code, all serving the same purpose: asynchronously connecting to a remote host by its hostname.

So the question: what are the good reasons not to have an additional single call like the following:

int sockfd = connect_by_name(const char *hostname, const char *servicename)

?

I can come up with three:

  • historic: because that's what the API is
  • to allow a custom per-application policy for address selection and connection retry: this seems a bit superficial, since for the common case ("get me a tube to talk to the remote host") the underlying OS should know better
  • to give the user visual feedback about the exact step involved ("name resolution" vs "connection attempt"): this seems rather important, since lookup plus connection attempt may take time

Only the last of them seems compelling enough to justify rewriting the resolve/connect code for every client app (as opposed to at least having and using a widely used library that implements the connect_by_name() semantics in addition to the existing sockets API), so surely there are more reasons that I am missing?

(One of the reasons behind the question is that this kind of API would appear to significantly help portability to IPv6, and possibly to other stream transport protocols.)

Or maybe such a library exists and my google-fu failed me?

(edited: corrected the definition to look like it was meant to look, thanks LnxPrgr3)


Implementing such an API with non-blocking characteristics within the constraints of the standard library (which, crucially, isn't supposed to start its own threads or processes to work asynchronously) would be problematic.

Both the name lookup and the connecting part of the process require waiting for a remote response. If either of these is not to block, that requires a way of doing asynchronous work and signalling the change in state of the socket to the calling application. connect is able to do this, because the work of the connect call is done in the kernel, and the kernel can mark the socket as writable when the connect is done. However, name lookup is not able to do this, because the work of a name lookup is done in userspace, and without starting a new thread (which is verboten in the standard library), giving that name lookup code a way to be woken up to continue work is a difficult problem.

You could do it by having your proposed call return two file descriptors - one for the socket itself, and another that you are told "Do nothing with this file descriptor except to check regularly if it is readable. If this file descriptor becomes readable, call cbn_do_some_more_work(fd)". That is clearly a fairly uninspiring API!

The usual UNIX approach is to provide a set of simple, flexible tools, working on a small set of object types, that can be combined in order to produce complex effects. That applies to the programming API as much as it does to the standard shell tools.


Because you can build higher level APIs such as the one you propose on top of the native low level APIs.

The socket API is not just for TCP, but can also be used for other protocols that may have different endpoint conventions (e.g. the Unix-local protocol, where you have a name only and no service). Or consider DNS, which uses sockets to implement itself. How does the DNS code connect to the server if the connection code relies on DNS?
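
As an illustration of the Unix-local case, here is a small sketch in which the endpoint is nothing but a filesystem path, so a hostname/service pair would have nowhere to go. The helper name connect_unix is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect to a Unix-local stream socket.
 * The only "address" is a path; there is no host and no service. */
int connect_unix(const char *path)
{
    struct sockaddr_un sa;
    int fd;

    if (strlen(path) >= sizeof sa.sun_path)
        return -1;                   /* path does not fit in sun_path */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);       /* length was checked above */

    /* The same connect() call as for TCP, only the sockaddr differs. */
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The low-level connect() works unchanged here because it takes an opaque sockaddr, which is the flexibility a name-plus-service call would give up.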

If you would like a higher level abstraction, one library to check out is ACE.


There are several questions in your question. For instance, why not standardize an API with such a connect_by_name? That would certainly be a good idea. It would not fit every purpose (see the DNS example from R Samuel Klatchko) but for the typical network program, it would be OK. A paper exploring such APIs is "Simplifying Internet Applications Development With A Name-Oriented Sockets Interface" by Christian Vogt. Note that another difficulty for such an API would be "callback" applications, for instance a SIP client asking to be called back: the application has no easy way to know its own name and therefore often prefers to be called back by address, despite the problems this causes, for instance with NAT.

Now, another question is "Is it possible to build such a connect_by_name subroutine today?" Partly yes (with the caveats mentioned by caf) but, if written in userspace, in an ordinary library, it would not be completely "name-oriented" since the Unix kernel still manages the connections using IP addresses. For instance, I would expect a "real" connect_by_name routine to be able to survive renumbering (for instance because a mobile host was renumbered), which is quite difficult to do in userspace.

Finally, yes, there already exist a lot of libraries with similar semantics. For an HTTP client (the most common case for a program whose network abilities are not the main feature, for instance an XML processor), you have Neon and libcURL. With libcURL, you can simply write things like:

#define URL "http://www.velib.paris.fr/service/stationdetails/42"
...
curl_easy_setopt(curl, CURLOPT_URL, URL);
result = curl_easy_perform(curl);

which is even higher-layer than connect_by_name since it uses a URL, not a domain name.
