Do bots' user agents always have "http" in them?
Is it safe to assume that every bot's user agent string contains a URL? I do, of course, compare the user agent against the list of bots, but the idea here is to do a preliminary check before checking it against a long list.
To reword my question: is there any valid non-bot, non-crawler, non-spider, or any other non-filthy creature that has a URL in its user agent?
Is it safe to assume that all bots' user agents always have URLs in the user agent strings?
Nope. Check out this bot list; it has plenty of bots that don't sport a URL.
To reword my question: is there any valid non-bot, non-crawler, non-spider, or any other non-filthy creature that has a URL in its user agent?
I can't think of a browser that has a URL in its user agent string, but this is definitely a dangerous assumption to work with. Remember that, for example, Internet Explorer add-ons can add their own signatures to the browser's user agent string, so you can't guarantee there won't be a URL in it.
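To make both failure modes concrete, here's a minimal Python sketch. The bot list, the function names, and the toolbar-style browser string are hypothetical; the Googlebot string is its commonly published user agent, and the Wget string just follows Wget's usual "Wget/version" form. The "http" prefilter lets a bot with no URL slip past the list, and a URL on its own doesn't prove the client is a bot.

```python
# Hypothetical, deliberately tiny bot list; a real one would be much longer.
KNOWN_BOT_TOKENS = ["googlebot", "bingbot", "wget", "curl"]

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
WGET_UA = "Wget/1.21.3"  # Wget-style agent: a bot with no URL at all
# Hypothetical browser string with an add-on signature that contains a URL.
BROWSER_UA = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; "
              "ExampleToolbar +http://toolbar.example.com)")


def full_list_is_bot(user_agent: str) -> bool:
    """Always consult the bot list; no shortcut."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def naive_is_bot(user_agent: str) -> bool:
    """The approach from the question: skip the list unless 'http' appears."""
    if "http" not in user_agent.lower():
        return False  # early exit assumes "no URL" means "not a bot"
    return full_list_is_bot(user_agent)


for name, ua in [("googlebot", GOOGLEBOT_UA), ("wget", WGET_UA), ("browser", BROWSER_UA)]:
    has_url = "http" in ua.lower()
    print(f"{name}: has_url={has_url} naive={naive_is_bot(ua)} full={full_list_is_bot(ua)}")
```

The Wget line is the false negative of the prefilter, and the browser line shows why "has a URL" can't be used as a bot signal either, so the full list check has to run regardless.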
There are no assumptions you can make about the user agent string. RFC 1945, section 10.15 (User-Agent), specifies the format of the header, and section 3.7 (Product Tokens) specifies how product tokens should be formatted. As you can see from these two, the user agent string can be pretty much anything the HTTP agent wants it to be.
Note: strictly speaking, using a URL in a product token can be treated as a violation of that RFC, since the / should be treated as a separator between the product identifier and the product version.
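Here's a minimal sketch of that distinction, assuming a simplified reading of the RFC 1945 grammar: User-Agent = 1*( product | comment ), product = token ["/" product-version], and comment = parenthesized text that may nest. The function names are hypothetical and the parser ignores edge cases a real one would need. It shows that Googlebot's URL sits inside a comment, where arbitrary text is allowed, while a bare URL used as a product fails because the token before the first "/" ends in ":", which is one of the RFC's separator characters (tspecials).

```python
from typing import List, Tuple

# RFC 1945 tspecials: characters that may not appear inside a token.
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s: str) -> bool:
    """token = 1*<any CHAR except CTLs or tspecials>."""
    return bool(s) and all(32 < ord(c) < 127 and c not in TSPECIALS for c in s)


def is_valid_product(text: str) -> bool:
    """product = token ["/" product-version], where product-version = token."""
    name, sep, version = text.partition("/")
    if not is_token(name):
        return False
    return not sep or is_token(version)


def split_user_agent(value: str) -> List[Tuple[str, str]]:
    """Split a User-Agent value into ("product", text) and ("comment", text) items.

    Simplified: whitespace separates items and comments may nest.
    Raises ValueError on unbalanced parentheses.
    """
    items = []
    i, n = 0, len(value)
    while i < n:
        c = value[i]
        if c in " \t":
            i += 1
        elif c == "(":
            depth, start = 0, i
            while i < n:
                if value[i] == "(":
                    depth += 1
                elif value[i] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            if depth != 0:
                raise ValueError("unbalanced parentheses")
            items.append(("comment", value[start + 1:i]))
            i += 1  # skip the closing ")"
        elif c == ")":
            raise ValueError("unexpected ')'")
        else:
            start = i
            while i < n and value[i] not in " \t(":
                i += 1
            items.append(("product", value[start:i]))
    return items


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
for kind, text in split_user_agent(googlebot):
    print(kind, repr(text))
print(is_valid_product("http://example.com/crawler"))
```

This only checks the grammar; servers generally accept malformed values anyway, so it says nothing about whether a given client is a bot.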