Grabbing favicons with node.js
I've got a little program that needs to grab favicons from sites using node.js. This works in most cases, but with apple.com, I get errors that I can't understand or fix:
var sys= require('sys');
var fs= require('fs');
var http = require('http');
var rurl = http.createClient(80, 'apple.com');
var requestUrl = 'http://www.apple.com/favicon.ico';
var request = rurl.request('GET', requestUrl, {"Content-Type": "image/x-icon", "host" : "apple.com" });
request.end();
request.addListener('response', function (response)
{
response.setEncoding('binary');
var body = '';
response.addListener('data', function (chunk) {
body += chunk;
});
response.addListener("end", function() {
});
});
When I make this request, the response is:
<head><body> This object may be found <a HREF="http://www.apple.com/favicon.ico">here</a> </body>
I've since tried just about every variant of the code above, using 'www.apple.com' as the host name in the client creation step and in the request URL, but typically I just get errors from node like this:
node.js:63
throw e;
^
Error: Parse Error
at Client.ondata (http:901:22)
at IOWatcher.callback (net:494:29)
at node.js:769:9
Also, I'm not interested in using the google service to grab the favicons.
Your Host setting in the request should be www.apple.com (with www). Also, why are you including a Content-Type header in the request? It describes a request body, and a GET request has none, so it makes no sense here. Use Accept: image/x-icon instead.
I got this response from that URL:
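To make the header advice concrete, here is a minimal sketch of request options with a matching Host and an Accept header. The helper name faviconRequestOptions is made up for illustration, and the options-object form is the one accepted by http.request in later Node.js versions, not the createClient API used in the question:

```javascript
// Hypothetical helper (not part of Node): builds options for http.request/http.get.
// Assumption: the favicon lives at /favicon.ico on port 80 of the given host.
function faviconRequestOptions(host) {
  return {
    host: host,            // the host we connect to...
    port: 80,
    path: '/favicon.ico',
    method: 'GET',
    headers: {
      Host: host,          // ...and the Host header must name the same host
      Accept: 'image/x-icon' // what we want back; no Content-Type on a GET
    }
  };
}
```

Calling faviconRequestOptions('www.apple.com') keeps the connection host and the Host header in sync, which is the mismatch that caused trouble in the question.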
$ curl -I http://www.apple.com/favicon.ico
HTTP/1.1 200 OK
Last-Modified: Thu, 12 Mar 2009 17:09:30 GMT
ETag: "1036-464ef0c1c8680"
Server: Apache/2.2.11 (Unix)
X-Cache-TTL: 568
X-Cached-Time: Thu, 21 Jan 2010 14:55:37 GMT
Accept-Ranges: bytes
Content-Length: 4150
Content-Type: image/x-icon
Cache-Control: max-age=463
Expires: Sun, 12 Sep 2010 14:22:09 GMT
Date: Sun, 12 Sep 2010 14:14:26 GMT
Connection: keep-alive
It shouldn't have any problems parsing that, and I got the icon data too.
Here's the response I got with the non-www host header:
$ curl -I http://www.apple.com/favicon.ico -H Host: apple.com
HTTP/1.0 400 Bad Request
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 208
Expires: Sun, 12 Sep 2010 14:25:03 GMT
Date: Sun, 12 Sep 2010 14:25:03 GMT
Connection: close
HTTP/1.1 302 Object Moved
Location: http://www.apple.com/
Content-Type: text/html
Cache-Control: private
Connection: close
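Incidentally, both responses came back at once. That may simply be because the Host header in the command above is unquoted, so curl treats apple.com as a second URL, rather than their servers working incorrectly, but that's another discussion.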
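The "This object may be found here" body in the question and the 302 above both look like redirects, so one option is to read the Location header and retry against that URL. A rough sketch, assuming the lower-cased header names Node exposes on response.headers; redirectTarget is a hypothetical name, not a Node API:

```javascript
const { URL } = require('url');

// Hypothetical helper: given a response status and headers, return the
// absolute URL to retry, or null if the response is not a redirect.
function redirectTarget(statusCode, headers, currentUrl) {
  const isRedirect = [301, 302, 303, 307, 308].includes(statusCode);
  if (!isRedirect || !headers.location) {
    return null;
  }
  // Location may be relative, so resolve it against the URL we just requested.
  return new URL(headers.location, currentUrl).href;
}
```

A caller would loop on this with a small retry limit so that a misconfigured site cannot redirect forever.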
This code seems to work for me:
var sys = require("sys")
, fs = require("fs")
, http = require("http");
var client = http.createClient(80, "www.apple.com") // Change of hostname here and below
, req = client.request( "GET"
, "http://www.apple.com/favicon.ico"
, {"Host": "www.apple.com"});
req.end();
req.addListener("response", function (res) {
var body = "";
res.setEncoding('binary');
res.addListener("data", function (c) {
body += c;
});
res.addListener("end", function () {
// Do stuff with body
});
});
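For anyone on a newer Node.js, where http.createClient is no longer available, roughly the same thing can be sketched with http.get. This version also collects the body as Buffer chunks instead of a 'binary' string, which is safer for image data. The names collectBody and fetchFavicon are made up for this example, and it assumes the icon is served over plain HTTP at /favicon.ico:

```javascript
const http = require('http');

// Collect a response body as raw bytes. Works with any stream that emits
// 'data' (Buffer chunks) and 'end'; do not call setEncoding on it first.
function collectBody(res, callback) {
  const chunks = [];
  res.on('data', function (chunk) { chunks.push(chunk); });
  res.on('end', function () { callback(null, Buffer.concat(chunks)); });
  res.on('error', callback);
}

// Hypothetical helper: fetch /favicon.ico from the given host.
// Node sets the Host header from the `host` option automatically.
function fetchFavicon(host, callback) {
  const options = {
    host: host,
    path: '/favicon.ico',
    headers: { Accept: 'image/x-icon' }
  };
  const req = http.get(options, function (res) {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      return callback(new Error('Unexpected status ' + res.statusCode));
    }
    collectBody(res, callback);
  });
  req.on('error', callback);
}
```

Usage would be something like fetchFavicon('www.apple.com', function (err, icon) { ... }), where icon is a Buffer that can go straight to fs.writeFile. A site may answer with a redirect instead of 200, which this sketch simply reports as an error.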