Advice on using a honeypot img tag to detect scrapers / bad bots

We want to set up a small honeypot image in our HTML bodies to detect scrapers / bad bots.

Has anyone set something like this up before?

We were thinking the best way to go about it would be to:

a) Comment the HTML out:

<!-- <img src="http://www.domain.com/honeypot.gif"/> -->

b) Apply CSS styles to the image that hide it from browsers:

.... id="honeypot" ....

#honeypot{
    display:none;
    visibility:hidden;
}

Using the above, does anyone foresee any situation where a legitimate user agent would fetch the image or attempt to render it?

The honeypot.gif would actually be a PHP script (mapped to that URL with mod_rewrite) where we would do our logging.
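The question plans the logger as a PHP script behind mod_rewrite. As a language-neutral sketch of the same idea, here is a Python WSGI callable that records who fetched the honeypot and then answers with a 1x1 transparent GIF, so the response looks like an ordinary image. The name `honeypot_app`, the logger name and the logged fields are assumptions for illustration, not the poster's design.

```python
import base64
import logging
from datetime import datetime, timezone

# A 1x1 transparent GIF, so the honeypot response looks like an ordinary image.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

log = logging.getLogger("honeypot")  # hypothetical logger name


def honeypot_app(environ, start_response):
    """WSGI app standing in for honeypot.gif: record the hit, then serve a pixel."""
    hit = {
        "time": datetime.now(timezone.utc).isoformat(),
        "ip": environ.get("REMOTE_ADDR", "-"),
        "user_agent": environ.get("HTTP_USER_AGENT", "-"),
        "referer": environ.get("HTTP_REFERER", "-"),
        "path": environ.get("PATH_INFO", "/"),
    }
    log.warning("honeypot hit: %s", hit)
    start_response(
        "200 OK",
        [
            ("Content-Type", "image/gif"),
            ("Content-Length", str(len(PIXEL_GIF))),
            # Discourage caching so every fetch reaches this logger.
            ("Cache-Control", "no-store"),
        ],
    )
    return [PIXEL_GIF]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, honeypot_app).serve_forever()` from the standard library should serve it; in production it would sit behind whatever rewrite rule maps the image URL to it.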

While I understand that any well-coded scraper would avoid both a) and b), this would at least give some insight into the very dirty ones.

Any other pointers on the best way to go about this?


A bot will ignore your img tag because it's within a comment.

Instead, you might consider creating an invisible div that contains a link to a trigger URL on the same site (preferably within the same directory, in case the bot is depth-sensitive).
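To see why the commented-out img never reaches a bot while a link inside a hidden div does, here is a sketch of a naive link-harvesting bot built on Python's standard `html.parser`. The parser hands comment contents to `handle_comment` and never reports tags inside them, whereas the link inside a `display:none` div is an ordinary tag, and a bot that applies no CSS cannot tell it is hidden. The `/trap/index.html` URL and the page markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkHarvester(HTMLParser):
    """Collects href/src URLs the way a simple crawler might; it applies no CSS."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Tags inside <!-- ... --> never arrive here; they go to handle_comment.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def harvest(html):
    parser = LinkHarvester()
    parser.feed(html)
    parser.close()
    return parser.urls


PAGE = """
<html><body>
  <!-- <img src="http://www.domain.com/honeypot.gif"/> -->
  <div style="display:none"><a href="/trap/index.html">archive</a></div>
  <p>Real content</p>
</body></html>
"""

print(harvest(PAGE))  # ['/trap/index.html']
```

One common refinement, relevant to the question about legitimate clients: list the trap path under `Disallow` in robots.txt, so crawlers that honor it never request the trap and only rule-ignoring bots get logged.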


IMO, any good scraper is going to know how to parse HTML with an SGML parser and would just skip the commented-out image, but I could be wrong.

At most it will tell you when scraping happens; it doesn't give you a way to counter the scraper. You would probably be better off with some kind of cookie-based solution, as most bots probably don't bother handling cookies. You could also randomize the image paths between requests and expire them after a short period.
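The randomized, expiring paths suggested above can be sketched with an HMAC-signed timestamp: each page render embeds a fresh path, and the server can later tell a recently issued path from a stale or forged one. The secret, the five-minute lifetime, the `/img/<timestamp>-<signature>.gif` format and both function names are assumptions for illustration. Reading an expired or forged hit as "a client replaying harvested URLs" is an interpretation, not something the answer states.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"   # hypothetical server-side secret
MAX_AGE_SECONDS = 300   # hypothetical lifetime of a generated path


def _sign(ts):
    return hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()[:16]


def make_trap_path(now=None):
    """Return a fresh honeypot path such as /img/<unix-time>-<signature>.gif."""
    ts = str(int(time.time() if now is None else now))
    return f"/img/{ts}-{_sign(ts)}.gif"


def check_trap_path(path, now=None):
    """Classify a requested path as 'fresh', 'expired' or 'forged'."""
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".gif"):
        name = name[: -len(".gif")]
    ts, _, sig = name.partition("-")
    if not ts.isdigit():
        return "forged"
    # Compare as bytes in constant time; request paths may contain non-ASCII.
    if not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return "forged"
    age = (time.time() if now is None else now) - int(ts)
    return "fresh" if 0 <= age <= MAX_AGE_SECONDS else "expired"


path = make_trap_path(now=1_000_000)
print(check_trap_path(path, now=1_000_100))  # fresh
print(check_trap_path(path, now=1_000_400))  # expired
```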

Checking the Referer header is an obvious one, if you don't care about browsers that don't send it or people who hide or alter it.
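A minimal sketch of that referrer check, assuming the site's own host is `www.domain.com` (taken from the question's example URL) and that request headers arrive as a dict keyed `Referer` (the header's official, misspelled HTTP name). It reports a missing Referer separately from a foreign one, since the caveat above means "missing" alone is weak evidence of a bot.

```python
from urllib.parse import urlsplit

SITE_HOST = "www.domain.com"  # assumption: taken from the example URL in the question


def classify_referer(headers):
    """Return 'missing', 'same-site' or 'foreign' for a request's Referer header."""
    referer = headers.get("Referer", "").strip()
    if not referer:
        # Could be a bot, or a browser, proxy or privacy tool that strips Referer.
        return "missing"
    # urlsplit().hostname is already lower-cased, or None if there is no host part.
    host = urlsplit(referer).hostname or ""
    return "same-site" if host == SITE_HOST else "foreign"


print(classify_referer({"Referer": "http://www.domain.com/page.html"}))  # same-site
```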
