Prevent robots from copying images from a website

I have a website created in PHP and I want to prevent robots from copying images from it. What is the best way to prevent robots from copying images from a website?

Please make sure it doesn't harm SEO. Please make sure this does not stop spiders and crawlers from indexing the site.


As others have said, first tell bots they can't access the images with robots.txt if possible. Well-behaved bots will obey that.

Do a search for "prevent hotlinking". The standard method is a mod_rewrite rule that blocks requests for image files whose Referer header doesn't come from your own domain. That'll stop most bots.
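A minimal sketch of such a rule for an Apache .htaccess file, assuming your domain is example.com (substitute your own) and your images use common extensions:

```apache
RewriteEngine On
# Allow requests with an empty Referer (direct visits, some privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred from your own domain (example.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Everything else gets a 403 Forbidden for image files
RewriteRule \.(jpe?g|png|gif)$ - [F,NC]
```

Note that blocking empty referrers as well would catch more scrapers, but also breaks the image for visitors whose browsers or proxies strip the Referer header.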

You can match the user-agent strings of hundreds of common crawlers using get_browser and a recent browscap.ini file. This is not commonly enabled on shared hosting, but if you read the comments in the PHP manual, you should find a get_browser implementation that you can run from your own code.
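A minimal sketch of that check in an image-serving PHP script, assuming php.ini's browscap directive points at an up-to-date browscap.ini (the image path is hypothetical):

```php
<?php
// get_browser(..., true) returns an array of browscap properties;
// known bots and spiders are flagged with the "crawler" property.
$info = get_browser($_SERVER['HTTP_USER_AGENT'] ?? '', true);

if (!empty($info['crawler'])) {
    header('HTTP/1.1 403 Forbidden');
    exit('Images are not available to crawlers.');
}

// Otherwise serve the image normally (hypothetical path).
header('Content-Type: image/jpeg');
readfile(__DIR__ . '/images/photo.jpg');
```

Keep in mind this only catches crawlers honest enough to send a recognizable user-agent string.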

All of these will affect SEO, since the major search engines all have image search. It'll also affect the new Google Web Previews, which show a screenshot of the webpage when hovering over a search result, since you'll be blocking the bot from seeing the images on your page when it creates the screenshot.


You can configure your robots.txt to allow certain robots, but not others:

E.g.:

User-Agent: *
Disallow: /images

User-Agent: Googlebot-Image
Disallow: 

This is just an example. You can also allow other well-behaved robots.

But that does nothing about badly behaved robots that simply ignore robots.txt. There's really no solution for them, though authentication can help a little (you can throttle image access by account).


Not sure if it'd work, but if you have all your images in an /images/ folder, maybe you could set

User-agent: *
Disallow: /images/


Some potential solutions might include using Flash to display the images or loading them dynamically via JavaScript after the page has loaded. You could also throttle page loads by IP to prevent extremely fast access, making a robot much slower at scraping the site. These solutions have obvious drawbacks, though.
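A rough sketch of per-IP throttling in PHP, here using APCu as the counter store (an assumption; a database or Redis would work just as well, and the limit values are made up):

```php
<?php
// Hypothetical limit: at most 30 requests per IP per 60 seconds.
$ip    = $_SERVER['REMOTE_ADDR'];
$key   = 'hits_' . $ip;
$limit = 30;

// apcu_inc creates the entry with the given TTL if it doesn't exist yet,
// so the counter resets itself every 60 seconds.
$hits = apcu_inc($key, 1, $success, 60);

if ($hits !== false && $hits > $limit) {
    header('HTTP/1.1 429 Too Many Requests');
    header('Retry-After: 60');
    exit;
}
// ...continue serving the page or image as normal.
```

Be careful with the threshold: visitors behind a shared corporate or mobile-carrier NAT can legitimately generate many requests from one IP.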

There is no failsafe method of preventing content scraping on your website. A competent developer who wants to scrape a site he has access to can do so with little effort. The best bet is to watermark the content or put it behind a paywall.


It's difficult; there's no foolproof way to do it, but you can try to make it harder for the bots.

What comes to mind at the moment:

  • create the image links using JavaScript (forces bots to execute the JavaScript on the page)

  • use CSS sprites (i.e. pack several images together into one image), which might make them less useful to bots (e.g. if they just want to harvest and redisplay the images on their own page, it will look ugly on their site when several images are packed into one)

  • check the HTTP Referer header and only serve the proper images when the referrer is an allowed domain

  • put a watermark with your domain name on top of the image, which makes it less useful to other sites
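The watermark idea from the last bullet can be sketched with PHP's GD extension (the image path and domain text are placeholders):

```php
<?php
// Load the original image (hypothetical path).
$img = imagecreatefromjpeg(__DIR__ . '/images/photo.jpg');

// Semi-transparent white text near the bottom-left corner.
$color = imagecolorallocatealpha($img, 255, 255, 255, 60);
imagestring($img, 5, 10, imagesy($img) - 20, 'example.com', $color);

// Send the watermarked version to the client.
header('Content-Type: image/jpeg');
imagejpeg($img);
imagedestroy($img);
```

In practice you'd watermark once at upload time and cache the result rather than re-rendering on every request.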
