Website content crawling
We have a Business Listings directory hosted on IIS 6 on Windows 2003. Our competitors crawl and steal our content and customers.
We have tried IP blocking using honeypot URLs and log parsing without much success. Is anyone aware of a network device or a proxy server that I can run in front of my web server to minimize this issue?
All suggestions are highly appreciated.
You could try a spider trap, but they could add a check for that.
You could also add a rate limiter, and after a certain rate force them to solve a CAPTCHA, but you might also annoy your regular users.
But really, anything you create they can probably adapt and work around. Your best bet might just be what Developer Art said, and get a lawyer.
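For what it's worth, the rate-limit-then-CAPTCHA idea can be sketched roughly like this. It is only an illustration in Python, not something that plugs into IIS 6 as is; the class name `SlidingWindowLimiter`, the method `needs_captcha`, and the thresholds are all made up for the example, and you would have to wire the result into whatever serves your pages.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter: flags an IP once it exceeds
    max_requests within the last window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def needs_captcha(self, ip, now=None):
        """Record one request and return True if the IP is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_requests


# Example thresholds (assumed, tune against your own traffic):
limiter = SlidingWindowLimiter(max_requests=30, window_seconds=60)
if limiter.needs_captcha("203.0.113.7"):
    pass  # serve the CAPTCHA page instead of the listing
```

Setting the threshold well above what a human browsing the directory would ever hit keeps the annoyance for regular users low, but a crawler that throttles itself or rotates IPs will still slip under it.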
If there are many pages of data, you can monitor the IPs of visitors and make sure a given IP sees no more than a fraction of your pages per day.
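A minimal sketch of that per-IP daily cap, assuming you can hook a check into each page request. The names `DailyPageQuota` and `allow`, and the idea of identifying a page by some `page_id`, are assumptions for the example rather than anything specific to your setup.

```python
import datetime
import math
from collections import defaultdict


class DailyPageQuota:
    """Hypothetical tracker: each IP may view at most a fixed fraction
    of the site's distinct pages per calendar day."""

    def __init__(self, total_pages, max_fraction):
        # Always allow at least one page so tiny sites stay usable.
        self.limit = max(1, math.floor(total_pages * max_fraction))
        self._day = None
        self._seen = defaultdict(set)  # ip -> distinct page ids seen today

    def allow(self, ip, page_id, today=None):
        """Return True if this IP may view page_id today."""
        today = today or datetime.date.today()
        if today != self._day:
            # New day: forget yesterday's counts.
            self._day = today
            self._seen.clear()
        pages = self._seen[ip]
        if page_id in pages:
            return True  # revisiting a page does not use up quota
        if len(pages) >= self.limit:
            return False
        pages.add(page_id)
        return True


# Example: 20,000 listings, at most 0.5% of them per IP per day (assumed numbers).
quota = DailyPageQuota(total_pages=20000, max_fraction=0.005)
if not quota.allow("203.0.113.7", page_id=4211):
    pass  # block the request or show a CAPTCHA
```

Counting distinct pages rather than raw hits means someone refreshing the same listing is not penalised. Keep in mind that many users can share one IP behind a proxy or NAT, and a determined scraper can spread requests across many IPs.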
Ultimately what you want is a contradiction: you do want people to download it to their computers (to view it now); but you don't want them to download it to their computers (to view it later).