Website content crawling

Posted 2022-12-30 18:45

We have a business listings directory hosted on IIS 6 on Windows Server 2003. Our competitors crawl it and steal our content and customers.

We have tried IP blocking using honeypot URLs and log parsing without much success. Is anyone aware of a network device or a proxy server that I can run in front of my web server to minimize this issue?
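For the log-parsing side of the honeypot approach, here is a minimal sketch. It assumes IIS is writing logs in the W3C Extended Log File Format (a `#Fields:` directive names the space-separated columns, including `c-ip` and `cs-uri-stem`). The function name `honeypot_hits` and the trap path `/trap/hidden` are hypothetical and used only for illustration.

```python
from collections import Counter
from typing import Iterable


def honeypot_hits(log_lines: Iterable[str], honeypot_paths: set) -> Counter:
    """Count requests per client IP that hit any honeypot path.

    Assumes W3C Extended format: a '#Fields:' directive lists the column
    names, and each data line holds space-separated values in that order.
    """
    hits: Counter = Counter()
    fields: list = []
    for raw in log_lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # The column layout can change mid-file if logging is reconfigured.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives and anything before a #Fields line
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed or truncated lines
        row = dict(zip(fields, values))
        if row.get("cs-uri-stem") in honeypot_paths:
            hits[row.get("c-ip", "-")] += 1
    return hits


if __name__ == "__main__":
    sample = [
        "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
        "2022-12-30 10:00:00 10.0.0.1 GET /listing/42 200",
        "2022-12-30 10:00:01 10.0.0.2 GET /trap/hidden 200",
    ]
    print(honeypot_hits(sample, {"/trap/hidden"}))
```

The resulting counts could feed a block list, but as the question notes, a scraper that rotates IPs or avoids the trap links will slip past this.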

All suggestions are highly appreciated.

You could try a spider trap, but they could add a check for that.
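One common form of spider trap is an endless link maze: a path that well-behaved crawlers are told to skip (for example via a `Disallow` rule in robots.txt), where every page links to more pages of the same maze. The following is a minimal sketch of that idea, not a hardened implementation. The prefix `/trap/` and the function `trap_page` are hypothetical, and the caller is assumed to pass in the client IP and a set used to record flagged IPs.

```python
import hashlib

# Hypothetical trap prefix; it should also be listed under Disallow in robots.txt
# so that legitimate search engines never enter the maze.
TRAP_PREFIX = "/trap/"


def trap_page(path: str, client_ip: str, flagged: set, fanout: int = 3) -> str:
    """Render one page of a spider-trap maze and flag the requesting IP.

    Each page links to `fanout` child pages whose names are derived from a
    hash of the current path, so the maze is endless but deterministic.
    """
    if not path.startswith(TRAP_PREFIX):
        raise ValueError("not a trap path")
    flagged.add(client_ip)  # any client this deep in the maze is almost certainly a bot
    links = []
    for i in range(fanout):
        child = hashlib.sha256(f"{path}:{i}".encode()).hexdigest()[:12]
        links.append(f'<a href="{TRAP_PREFIX}{child}">listing {child}</a>')
    return "<html><body>" + "<br>".join(links) + "</body></html>"
```

As the answer warns, a determined scraper can defeat this by honoring robots.txt for the trap path or by capping how deep it follows links.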

You could also add a rate limiter, and after a certain rate force them to solve a CAPTCHA, but you might also annoy your regular users.
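A sliding-window limiter is one straightforward way to implement "after a certain rate, force a CAPTCHA". The sketch below is framework-agnostic and assumes that whatever sits in front of the site (the application, a filter, or a reverse proxy) calls `check` per request and shows a CAPTCHA when it returns `"captcha"`. The class name and its parameters are hypothetical. The clock is injectable so the logic can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class CaptchaRateLimiter:
    """Past `limit` requests within `window` seconds, an IP must solve a CAPTCHA."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.requests = defaultdict(deque)  # ip -> timestamps of recent requests

    def check(self, ip: str) -> str:
        """Record a request from `ip` and return 'allow' or 'captcha'."""
        now = self.clock()
        q = self.requests[ip]
        while q and now - q[0] >= self.window:
            q.popleft()  # drop timestamps that have left the window
        # Challenged requests are recorded too, so a client that keeps
        # hammering the site stays challenged until it slows down.
        q.append(now)
        return "allow" if len(q) <= self.limit else "captcha"

    def solved_captcha(self, ip: str) -> None:
        """Clear the IP's history once it has passed the CAPTCHA."""
        self.requests.pop(ip, None)
```

Picking `limit` and `window` is the tradeoff the answer mentions. A limit set too low annoys regular users who browse quickly, and one set too high lets a slow, patient scraper through.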

But really, anything you build they can probably adapt to and work around. Your best bet might just be what Developer Art said: get a lawyer.

If there are many pages of data, you can monitor the IPs of visitors and make sure a given IP sees no more than a fraction of your pages per day.
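A per-IP daily quota like the one described could look roughly like this. It is a minimal in-memory sketch. The class `DailyPageQuota` and its `fraction` parameter are hypothetical. Re-viewing a page does not count against the quota, and the current date is passed in so the reset logic is easy to test.

```python
import datetime as dt
from collections import defaultdict


class DailyPageQuota:
    """Allow each IP to view at most `fraction` of the site's distinct pages per day."""

    def __init__(self, total_pages: int, fraction: float):
        self.max_pages = max(1, int(total_pages * fraction))
        self.day = None
        self.seen = defaultdict(set)  # ip -> distinct pages viewed today

    def allow(self, ip: str, page: str, today: dt.date) -> bool:
        if today != self.day:
            # New day: reset every IP's quota.
            self.day = today
            self.seen.clear()
        pages = self.seen[ip]
        if page in pages:
            return True  # re-viewing a page does not use up quota
        if len(pages) >= self.max_pages:
            return False
        pages.add(page)
        return True
```

Keep in mind that many users can share one IP (corporate NAT, proxies), and a scraper can spread its requests over many IPs, so the fraction needs tuning against real traffic.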

Ultimately what you want is a contradiction: you do want people to download it to their computers (to view it now); but you don't want them to download it to their computers (to view it later).
