
How to determine whether real users are browsing my site or it's just a crawler, in PHP

I want to know whether a user is actually looking at my site (I know the page is just loaded by the browser and displayed to a human; that does not mean a human is actually looking at it).

I know two methods that may work.

  1. JavaScript.

    If the page was loaded by a browser, it will run the JS code automatically, unless JS is disabled in the browser. Then use AJAX to call back to the server.

  2. A 1×1 transparent image in the HTML.

    Use an img tag to call back to the server (see the sketch after this list).
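
For illustration, here is a minimal sketch of a server-side endpoint that serves both methods: the AJAX callback and the img tag can point at the same URL. The file name beacon.php and the log path are hypothetical.

<?php
// beacon.php: hypothetical endpoint hit by the AJAX callback or the <img> tag.
// Logs the visit, then returns a 1x1 transparent GIF so the same URL also
// works as the tracking-pixel src.

$line = sprintf(
    "%s\t%s\t%s\n",
    date('c'),
    $_SERVER['REMOTE_ADDR'] ?? '-',
    $_SERVER['HTTP_USER_AGENT'] ?? '-'
);
file_put_contents('/tmp/beacon.log', $line, FILE_APPEND);

// 1x1 transparent GIF, base64-encoded.
header('Content-Type: image/gif');
header('Cache-Control: no-store');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');

In the HTML you would then embed <img src="beacon.php" width="1" height="1" alt=""> or fire an XMLHttpRequest at the same URL.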

Does anyone know the pitfalls of these methods, or a better method?

Also, I don't know how to detect a 0×0 or 1×1 iframe, which could be used to defeat the above methods.


  1. A bot can access a browser, e.g. http://browsershots.org

  2. The bot can request that 1x1 image.

In short, there is no real way to tell. The best you can do is use a CAPTCHA, but that degrades the experience for humans.

Just use a CAPTCHA where it is actually required (user sign-up, etc.).
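
As a rough sketch of the kind of gate that suggests (an arithmetic challenge stands in for a real image CAPTCHA, purely for illustration; production sites should use an established CAPTCHA library or service):

<?php
// captcha.php: hypothetical minimal challenge using a simple sum
// instead of a distorted image, just to show the flow.
session_start();

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    if (isset($_SESSION['captcha_answer'])
        && (int)($_POST['answer'] ?? -1) === $_SESSION['captcha_answer']) {
        echo 'Looks human.';
    } else {
        echo 'Wrong answer, try again.';
    }
    unset($_SESSION['captcha_answer']); // one attempt per challenge
    exit;
}

$a = random_int(1, 9);
$b = random_int(1, 9);
$_SESSION['captcha_answer'] = $a + $b;

echo "<form method='post'>
        What is $a + $b?
        <input name='answer'>
        <input type='submit' value='Check'>
      </form>";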


I want to know whether a user is actually looking at my site (I know the page is just loaded by the browser and displayed to a human; that does not mean a human is actually looking at it).

The image way seems better, as JavaScript might be turned off by normal users as well. Robots generally don't load images, so this should indeed work. Nonetheless, if you're just looking to filter a known set of robots (say, Google and Yahoo), you can simply check the HTTP User-Agent header, as those robots will actually identify themselves as robots.


You can create a Google Webmasters account; it tells you how to configure your site for bots and also shows how robots will read your website.


I agree with others here, this is really tough. Generally, nice crawlers will identify themselves as crawlers, so using the User-Agent header is a pretty good way to filter out those guys. A good source for user-agent strings can be found at http://www.useragentstring.com. I've used Chris Schuld's PHP script (http://chrisschuld.com/projects/browser-php-detecting-a-users-browser-from-php/) to good effect in the past.

You can also filter these guys at the server level using the Apache config or an .htaccess file, but I've found keeping up with that to be a losing battle.
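
For example, a mod_rewrite fragment in .htaccess can reject requests by user-agent substring; the patterns below are illustrative only, not a real blacklist:

# .htaccess: deny requests whose User-Agent matches the listed substrings.
# Requires mod_rewrite; the pattern is an example, not a maintained blacklist.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (badbot|scrapy|curl) [NC]
RewriteRule .* - [F,L]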

However, if you watch your server logs you'll see lots of suspect activity with valid (browser) user-agents or funky user-agents, so this will only get you so far. You can play the blacklist/whitelist IP game, but that will get old fast.

Lots of crawlers do load images (e.g. Google Image Search), so I don't think that will work all the time.

Very few crawlers have JavaScript engines, so that is probably a good way to differentiate them. And let's face it, how many users actually turn off JavaScript these days? I've seen the stats on that, but I think those stats are heavily skewed by the sheer number of crawlers/bots out there that don't identify themselves. However, one caveat: I have seen that the Google bot does run JavaScript now.

So, bottom line, it's tough. I'd go with a hybrid strategy for sure: if you filter using user-agent, images, IP, and JavaScript, I'm sure you'll get most bots, but expect some to get through despite that.
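
A hedged sketch of what such a hybrid check might look like; the signal weights and the threshold are assumptions for illustration, not a proven recipe:

<?php
// Hypothetical hybrid heuristic combining the signals discussed above.
// Each signal is weak on its own; together they give a rough score, not a verdict.
function botScore(array $server, bool $ranJs, bool $loadedPixel): int {
    $score = 0;
    $ua = strtolower($server['HTTP_USER_AGENT'] ?? '');

    // 1. Self-identified crawlers.
    if (preg_match('/bot|crawl|spider|slurp/', $ua)) {
        $score += 3;
    }
    // 2. No JavaScript beacon received for this session.
    if (!$ranJs) {
        $score += 2;
    }
    // 3. Tracking pixel never requested.
    if (!$loadedPixel) {
        $score += 1;
    }
    return $score; // e.g. treat a score >= 3 as "probably a bot"
}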

Another idea: you could always use a known JavaScript browser quirk to test whether the reported user-agent (if it claims to be a browser) really is that browser.


"Nice" robots like those from google or yahoo will usually respect a robots.txt file. Filtering by useragent might also help.

But in the end, if someone wants automated access, it will be very hard to prevent; you should be sure it is worth the effort.


Inspect the User-Agent header of the HTTP request. Crawlers should set this to something other than a known browser.

Here are the Google bot headers: http://code.google.com/intl/nl-NL/web/controlcrawlindex/docs/crawlers.html

In PHP you can get the user-agent with:

$Uagent = $_SERVER['HTTP_USER_AGENT'];

Then you just compare it with the known headers; as a tip, preg_match() could be handy to do this all in a few lines of code.
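
For example, a minimal sketch of that tip; the pattern list is illustrative, not exhaustive:

// Match a few well-known crawler signatures in one expression.
if (preg_match('/googlebot|bingbot|slurp|baiduspider/i', $Uagent)) {
    // The request came from a self-identified crawler.
}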

