How to block bot requests to URLs that match a common pattern in Apache?
I've got an Apache server that gets hit with about 100 simultaneous requests every 30 minutes, all for URLs that match this pattern:
/neighborhood/****/feed
These URLs used to have content on them and used to be valid. Now they all return 404, so this bot is killing performance every time it hits us.
What do I add to my .htaccess file to block it?
Note: The bot is on EC2, so blocking by IP address won't work. I need to block requests that match that pattern.
Using a mod_rewrite rule should get you to where you want to be:
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/neighborhood/[^/]+/feed$ [NC]
RewriteRule ^.*$ - [F,L]
The above goes into your .htaccess file. If you'd prefer to put it in your vhost file (because you've turned off .htaccess parsing for performance, which is a good idea), use:
<Location />
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/neighborhood/[^/]+/feed$ [NC]
RewriteRule ^.*$ - [F,L]
</Location>
Given a URI of /neighborhood/carson/feed you should expect a response such as:
Forbidden
You don't have permission to access /neighborhood/carson/feed on this server.
Apache/2.2.16 (Ubuntu) Server at ... Port 80
This was tested on my local VM running Apache/2.2.16 on Ubuntu 10.10.
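When the rules live in the vhost file, another option is to put them directly inside the <VirtualHost> block instead of a <Location> section. The following is only a sketch: the ServerName and DocumentRoot values are placeholders and not taken from the original setup. In server or vhost context the RewriteRule pattern is matched against the full URL path, including the leading slash, so the separate RewriteCond is not needed.

```apache
<VirtualHost *:80>
    # Placeholder values (assumptions); substitute the real site's settings.
    ServerName example.com
    DocumentRoot /var/www/example

    RewriteEngine On
    # In vhost context the pattern sees the leading slash of the URL path.
    # F returns 403 Forbidden; NC makes the match case-insensitive.
    RewriteRule ^/neighborhood/[^/]+/feed$ - [F,NC]
</VirtualHost>
```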
To return a 404 from mod_rewrite instead, use the R flag with a non-redirect status code. All flags go in a single comma-separated bracket:
RewriteRule pattern - [R=404,other_flags]
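Applied to the URL pattern from the question, the 404 variant might look like the sketch below. It assumes the rule is placed in .htaccess at the document root, where the leading slash is stripped before the pattern is matched.

```apache
RewriteEngine On
# .htaccess context: the path is matched without its leading slash.
# With a status outside 300-399, R drops the substitution and stops
# rewriting as if L had been given.
RewriteRule ^neighborhood/[^/]+/feed$ - [R=404,NC]
```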
Put a caching system or CDN in front of Apache, and allow your 404 responses to be cached.
A 403 can be set just as easily with mod_rewrite:
RewriteRule ^neighborhood/[^/]+/feed$ - [F]
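Since these URLs used to be valid and no longer exist, a 410 Gone response may describe the situation more precisely than a 403. A minimal sketch using mod_rewrite's G flag, again assuming .htaccess context:

```apache
RewriteEngine On
# G (gone) returns 410 Gone; the substitution is ignored and L is implied.
RewriteRule ^neighborhood/[^/]+/feed$ - [G,NC]
```

Whether the bot reacts any differently to a 410 than to a 403 or 404 is not something this rule can guarantee.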
The answers above block all visitors, including regular users. I think another condition should be added so that only bots are blocked:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(spider|HTTrack|Yandex|muckrack|bot).*$ [NC]
RewriteCond %{REQUEST_URI} ^/neighborhood/[^/]+/feed$ [NC]
RewriteRule ^.*$ - [F,L]
mod_rewrite would work, but I doubt this can be made much faster at the Apache level. I would take a look at nginx as a front end; it is far more efficient at both serving 404s and evaluating rules :-)
PS: You could also try redirecting these bots to a 100 MB file somewhere, just to have some fun with them :-D