Mod rewrite rule for cached pages
I am caching pages in my (Rails) application based on subdomain. The pages for certain actions are cached to /public/cache/(subdomain)/. The application is running under Apache with Phusion Passenger. The caching itself is working fine. The problem is that Apache is not serving the cached pages directly and bypassing Rails as it should. My rewrite rules are wrong and I need help fixing them.
I have used, as one example of many, the suggestion located at: https://github.com/yeah/page_cache_fu#readme, which is as follows:
RewriteMap uri_escape int:escape
<Directory /var/www/example.com/current/public>
RewriteEngine On
RewriteCond %{REQUEST_METHOD} GET [NC]
RewriteCond %{DOCUMENT_ROOT}/cache/%{HTTP_HOST}%{REQUEST_URI}%{QUERY_STRING}.html -f
RewriteRule ^([^.]+)$ cache/%{HTTP_HOST}/$1${uri_escape:%{QUERY_STRING}}.html [L]
RewriteCond %{REQUEST_METHOD} GET [NC]
RewriteCond %{DOCUMENT_ROOT}/cache/%{HTTP_HOST}/index.html -f
RewriteRule ^$ cache/%{HTTP_HOST}/index.html
The problem with this is it seems to be expecting the directory to be the full http host (i.e. it's looking in cache/subdomain.example.com rather than just cache/subdomain).
Edit: Even when I change the Rails app to cache to cache/subdomain.example.com Apache still does not use them so it seems that there is more wrong than just the subdomain aspect.
Could someone please help me come up with the correct rule?
Edit(2):
I have simplified my rewrite to the following (just to try to get to a working starting point):
RewriteEngine On
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com$ [NC]
RewriteRule ^stats$ cache/%1/stats.html [L]
I would think this would cause http://abc.example.com/stats to be rewritten to http://abc.example.com/cache/abc/stats.html
It is not. I also added a RewriteLog entry, and what I see there makes me think it is trying to redirect to http://abc.example.com/var/www/example.com/current/public/cache/abc/stats.html. This is further confirmed by the fact that if I add an 'R' option along with the 'L', I see http://abc.example.com/var/www/....etc. in my browser. I.e. it seems to be appending the full document root instead of just the public-facing part.
Of course the result of the above is that I get a 404 error returned to the browser.
Can you see what is still wrong with my rule?
Edit: It's actually a bug.
http://code.google.com/p/phusion-passenger/issues/detail?id=563
Alright, this looks like it should work, but it doesn't. I've done a lot of testing with this, and it seems like the problem is the ^([^.]+)$ in the RewriteRule. Now, I did Google this, and it seems like it's a common enough pattern, so I don't understand what the issue could be. I just know that when I use that pattern in a RewriteRule, the rule fails. If I change it to ^([^.]+), it seems to work.
Hopefully someone with more experience with mod_rewrite can come along and explain to us what the problem with that pattern might be.
Edit: I just realized the problem with ^([^.]+)$:
Since you're building a cache, then the "normal" file will exist in its usual place. The implication of this is that if you ask the server for /file then, depending on your configuration, it will say "hey, file doesn't exist, so let's try the default extension of .html!" and so it goes off and finds file.html. Now when you get to the RewriteRule, the ^([^.]+)$ regex will be matched against file.html, NOT file.
The ^([^.]+)$ says "the start of the string, followed by as many non-period characters as you can grab, followed by the end of the string" which works fine against file because it contains no periods. It fails against file.html because ^[^.]+ will match against file, but where the regex then expects to find the end of the string (i.e. $), it instead finds .html and fails.
The reason ^(.*)$ works is that .* is guaranteed to match the whole of the string: .* matches "as many of any character as possible", so there is no character that can possibly exist between the (.*) and $ portions of the regex. That's not the case with [^.]+.
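The behaviour of the three patterns can be checked outside Apache. The following is a minimal sketch in Python, assuming that Python's re module treats these simple patterns the same way mod_rewrite's regex engine does. The helper name captured is made up for illustration and is not part of any Apache API.

```python
import re

# The three patterns discussed above.
ANCHORED = re.compile(r"^([^.]+)$")    # pattern from the original RewriteRule
UNANCHORED = re.compile(r"^([^.]+)")   # same pattern without the trailing $
CATCH_ALL = re.compile(r"^(.*)$")      # matches any single-line string


def captured(pattern, path):
    """Return what group 1 captured, or None if the pattern does not match."""
    match = pattern.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    for path in ("file", "file.html"):
        print(path, captured(ANCHORED, path), captured(UNANCHORED, path), captured(CATCH_ALL, path))
```

For file, all three patterns match and capture file. For file.html, the anchored pattern returns None, the unanchored one matches but captures only file, and the catch-all captures file.html in full.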
In order to extract the subdomain, you're going to need to backreference a RewriteCond. Basically, if you capture a reference (i.e. encapsulate something inside parens) in a RewriteCond, those references are available to a RewriteRule which immediately follows it.
For example, if I wrote this:
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com
Then the parentheses would capture the subdomain - note the () around [^.]+. If I were then to write a RewriteRule on the next line, the text captured above would become accessible as %1.
So your RewriteRule would look like this:
RewriteRule ^([^.]+) cache/%1/$1${uri_escape:%{QUERY_STRING}}.html [L]
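To make the two kinds of backreference concrete, here is a rough Python simulation of how the condition and the rule combine. It is only a sketch under stated assumptions: cached_path is a hypothetical helper, the path is assumed to arrive without its leading slash (as in a per-directory rule), and urllib.parse.quote merely stands in for the uri_escape map, so the exact set of characters it escapes may differ from Apache's.

```python
import re
from urllib.parse import quote

# Plays the role of: RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com
HOST_PATTERN = re.compile(r"^([^.]+)\.example\.com")
# Plays the role of the pattern in: RewriteRule ^([^.]+) ...
PATH_PATTERN = re.compile(r"^([^.]+)")


def cached_path(host, path, query=""):
    """Build the rewrite target, or return None if either pattern fails.

    host_match.group(1) corresponds to %1 (captured by the RewriteCond),
    path_match.group(1) corresponds to $1 (captured by the RewriteRule).
    """
    host_match = HOST_PATTERN.search(host)
    path_match = PATH_PATTERN.search(path)
    if not (host_match and path_match):
        return None
    # quote() is an approximation of ${uri_escape:%{QUERY_STRING}}.
    escaped_query = quote(query, safe="")
    return "cache/{}/{}{}.html".format(
        host_match.group(1), path_match.group(1), escaped_query
    )


if __name__ == "__main__":
    print(cached_path("abc.example.com", "stats"))  # cache/abc/stats.html
    print(cached_path("example.com", "stats"))      # None (no subdomain to capture)
```

The point of the sketch is only the data flow: the subdomain comes from the host condition, the page name comes from the rule's own pattern, and the two are joined under cache/.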
Hope that helps.