Google's Indexing XSLT Pages

My site has been created with XML as a data store and XSLT as the template layer. It appears that Google is not very good at indexing sites that are XML/XSLT based. Are there any efficient, easy-to-implement software components that can render the XSLT server-side just for the Googlebot indexer? It would be even better if they worked with PHP.


Take a look at the PHP XSLT processor.

http://php.net/manual/en/class.xsltprocessor.php

Use as follows:

<?php
# Build a small XML document in a string
$sXml  = "<xml>";
$sXml .= "<sudhir>hello sudhir</sudhir>";
$sXml .= "</xml>";

# Load the XML
$XML = new DOMDocument();
$XML->loadXML($sXml);

# Load the stylesheet and prepare the processor
$xslt = new XSLTProcessor();
$XSL = new DOMDocument();
$XSL->load('xsl/index.xsl', LIBXML_NOCDATA);
$xslt->importStylesheet($XSL);

# Transform and print the result
print $xslt->transformToXML($XML);
?>

(From http://php.net/manual/en/book.xsl.php)

UPDATE

You asked in the comment how to intercept a request from a specific user agent (e.g. Googlebot). There are various ways to do this, depending on the web server technology you are using.
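The simplest way is to check the user agent directly in PHP. A minimal sketch, assuming the crawler identifies itself with a user-agent string containing "Googlebot" (the file names data.xml and renderxslt.php are placeholders for your own):

<?php
// Sketch: serve the server-side transformation to the crawler,
// and the raw XML (styled client-side) to everyone else.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (stripos($ua, 'Googlebot') !== false) {
    // Crawler: render the XSLT on the server (see renderxslt.php below)
    include 'renderxslt.php';
} else {
    // Browsers: send the XML and let the client apply the stylesheet
    header('Content-Type: application/xml');
    readfile('data.xml');
}
?>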

On Apache, one method would be to use mod_rewrite to internally divert the request to a PHP script containing code similar to the above. That script retrieves the XML from the originally requested URL and sends the transformed output to the client. The rewrite rule would use a RewriteCond that tests the HTTP_USER_AGENT variable for Googlebot's signature. Here is an example of the rule (untested, but you should get the idea):

RewriteCond %{HTTP_USER_AGENT} ^(.*)Googlebot(.*)$ [NC]
RewriteRule ^(.*\.xml.*)$ /renderxslt.php?url=$1 [L]

Briefly, the condition matches any user agent containing the string "Googlebot" (the [NC] flag makes the match case-insensitive), and the rewrite rule matches any URL containing ".xml", passing the full path to the renderxslt.php page as a query-string parameter.
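For completeness, here is a sketch of what renderxslt.php could look like, reusing the XSLTProcessor code from above. The stylesheet path and the ?url= parameter match the rule above; everything else (the sanity check in particular) is only illustrative:

<?php
// renderxslt.php — sketch; the rewrite rule above passes the
// originally requested XML path in ?url=...
$requested = isset($_GET['url']) ? $_GET['url'] : '';

// Crude guard against escaping the document root
if ($requested === '' || strpos($requested, '..') !== false) {
    header('HTTP/1.1 400 Bad Request');
    exit;
}

// Load the XML the crawler originally asked for
$XML = new DOMDocument();
$XML->load($_SERVER['DOCUMENT_ROOT'] . '/' . ltrim($requested, '/'));

// Apply the same stylesheet the browser would use
$XSL = new DOMDocument();
$XSL->load('xsl/index.xsl', LIBXML_NOCDATA);

$xslt = new XSLTProcessor();
$xslt->importStylesheet($XSL);

// Send the rendered HTML to the crawler
header('Content-Type: text/html; charset=utf-8');
print $xslt->transformToXML($XML);
?>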

A port of mod_rewrite exists for IIS too (http://www.isapirewrite.com/).

Alternatively, with IIS you could use an ASP.NET HTTP module to intercept the request, again checking Request.UserAgent (or Request.ServerVariables["HTTP_USER_AGENT"]) for Google's signature. You can then proceed in a similar manner to the above by reading the HTML generated by your PHP script, or alternatively by using the ASP.NET XML control:

<asp:Xml ID="Xml1" runat="server" DocumentSource="~/cdlist.xml" TransformSource="~/listformat.xsl"></asp:Xml>


Why not just exclude the directory that holds your xsl files in your robots.txt?
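For example, assuming the stylesheets live under /xsl/, the robots.txt entry would be:

User-agent: *
Disallow: /xsl/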
