
XML validation in PHP fails for sitemap file generated by the xmlsitemap Drupal module

I am using schemaValidate() from PHP to validate my sitemap.xml file. This sitemap.xml file is generated by a Drupal module (xmlsitemap). When I run schemaValidate(), I get errors. Here is the code:

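// Collect libxml errors internally so that libxml_get_errors() can return them.
libxml_use_internal_errors(true);

$xmlDom = new DOMDocument('1.0', 'utf-8');
// validateOnParse is left at its default: it turns on DTD validation,
// and a sitemap file declares no DTD.

// $sitemapUrl holds the XML file location (URL).
if (!$xmlDom->load($sitemapUrl))
{
    // The file could not be loaded or is not well-formed XML.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    $is_file_valid = FALSE;
}
else
{
    if (!$xmlDom->schemaValidate('http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd'))
    {
        // The document is well-formed but violates the sitemap schema.
        $errors = libxml_get_errors();
        libxml_clear_errors();
        $is_file_valid = FALSE;
    }
    else
    {
        $is_file_valid = TRUE;
    }
}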

I see the following error: Element '{http://www.sitemaps.org/schemas/sitemap/0.9}lastmod': '2011-03-07T01:53Z' is not a valid value of the union type '{http://www.sitemaps.org/schemas/sitemap/0.9}tLastmod'
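
For reference, libxml_get_errors() only returns entries when libxml_use_internal_errors(true) is in effect, and each entry is a LibXMLError object with level, code, line, and message properties. A minimal sketch of turning those objects into readable lines follows. The name formatLibxmlErrors is a hypothetical helper, not part of PHP or the xmlsitemap module, and the malformed inline document exists only to produce an error.

```php
<?php
// Hypothetical helper: convert LibXMLError objects into readable strings.
function formatLibxmlErrors(array $errors)
{
    $lines = array();
    foreach ($errors as $error) {
        // Map the numeric libxml level to a label.
        switch ($error->level) {
            case LIBXML_ERR_WARNING:
                $level = 'Warning';
                break;
            case LIBXML_ERR_ERROR:
                $level = 'Error';
                break;
            case LIBXML_ERR_FATAL:
                $level = 'Fatal';
                break;
            default:
                $level = 'Unknown';
        }
        $lines[] = sprintf(
            '%s %d at line %d: %s',
            $level,
            $error->code,
            $error->line,
            trim($error->message)
        );
    }
    return $lines;
}

// Usage with a deliberately malformed inline document (illustration only).
libxml_use_internal_errors(true);
$demo = new DOMDocument();
$demo->loadXML('<urlset><url></urlset>');
foreach (formatLibxmlErrors(libxml_get_errors()) as $line) {
    echo $line, PHP_EOL;
}
libxml_clear_errors();
```

The same helper could be applied to the errors collected after a failed schemaValidate() call, which would print the lastmod message shown above together with its line number.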

Let me know if I am missing something in the validation, or whether I have to work around this error.

Note: When I validate the XML file online, I see no errors.

I have PHP Version 5.3.5. Regards.


From my understanding, this has to do with the way the xmlsitemap module creates the sitemap XML file. Basically, the date field in the XML file is not compatible with the .xsd file that I am using, 'http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd'.

When I modify the .xsd file, validation goes through. Given this, I will treat it as a known issue and let my site users know about it.


When I use xmlsitemap to generate a sitemap, I see entries like this:

<url><loc>{ok URL snipped}</loc><lastmod>2013-05-16T21:49Z</lastmod><changefreq>monthly</changefreq></url>
<url><loc>{different ok URL snipped}</loc><lastmod>2013-05-16T21:49Z</lastmod><changefreq>monthly</changefreq></url>

When I use schemaValidate on my example, I get the same error you did. That led me to wonder how that schema defines an acceptable lastmod. From here: http://www.sitemaps.org/schemas/sitemap/0.9/ it looked like the union of date and dateTime for tLastmod might be the key to the problem.

I found: http://www.w3.org/TR/xmlschema-2/ and browsed through, looking to see how those built-in data types were defined there. I noticed that the examples all showed seconds in the time.
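
To make that observation concrete, here is a rough sketch of a lexical check for the two member types of the union. It assumes that xs:dateTime requires the seconds field (hh:mm:ss, optionally with a fraction) and that xs:date is YYYY-MM-DD, each with an optional timezone. The function name looksLikeXsdDateOrDateTime is hypothetical, and the regular expressions are a simplification: they check the shape of the value only, not whether the month, day, or hour is in range.

```php
<?php
// Hypothetical, simplified check: does $value have the lexical shape of
// xs:date or xs:dateTime? Field ranges (month 1-12 etc.) are NOT checked.
function looksLikeXsdDateOrDateTime($value)
{
    $tz   = '(Z|[+-]\d{2}:\d{2})?';        // optional timezone
    $date = '-?\d{4,}-\d{2}-\d{2}';        // YYYY-MM-DD
    $time = '\d{2}:\d{2}:\d{2}(\.\d+)?';   // hh:mm:ss with optional fraction

    return preg_match('/^' . $date . $tz . '$/', $value) === 1
        || preg_match('/^' . $date . 'T' . $time . $tz . '$/', $value) === 1;
}

var_dump(looksLikeXsdDateOrDateTime('2011-03-07T01:53Z'));    // bool(false)
var_dump(looksLikeXsdDateOrDateTime('2011-03-07T01:53:00Z')); // bool(true)
var_dump(looksLikeXsdDateOrDateTime('2011-03-07'));           // bool(true)
```

Under these assumptions, the value from the question fails only because the seconds are absent.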

I manually changed the sitemap.xml value to:

<url><loc>{ok URL snipped}</loc><lastmod>2013-05-16T21:49:00Z</lastmod><changefreq>monthly</changefreq></url>
<url><loc>{different ok URL snipped}</loc><lastmod>2013-05-16T21:49:00Z</lastmod><changefreq>monthly</changefreq></url>

and the XML validates.

So, I wonder if the missing seconds from the dateTime in output from xmlsitemap is causing the problems validating against the schema?
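
If that is the cause, one possible workaround is to do the manual edit in code before validating: load the sitemap, append ":00" to any lastmod value that stops at the minutes, and validate the in-memory copy. The sketch below is only an illustration under that assumption. addMissingSeconds is a hypothetical name, and the function validates a modified copy, so it says nothing about whether the file as published conforms to the schema.

```php
<?php
// Hypothetical helper: return a DOMDocument in which every sitemap
// <lastmod> of the form YYYY-MM-DDThh:mm[TZ] has ":00" seconds added.
// Returns false if the input is not well-formed XML.
function addMissingSeconds($xml)
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        return false;
    }

    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    // Group 1: date and hh:mm. Group 2: optional timezone.
    $pattern = '/^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})(Z|[+-]\d{2}:\d{2})?$/';

    foreach ($dom->getElementsByTagNameNS($ns, 'lastmod') as $node) {
        $value = trim($node->nodeValue);
        if (preg_match($pattern, $value, $m)) {
            $node->nodeValue = $m[1] . ':00' . (isset($m[2]) ? $m[2] : '');
        }
    }
    return $dom;
}

// Usage with an inline sample (the URL is a placeholder).
$sample = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://example.com/</loc>'
    . '<lastmod>2013-05-16T21:49Z</lastmod></url>'
    . '</urlset>';

$fixed = addMissingSeconds($sample);
echo $fixed->getElementsByTagName('lastmod')->item(0)->nodeValue, PHP_EOL;
// The returned document could then be passed through schemaValidate().
```

Values that already carry seconds do not match the pattern and are left untouched.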

I see value in being able to take a site's sitemap.xml and make sure it validates (programmatically in PHP, in my case) before trying to parse it.

I guess there's probably a more robust REST-like service that could be passed the sitemap URL, or a string representing the same, and return a result for whether it validated, allowing for some fuzziness in dateTime format, etc., but schemaValidate looked like a promising first stab.

Edit:

You can get a sense of the discussion of validation issues for the module at:

https://drupal.org/project/issues/xmlsitemap?text=validation&status=All

Since I posted this answer, I've found that at least in the 7.x-2.0-rc2+0-dev version of the module I'm using (and possibly in earlier versions - I just haven't checked) I can configure Settings -> Advanced Settings -> Last modification date format -> Long to change the format that the modification dates are written in.

That's resulted in a sitemap that validates for a small set of examples I've used. I'm not sure that there aren't cases where the resulting sitemap might still not validate. For example, see the comment at:

https://drupal.org/node/1096282

which suggests to me there could be other non-validating situations.

If having the XML validate against the schema is important enough, perhaps it's worthy of a "xmlsitemap_validate.test", but there may not be enough interest in that validation to warrant that work...
