DOM Error - ID 'someAnchor' already defined in Entity, line X
If I try to load an HTML document into PHP DOM I get an error along the lines of:
Error DOMDocument::loadHTML() [domdocument.loadhtml]: ID someAnchor already defined in Entity, line: 9
I cannot work out why. Here is some code that loads an HTML string into DOM.
First without containing an anchor tag and second with one. The second document produces an error.
Hopefully you should be able to cut and paste it into a script and run it to see the same output:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
</body>
</html>
EOT;
$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>
</body>
</html>
EOT;
class domGrabber
{
public $_FileErrorStr = '';
/**
*@desc DOM object factory does the work of loading the DOM object
*/
public function getLoadAsDOMObj($htmlString)
{
$this->_FileErrorStr =''; //reset error container
$xmlDoc = new DOMDocument();
set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed
$xmlDoc->loadHTML($htmlString);
restore_error_handler();
return $xmlDoc;
}
/**
*@desc public so that it can catch errors from outside this class
*/
public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
{
if ($this->_FileErrorStr === null)
{
$this->_FileErrorStr = $errstr;
}
else {
$this->_FileErrorStr .= (PHP_EOL . $errstr);
}
}
}
$domGrabber = new domGrabber();
$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithNoAnchor );
echo 'PHP Version: '. phpversion() .'<br />'."\n";
echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
{
echo 'Error'. $domGrabber->_FileErrorStr;
}
$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithAnchor);
echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
{
echo 'Error'. $domGrabber->_FileErrorStr;
}
I get the following out put in my Firefox source code view:
PHP Version: 5.2.9<br />
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
</body></html>
</pre>
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>
</body></html>
</pre>
Error
DOMDocument::loadHTML() [<a href='domdocument.loadhtml'>domdocument.loadhtml</a>]: ID someAnchor already defined in Entity, line: 9
So, why is DOM saying that someAnchor is already defined?
Update:
I experimented with both
- Instead of using loadHTML() I used the loadXML() method - and that fixed it
- Instead of having both id and name I used just id - Attribute and that fixed it.
See the comparison script here for the sake of completion:
<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);
$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithNoAnchor</p>
</body>
</html>
EOT;
$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchor</p>
<a name="someAnchor" id="someAnchor" ></a>
</body>
</html>
EOT;
$stringWithAnchorButOnlyIdAtt = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchorButOnlyIdAtt</p>
<a id="someAnchor"></a>
</body>
</html>
EOT;
class domGrabber
{
public $_FileErrorStr = '';
pub开发者_JS百科lic $useHTMLMethod = TRUE;
/**
*@desc DOM object factory does the work of loading the DOM object
*/
public function loadDOMObjAndWriteOut($htmlString)
{
$this->_FileErrorStr ='';
$xmlDoc = new DOMDocument();
set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed
if ($this->useHTMLMethod)
{
$xmlDoc->loadHTML($htmlString);
}
else {
$xmlDoc->loadXML($htmlString);
}
restore_error_handler();
echo "<h1>";
echo ($this->useHTMLMethod) ? 'using xmlDoc->loadHTML() ' : 'using $xmlDoc->loadXML()';
echo "</h1>";
echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($this->_FileErrorStr)
{
echo 'Error'. $this->_FileErrorStr;
}
}
/**
*@desc public so that it can catch errors from outside this class
*/
public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
{
if ($this->_FileErrorStr === null)
{
$this->_FileErrorStr = $errstr;
}
else {
$this->_FileErrorStr .= (PHP_EOL . $errstr);
}
}
}
$domGrabber = new domGrabber();
echo 'PHP Version: '. phpversion() .'<br />'."\n";
$domGrabber->useHTMLMethod = TRUE; //DOM->loadHTML
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);
$domGrabber->useHTMLMethod = FALSE; //use DOM->loadXML
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);
If you are loading XML files (that's the case, XHTML is XML), then you should use DOMDocument::loadXML()
, not DOMDocument::loadHTML()
.
In HTML, both name
and id
introduce an ID. So you are repeating the id "someAnchor", hence the error.
However, the W3C validator allows repeated IDs in the form you show <a id="someAnchor" name="someAnchor"></a>
. This may be a bug of libmxl2.
In this bug report for libxml2, a user proposes a patch to only consider the name
attribute as an ID:
According to the HTML and XHTML specs, only the a element's name attribute shares name space with id attributes. For some of the elements it can be argued that multiple instances with the same name don't make sense, but they should nevertheless not be considered in the same namespace as other elements' id attributes.
See http://www.zvon.org/xxl/xhtmlReference/Output/Strict/attr_name.html for all the elements that take name attributes and their semantics.
精彩评论