HTML Purifier: Converting <body> to <div>

Premise

I'd like to use HTML Purifier to transform <body> tags into <div> tags so that inline styling on the <body> element is preserved, e.g. <body style="background-color:#000000;">Hi there.</body> would become <div style="background-color:#000000;">Hi there.</div>. I'm looking at a combination of a custom tag and a TagTransform class.

Current setup

In my configuration section, I'm currently doing this:

$htmlDef  = $this->configuration->getHTMLDefinition(true);
// defining the element to avoid triggering 'Element 'body' is not supported'
$bodyElem = $htmlDef->addElement('body', 'Block', 'Flow', 'Core');
$bodyElem->excludes = array('body' => true);
// add the transformation rule
$htmlDef->info_tag_transform['body'] = new HTMLPurifier_TagTransform_Simple('div');

...as well as allowing <body> and its style, class, and id attributes via the configuration directives (they're part of a large, working list that's parsed into HTML.AllowedElements and HTML.AllowedAttributes).

I've turned definition caching off.

$config->set('Cache.DefinitionImpl', null);

Unfortunately, in this setup, it seems like HTMLPurifier_TagTransform_Simple never has its transform() method called.

HTML.Parent?

I presume the culprit is my HTML.Parent, which is set to 'div': quite naturally, <div> does not allow a child <body> element. However, setting HTML.Parent to 'html' nets me:

ErrorException: Cannot use unrecognized element as parent

Adding...

$htmlElem = $htmlDef->addElement('html', 'Block', 'Flow', 'Core');
$htmlElem->excludes = array('html' => true);

...gets rid of that error message but still doesn't transform the tag - it's removed instead.

Adding...

$htmlElem = $htmlDef->addElement('html', 'Block', 'Custom: head?, body', 'Core');
$htmlElem->excludes = array('html' => true);

...doesn't work either; it nets me an error message:

ErrorException: Trying to get property of non-object       

[...]/library/HTMLPurifier/Strategy/FixNesting.php:237
[...]/library/HTMLPurifier/Strategy/Composite.php:18
[...]/library/HTMLPurifier.php:181
[...]

I'm still tweaking around with the last option now, trying to figure out the exact syntax I need to provide, but if someone knows how to help me based on their own past experience, I'd appreciate any pointers in the right direction.

HTML.TidyLevel?

The only other culprit I can imagine is my HTML.TidyLevel, which is set to 'heavy'. I've yet to try all possible combinations here, but so far changing it has made no difference.

(Since I've only been working on this on the side, I can't reliably recall which combinations I've already tried; otherwise I would list them here. As it is, I'd likely omit or misreport something. I might edit this section later when I've done some dedicated testing, though!)

Full Configuration

My configuration data is stored in JSON and then parsed into HTML Purifier. Here's the file:

{
    "CSS" : {
        "MaxImgLength" : "800px"
    },
    "Core" : {
        "CollectErrors" : true,
        "HiddenElements" : {
            "script"   : true,
            "style"    : true,
            "iframe"   : true,
            "noframes" : true
        },
        "RemoveInvalidImg" : false
    },
    "Filter" : {
        "ExtractStyleBlocks" : true
    },
    "HTML" : {
        "MaxImgLength" : 800,
        "TidyLevel"    : "heavy",
        "Doctype"      : "XHTML 1.0 Transitional",
        "Parent"       : "html"
    },
    "Output" : {
        "TidyFormat"   : true
    },
    "Test" : {
        "ForceNoIconv" : true
    },
    "URI" : {
        "AllowedSchemes" : {
            "http"     : true,
            "https"    : true,
            "mailto"   : true,
            "ftp"      : true
        },
        "DisableExternalResources" : true
    }
}

(URI.Base, URI.Munge and Cache.SerializerPath are also set, but I've removed them in this paste. Also, HTML.Parent caveat: As mentioned, usually, this is set to 'div'.)


This code is the reason why what you're doing doesn't work:

/**
 * Takes a string of HTML (fragment or document) and returns the content
 * @todo Consider making protected
 */
public function extractBody($html) {
    $matches = array();
    $result = preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches);
    if ($result) {
        return $matches[1];
    } else {
        return $html;
    }
}

You can turn it off by setting %Core.ConvertDocumentToFragment to false; if the rest of your code is bug-free, it should work straight from there. I don't believe your bodyElem definition is necessary.
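To make the effect concrete, here is a minimal standalone sketch that copies the regex quoted above into a free function (the name extractBodySketch is hypothetical, not part of HTML Purifier) and runs it on the example input. It only illustrates what the later stages get to see while the directive is left enabled; it does not call the library itself.

```php
<?php
// Standalone copy of the regex from extractBody() quoted above.
// The function name is made up for this illustration.
function extractBodySketch(string $html): string
{
    $matches = array();
    if (preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches)) {
        // Only the inner content survives; the <body> tag and its
        // attributes are gone before any tag transform could see them.
        return $matches[1];
    }
    return $html;
}

$input = '<body style="background-color:#000000;">Hi there.</body>';
echo extractBodySketch($input), "\n"; // prints: Hi there.
```

Following the style of the Cache.DefinitionImpl line in the question, disabling the directive would presumably be written as $config->set('Core.ConvertDocumentToFragment', false);.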


Wouldn't it be much easier to do:

$search = array('<body', 'body>');
$replace = array('<div', 'div>');

$html = '<body style="background-color:#000000;">Hi there.</body>';

echo str_replace($search, $replace, $html);

// Output: <div style="background-color:#000000;">Hi there.</div>
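One caveat with the plain str_replace approach is that it is case-sensitive and would also rewrite any tag whose name merely starts with "body". A slightly more defensive variant, sketched here with a hypothetical helper named bodyToDiv (not part of HTML Purifier), could use preg_replace instead. It is still a plain text substitution, so it assumes the result is passed through the purifier afterwards.

```php
<?php
// Hypothetical helper for illustration; not part of HTML Purifier.
function bodyToDiv(string $html): string
{
    // Opening tag: "<body" must be followed by whitespace, ">" or "/"
    // (checked with a lookahead), so names like <bodyguard> are left alone.
    $html = preg_replace('~<body(?=[\s>/])~i', '<div', $html);
    // Closing tag, matched case-insensitively, optional whitespace before ">".
    return preg_replace('~</body\s*>~i', '</div>', $html);
}

echo bodyToDiv('<BODY style="background-color:#000000;">Hi there.</BODY>'), "\n";
// prints: <div style="background-color:#000000;">Hi there.</div>
```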