
Regular Expression: Converting non-block elements with <br /> to <p> in PHP

Someone has asked a similar question, but the accepted answer doesn't meet my requirements.

Input:

<strong>bold <br /><br /> text</strong><br /><br /><br />
<a href="#">link</a><br /><br />
<pre>some code</pre>
I'm a single br, <br /> leave me alone.

Expected output:

<p><strong>bold <br /> text</strong><br /></p>
<p><a href="#">link</a><br /></p>
<pre>some code</pre>
<p>I'm a single br, <br /> leave me alone.</p>

The accepted answer I mentioned above converts runs of multiple <br /> into <p> tags and finally wraps the whole input in another <p>. That does not work in my case, because a <pre> cannot be nested inside a <p>. Can anyone help?

Update

The expected output before this edit was a little confusing. The whole point is:

  1. Convert multiple <br /> to a single one (achieved with preg_replace('/(<br />)+/', '<br />', $str);)

  2. Check for inline elements and unwrapped text (there's no parent element in this case, the input comes from $_POST) and wrap them in <p>, leaving block-level elements alone.
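
Step 1 can be sketched a little more defensively. The helper name collapseBreaks is hypothetical, and the pattern is an assumption about which <br> spellings show up in the posted text: it accepts <br>, <br/> and <br />, and lets whitespace sit between the tags of a run. It is a plain text substitution, so it would also touch <br> tags inside a <pre>.

```php
<?php
// Hypothetical helper (not from the original post): collapse every run of
// two or more <br> tags into a single <br />.
function collapseBreaks(string $html): string
{
    // <br\s*/?> matches <br>, <br/> and <br />; the \s* between tags lets a
    // run span spaces and newlines. A lone <br> never matches, so it is kept.
    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace('~<br\s*/?>(?:\s*<br\s*/?>)+~i', '<br />', $html) ?? $html;
}

echo collapseBreaks('<strong>bold <br /><br /> text</strong><br /><br /><br />');
// prints: <strong>bold <br /> text</strong><br />
```

Whitespace after the last <br> of a run is not consumed, so the space before "text" survives.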


Do not use regex. Why? See: RegEx match open tags except XHTML self-contained tags

Use proper DOM manipulators. See: http://php.net/manual/en/book.dom.php

EDIT: I'm not really a fan of giving cookbook recipes, but here's a starting point that wraps an element followed by a double <br /> in <p></p>:

script.php:
<?php

// Tag names treated as block-level; edit to suit your needs.
function isBlockElement($nodeName) {
  $blockElementsArray = array("pre", "div");
  return in_array(strtolower($nodeName), $blockElementsArray, true);
}

// Returns true if any ancestor of $node is a block-level element.
function hasBlockParent($node) {
  if (!($node instanceof DOMNode)) {
    throw new InvalidArgumentException("Expected a DOMNode");
  }
  if (is_null($node->parentNode))
    return false;

  // compare the parent's tag name, not the node object itself
  if (isBlockElement($node->parentNode->nodeName))
    return true;

  return hasBlockParent($node->parentNode);
}

$myDom = new DOMDocument;
$myDom->loadHTMLFile("in-file");
$myDom->normalizeDocument();

// snapshot the live node list, because the loop below moves nodes around
$elems = iterator_to_array($myDom->getElementsByTagName("*"));
foreach ($elems as $element) {
  // the two siblings after $element; either may be missing
  $first = $element->nextSibling;
  $second = is_null($first) ? NULL : $first->nextSibling;
  if (!is_null($second) && $first->nodeName == "br" && $second->nodeName == "br" && !hasBlockParent($element)) {
    $parent = $element->parentNode;
    $parent->removeChild($second);
    $parent->removeChild($first);

    // the node that followed the two <br>s, or NULL if there is none
    $nSibling = $element->nextSibling;

    // move the old node into a new <p> at the same position
    $saved = $parent->removeChild($element);
    $newNode = $myDom->createElement("p");
    $newNode->appendChild($saved);
    // insertBefore() appends when the reference node is NULL
    $parent->insertBefore($newNode, $nSibling);
  }
}

$myDom->saveHTMLFile("out-file");

?>

This is not a full solution, just a starting point. It's the best I could write during my lunch break, and please bear in mind that I last coded in PHP about 2 years ago (I've been doing mostly C++ since then).

So anyways, the input file:

[dare2be@schroedinger dom-php]$ cat in-file
<strong>bold <br /><br /> text</strong><br /><br /><br />
<a href="#">link</a><br /><br />
<pre>some code</pre>
I'm a single br, <br /> leave me alone.

And the output file:

[dare2be@schroedinger dom-php]$ cat out-file 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p><strong>bold <br><br> text</strong></p><br><p><a href="#">link</a></p><pre>some code</pre>
I'm a single br, <br> leave me alone.</body></html>

The whole DOCTYPE mumbo jumbo (and the <html><body> wrapper) is a side effect of loading the fragment as a complete HTML document. The code doesn't do the rest of what you asked for, such as changing <strong><br><br></strong> to <strong><br></strong>. Also, this whole script is a quick draft, but you'll get the idea.
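
For what it's worth, here is a rough sketch of how the wrapping step from the question could go with the same DOM classes, without the DOCTYPE side effect. It is not a tested, complete solution. The function name wrapInlineRuns, the default list of block tags and the wrapper <div> are all assumptions made for illustration. The idea: parse the fragment inside a throwaway <div>, walk its direct children, collect inline nodes and loose text into a <p>, close that <p> at a run of two or more top-level <br>s (keeping one), and leave block elements where they are.

```php
<?php
// Hypothetical helper: wrap runs of top-level inline nodes in <p>.
// $blockTags is an assumed list; extend it to suit your needs.
function wrapInlineRuns(string $html, array $blockTags = ['pre', 'div', 'p']): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // The XML encoding hint makes libxml read the input as UTF-8; the <div>
    // is a temporary container so the fragment's top level can be found again.
    $dom->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->getElementsByTagName('div')->item(0);
    $p = null;   // paragraph currently being filled, if any
    $brRun = 0;  // consecutive top-level <br> elements seen so far

    // Iterate over a snapshot, because nodes are moved during the loop.
    foreach (iterator_to_array($root->childNodes) as $node) {
        $name = strtolower($node->nodeName);

        if ($node instanceof DOMElement && in_array($name, $blockTags, true)) {
            $p = null;      // a block element ends the current paragraph
            $brRun = 0;
            continue;       // and is itself left untouched
        }
        if ($name === 'br') {
            $brRun++;
            if ($p === null || $brRun >= 2) {
                $root->removeChild($node);  // drop leading and extra breaks
            } else {
                $p->appendChild($node);     // keep the first break of a run
            }
            continue;
        }
        if ($node instanceof DOMText && trim($node->nodeValue) === '') {
            // Whitespace inside an open paragraph moves with it; whitespace
            // between paragraphs stays where it is.
            if ($p !== null && $brRun < 2) {
                $p->appendChild($node);
            }
            continue;
        }
        if ($brRun >= 2) {
            $p = null;      // a double break closed the previous paragraph
        }
        $brRun = 0;
        if ($p === null) {
            $p = $dom->createElement('p');
            $root->insertBefore($p, $node);
        }
        $p->appendChild($node);
    }

    // Serialize only the container's children: no DOCTYPE, <html> or <body>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$input = implode("\n", [
    '<strong>bold <br /><br /> text</strong><br /><br /><br />',
    '<a href="#">link</a><br /><br />',
    '<pre>some code</pre>',
    "I'm a single br, <br /> leave me alone.",
]);
echo wrapInlineRuns($input);
```

Only top-level <br> runs are handled here; the double <br> nested inside <strong> is left as it is. The serializer also writes <br> rather than <br />, and the exact whitespace between the resulting paragraphs depends on the libxml version.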


Alright, I've got myself an answer, and I believe it's going to work really well.

It's from WordPress: the wpautop function.

I've tested it with the input from my question, and the output is almost what I expected; I just need to modify it a bit to fit my needs.

Thanks dare2be, but I'm not very familiar with DOM manipulation in PHP.
