Replace HTML tags with BBCode

How can I replace certain HTML tags with BBCode-like tags?

For example, replace <a ...> ... </a> with [url ...] ... [/url], or <code ...> ... </code> with [code ...] ... [/code], in a string stored in $var.


You could write a custom XSLT stylesheet to convert the formatting and run it through an XSLT processor to get the desired output.
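As a rough sketch of the XSLT idea in PHP, assuming the dom and xsl extensions are installed: the HTML is first parsed leniently with DOMDocument::loadHTML(), because real-world HTML is rarely well-formed XML, and a small stylesheet then emits BBCode as plain text. The function name htmlToBbcodeViaXslt() and the handful of tag mappings are illustrative assumptions, not part of any library.

```php
<?php
// Sketch only: requires PHP's dom and xsl extensions.
// htmlToBbcodeViaXslt() is a hypothetical helper name.
function htmlToBbcodeViaXslt(string $html): string
{
    // method="text" makes the processor emit plain text instead of XML.
    // Text outside the matched elements is copied by XSLT's built-in rules.
    $xslt = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="a[@href]">[url=<xsl:value-of select="@href"/>]<xsl:apply-templates/>[/url]</xsl:template>
  <xsl:template match="b|strong">[b]<xsl:apply-templates/>[/b]</xsl:template>
  <xsl:template match="i|em">[i]<xsl:apply-templates/>[/i]</xsl:template>
  <xsl:template match="code|pre">[code]<xsl:apply-templates/>[/code]</xsl:template>
</xsl:stylesheet>
XSL;

    // loadHTML() tolerates tag soup; parser warnings are collected, not printed.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $style = new DOMDocument();
    $style->loadXML($xslt);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($style);

    return (string) $processor->transformToXml($doc);
}
```

Calling htmlToBbcodeViaXslt($var) returns the converted text. Tags without a template simply disappear while their text content is kept, which may or may not be what you want.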


Reverse HTML-to-BBCode conversions are not difficult. Libraries exist for that, and I'm certain we already have a duplicate answer somewhere, but I'm bad at searching too.

Basically you can use preg_replace like this:

 // for 1:1 translations
 $text = preg_replace('#<(/?)(b|i|code|pre)>#', '[$1$2]', $text);

 // complex tags
 $text = preg_replace('#<a href="([^"]+)">([^<]+)</a>#',
             "[url=$1]$2[/url]", $text);

But the second case will fail if your input HTML doesn't match the expected form exactly, for example when the tag has extra attributes or uses single quotes. If you try to convert exported Word files, such a simplistic approach will fail. You also need more special cases for [img] and the like.
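A slightly more tolerant variant of the same idea might look like the sketch below. It is still regex-based and will not cope with arbitrary markup; it only relaxes the patterns to allow upper-case tags, extra attributes, and either quote style, and adds an [img] case. The function name simpleHtmlToBbcode() is made up for this example.

```php
<?php
// Hypothetical helper: looser versions of the two replacements above.
function simpleHtmlToBbcode(string $text): string
{
    // 1:1 translations, case-insensitive, ignoring any attributes.
    // \b stops "b" from matching <br> and "i" from matching <img>.
    $text = preg_replace('#<(/?)(b|i|code|pre)\b[^>]*>#i', '[$1$2]', $text);

    // Links: href may appear anywhere in the tag, quoted with " or '.
    $text = preg_replace(
        '#<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>#is',
        '[url=$2]$3[/url]',
        $text
    );

    // Images: only the src attribute is kept.
    $text = preg_replace(
        '#<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1[^>]*>#is',
        '[img]$2[/img]',
        $text
    );

    return $text;
}
```

Unquoted attribute values and nested links are still not handled, so treat it as a starting point rather than a complete converter.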


To convert old articles that contained HTML tags, I wrote this fairly complicated script. The $body variable contains the article text; $item is the article record, and its idItem is used only in error messages. The script first replaces every pre and code block with a special marker. Once all the other tags have been converted, it replaces each marker with the original code. It works with both HTML and BBCode text.

  // Let's find all code inside the body. The code can be inside <pre></pre>, <code></code>, or [code][/code] if you
  // are using BBCode markup language.
  $pattern = '%(?P<openpre><pre>)(?P<contentpre>[\W\D\w\s]*?)(?P<closepre></pre>)|(?P<opencode><code>)(?P<contentcode>[\W\D\w\s]*?)(?P<closecode></code>)|(?P<openbbcode>\[code=?\w*\])(?P<contentbbcode>[\W\D\w\s]*?)(?P<closebbcode>\[/code\])%i';

  if (preg_match_all($pattern, $body, $snippets)) {

    $pattern = '%<pre>[\W\D\w\s]*?</pre>|<code>[\W\D\w\s]*?</code>|\[code=?\w*\][\W\D\w\s]*?\[/code\]%i';

    // Replaces the code snippet with a special marker to be able to inject the code in place.
    $body = preg_replace($pattern, '___SNIPPET___', $body);
  }


  // Replace links.
  $body = preg_replace_callback('%(?i)<a[^>]+>(.+?)</a>%',

    function ($matches) use ($item) {

      // Extracts the url. The value may be double-quoted, single-quoted, or unquoted.
      if (preg_match('/\s*(?i)href\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1) {
        // Strip either kind of quote. The URL keeps its case because paths can be case-sensitive.
        $href = trim($others[1], '"\'');

        // Extracts the target.
        if (preg_match('/\s*(?i)target\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
          $target = strtolower(trim($others[1], '"\''));
        else
          $target = "_self";
      }
      else
        throw new \RuntimeException(sprintf("Article with idItem = %d has malformed links", $item->idItem));

      return "[url=".$href." t=".$target."]".$matches[1]."[/url]";

    },

    $body
  );


  // Replace images.
  $body = preg_replace_callback('/<img[^>]+>/i',

    function ($matches) use ($item) {

      // Extracts the src. Strip either kind of quote and keep the URL's case.
      if (preg_match('/\s*(?i)src\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
        $src = trim($others[1], '"\'');
      else
        throw new \RuntimeException(sprintf("Article with idItem = %d has malformed images", $item->idItem));

      return "[img]".$src."[/img]";

    },

    $body
  );


  // Replace other tags.
  $body = preg_replace_callback('%</?[a-z][a-z0-9]*[^<>]*>%i',

    function ($matches) {
      $tag = strtolower($matches[0]);

      // Stacked case labels share one result; return makes break unnecessary.
      switch ($tag) {
        case '<strong>':
        case '<b>':
          return '[b]';

        case '</strong>':
        case '</b>':
          return '[/b]';

        case '<em>':
        case '<i>':
          return '[i]';

        case '</em>':
        case '</i>':
          return '[/i]';

        case '<u>':
          return '[u]';

        case '</u>':
          return '[/u]';

        case '<strike>':
        case '<del>':
          return '[s]';

        case '</strike>':
        case '</del>':
          return '[/s]';

        case '<ul>':
          return '[list]';

        case '<ol>':
          return '[list=1]';

        case '</ul>':
        case '</ol>':
          return '[/list]';

        case '<li>':
          return '[*]';

        case '</li>':
          return '';

        case '<center>':
          return '[center]';

        case '</center>':
          return '[/center]';

        default:
          // Any other tag is left in place (lowercased) and removed later by strip_tags().
          return $tag;
      }
    },

    $body
  );


  // Now we strip the remaining HTML tags.
  $body = strip_tags($body);


  // Finally we can restore the snippets, converting the HTML tags to BBCode tags.
  $snippetsCount = count($snippets[0]);

  for ($i = 0; $i < $snippetsCount; $i++) {
    // We try to determine which tags the code is inside: <pre></pre>, <code></code>, [code][/code]
    if (!empty($snippets['openpre'][$i]))
      $snippet = "[code]".PHP_EOL.trim($snippets['contentpre'][$i]).PHP_EOL."[/code]";
    elseif (!empty($snippets['opencode'][$i]))
      $snippet = "[code]".PHP_EOL.trim($snippets['contentcode'][$i]).PHP_EOL."[/code]";
    else
      $snippet = $snippets['openbbcode'][$i].PHP_EOL.trim($snippets['contentbbcode'][$i]).PHP_EOL.$snippets['closebbcode'][$i];

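    // A callback is used so that "$1" or "\1" inside the code is not treated as a backreference.
    $body = preg_replace_callback('/___SNIPPET___/', function () use ($snippet) {
      return PHP_EOL.trim($snippet).PHP_EOL;
    }, $body, 1);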
  }

  //echo $body;
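
The marker trick at the heart of that script can be isolated in a much smaller form. The following is a minimal sketch, not the script itself: stashCode() and restoreCode() are hypothetical names, only <pre> and <code> without attributes are handled, and everything is restored as [code].

```php
<?php
// Step 1: move every code block out of the way and leave a marker behind.
function stashCode(string $body, array &$stash): string
{
    return preg_replace_callback(
        '%<(pre|code)>(.*?)</\1>%is',
        function (array $m) use (&$stash): string {
            $stash[] = trim($m[2]);   // keep the raw code, untouched
            return '___SNIPPET___';
        },
        $body
    );
}

// Step 2 (after the other tags are converted): put the code back in order.
function restoreCode(string $body, array $stash): string
{
    return preg_replace_callback(
        '/___SNIPPET___/',
        function (array $m) use (&$stash): string {
            // A callback avoids "$1" in the code being read as a backreference.
            return '[code]' . array_shift($stash) . '[/code]';
        },
        $body
    );
}

$stash = [];
$body  = 'Use <b>this</b>: <code>preg_replace($p, "$1", $s); // <b>raw</b></code>';

$body = stashCode($body, $stash);                   // code is now hidden
$body = preg_replace('#<(/?)b>#', '[$1b]', $body);  // convert the rest
$body = restoreCode($body, $stash);                 // code comes back verbatim
```

Because the code is hidden while the other replacements run, the <b> inside the snippet is not converted, and the "$1" in it survives the restore step.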


Not a trivial task. I looked into this a while back and the best code I came across was this one: cbparser
