Replace HTML tags with BBCode

2023-02-22 02:51

How can I replace certain HTML tags with BBcode like tags?

For example, replace <a ...> ... </a> with [url ...] ... [/url], or <code ...> ... </code> with [code ...] ... [/code], in a string held in $var.

You could write a custom XSLT stylesheet to convert the formatting and run it through an XSLT processor to get the desired output.

Converting HTML back to BBCode is not difficult. Libraries exist for that, and I'm certain there is a duplicate answer somewhere, but I'm bad at searching too.

Basically you can use preg_replace like this:

 // for 1:1 translations
 $text = preg_replace('#<(/?)(b|i|code|pre)>#', '[$1$2]', $text);

 // complex tags
 $text = preg_replace('#<a href="([^"]+)">([^<]+)</a>#',
             "[url=$1]$2[/url]", $text);

But the second case will fail if your input HTML doesn't match the expected form exactly, for instance when the link carries extra attributes or uses single quotes. If you try to convert exported Word files, such a simplistic approach will fail. You also need more special cases for [img] and similar tags.
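
To make that limitation concrete, here is a small sketch that wraps the two calls above in a function and runs them on two inputs. The function name simpleHtmlToBbcode and the sample strings are illustrative assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper wrapping the two preg_replace calls shown above.
function simpleHtmlToBbcode(string $text): string
{
    // 1:1 translations: <b> becomes [b], </code> becomes [/code], and so on.
    $text = preg_replace('#<(/?)(b|i|code|pre)>#', '[$1$2]', $text);

    // Links: only the exact form <a href="...">text</a> is matched.
    return preg_replace('#<a href="([^"]+)">([^<]+)</a>#', '[url=$1]$2[/url]', $text);
}

echo simpleHtmlToBbcode('<b>Hi</b> <a href="http://example.com">site</a>'), PHP_EOL;
// [b]Hi[/b] [url=http://example.com]site[/url]

// An extra attribute in front of href is enough to defeat the link pattern.
echo simpleHtmlToBbcode('<a class="x" href="http://example.com">site</a>'), PHP_EOL;
// <a class="x" href="http://example.com">site</a>
```

The second link comes back unchanged, which is why real-world input usually needs a more tolerant pattern or a callback that inspects the attributes.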

To convert old articles that contained HTML tags, I wrote this fairly complicated script. The $body variable contains the article text. The script first replaces every pre and code block with a special marker so that its content is left untouched. Once all the other tags have been converted, each marker is replaced with the original code, wrapped in [code] tags. The procedure works on both HTML and BBCode text.

  // Let's find all code inside the body. The code can be inside <pre></pre>, <code></code>, or [code][/code] if you
  // are using BBCode markup language.
  $pattern = '%(?P<openpre><pre>)(?P<contentpre>[\W\D\w\s]*?)(?P<closepre></pre>)|(?P<opencode><code>)(?P<contentcode>[\W\D\w\s]*?)(?P<closecode></code>)|(?P<openbbcode>\[code=?\w*\])(?P<contentbbcode>[\W\D\w\s]*?)(?P<closebbcode>\[/code\])%i';

  if (preg_match_all($pattern, $body, $snippets)) {

    $pattern = '%<pre>[\W\D\w\s]*?</pre>|<code>[\W\D\w\s]*?</code>|\[code=?\w*\][\W\D\w\s]*?\[/code\]%i';

    // Replaces the code snippet with a special marker to be able to inject the code in place.
    $body = preg_replace($pattern, '___SNIPPET___', $body);
  }


  // Replace links.
  $body = preg_replace_callback('%(?i)<a[^>]+>(.+?)</a>%',

    function ($matches) use ($item) {

      // Extracts the url.
      if (preg_match('/\s*(?i)href\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1) {
        // Strip either kind of quote; the URL keeps its case because paths can be case sensitive.
        $href = trim($others[1], '"\'');

        // Extracts the target.
        if (preg_match('/\s*(?i)target\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
          $target = strtolower(trim($others[1], '"\''));
        else
          $target = "_self";
      }
      else
        throw new \RuntimeException(sprintf("Article with idItem = %d has malformed links", $item->idItem));

      return "[url=".$href." t=".$target."]".$matches[1]."[/url]";

    },

    $body
  );


  // Replace images.
  $body = preg_replace_callback('/<img[^>]+>/i',

    function ($matches) use ($item) {

      // Extracts the src.
      if (preg_match('/\s*(?i)src\s*=\s*("([^"]*")|\'[^\']*\'|([^\'">\s]+))/', $matches[0], $others) === 1)
        // Strip either kind of quote; the URL keeps its case because paths can be case sensitive.
        $src = trim($others[1], '"\'');
      else
        throw new \RuntimeException(sprintf("Article with idItem = %d has malformed images", $item->idItem));

      return "[img]".$src."[/img]";

    },

    $body
  );


  // Replace other tags.
  $body = preg_replace_callback('%</?[a-z][a-z0-9]*[^<>]*>%i',

    function ($matches) {
      $tag = strtolower($matches[0]);

      // Simple tags and their BBCode equivalents.
      static $map = [
        '<strong>' => '[b]',      '</strong>' => '[/b]',
        '<b>'      => '[b]',      '</b>'      => '[/b]',
        '<em>'     => '[i]',      '</em>'     => '[/i]',
        '<i>'      => '[i]',      '</i>'      => '[/i]',
        '<u>'      => '[u]',      '</u>'      => '[/u]',
        '<strike>' => '[s]',      '</strike>' => '[/s]',
        '<del>'    => '[s]',      '</del>'    => '[/s]',
        '<ul>'     => '[list]',   '</ul>'     => '[/list]',
        '<ol>'     => '[list=1]', '</ol>'     => '[/list]',
        '<li>'     => '[*]',      '</li>'     => '',
        '<center>' => '[center]', '</center>' => '[/center]',
      ];

      // Anything not listed is returned as-is (lowercased) and removed by strip_tags() below.
      return isset($map[$tag]) ? $map[$tag] : $tag;
    },

    $body
  );


  // Now we strip the remaining HTML tags.
  $body = strip_tags($body);


  // Finally we can restore the snippets, converting the HTML tags to BBCode tags.
  $snippetsCount = count($snippets[0]);

  for ($i = 0; $i < $snippetsCount; $i++) {
    // We try to determine which tags the code is inside: <pre></pre>, <code></code>, [code][/code]
    if (!empty($snippets['openpre'][$i]))
      $snippet = "[code]".PHP_EOL.trim($snippets['contentpre'][$i]).PHP_EOL."[/code]";
    elseif (!empty($snippets['opencode'][$i]))
      $snippet = "[code]".PHP_EOL.trim($snippets['contentcode'][$i]).PHP_EOL."[/code]";
    else
      $snippet = $snippets['openbbcode'][$i].PHP_EOL.trim($snippets['contentbbcode'][$i]).PHP_EOL.$snippets['closebbcode'][$i];

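    // A callback returns the snippet verbatim, so "$1" or "\1" inside the code is not treated as a backreference.
    $body = preg_replace_callback('/___SNIPPET___/', function () use ($snippet) { return PHP_EOL.trim($snippet).PHP_EOL; }, $body, 1);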
  }

  //echo $body;
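
The protect-and-restore idea behind the script is easier to see in a much smaller sketch. The function name convertOutsideCode, the numbered marker format, and the sample input are assumptions made for illustration; only <code> blocks and <b>/<i> tags are handled here.

```php
<?php
// Minimal sketch of the marker technique: hide code blocks, convert the rest, restore.
function convertOutsideCode(string $body): string
{
    $snippets = [];

    // 1. Swap every <code>...</code> block for a numbered marker and remember its content.
    $body = preg_replace_callback('%<code>(.*?)</code>%is', function (array $m) use (&$snippets) {
        $snippets[] = $m[1];
        return '___SNIPPET_' . (count($snippets) - 1) . '___';
    }, $body);

    // 2. Convert the remaining simple tags; the code content is out of reach at this point.
    $body = preg_replace('%<(/?)(b|i)>%i', '[$1$2]', $body);

    // 3. Put the snippets back. The callback returns the text verbatim, so "$1" or "\1"
    //    inside a snippet is not interpreted as a backreference.
    return preg_replace_callback('%___SNIPPET_(\d+)___%', function (array $m) use ($snippets) {
        return '[code]' . $snippets[(int) $m[1]] . '[/code]';
    }, $body);
}

echo convertOutsideCode('<b>Note:</b> use <code>echo "<b>$1</b>";</code>'), PHP_EOL;
// [b]Note:[/b] use [code]echo "<b>$1</b>";[/code]
```

Numbering the markers is a design choice of this sketch: each snippet is restored by index, so the result does not depend on the order in which the markers are replaced.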

Not a trivial task. I looked into this a while back and the best code I came across was this one: cbparser
