
Regex help (extracting data between hr tags)

I'm trying to extract teaser text from TinyMCE content in a PHP CMS. The <hr /> tag is not used in my design, so I'd like to extract the text in the following scenarios, on the assumption that the content administrator will ONLY use it to define teaser text:

  1. Extract the content before an <hr /> tag (where the content administrator enters the teaser text at the beginning of the RTE and then uses <hr /> as the cutoff point).

  2. Extract the content between two <hr /> tags (where the content administrator enters the teaser text anywhere within the content and marks it with <hr /> tags on either side).

What regex should I use to cover the above?


I'm not sure I understand your question correctly, but here's a try:

if (preg_match('~^(.*?)<hr />((.+?)<hr />)?~is', $test, $matches)) {
  // at least one <hr /> present

  if (empty($matches[2])) {
    // no second <hr />
    $teaser = $matches[1];

  } else {
    // there is a second <hr />
    $teaser = $matches[3];
  }
} else {
  // no teaser
  $teaser = "";
}
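
To see how the snippet above behaves on both scenarios, here is a minimal sketch that wraps the same pattern in a function. The name extract_teaser and the sample strings are hypothetical and not part of the original answer; the pattern still expects the literal form <hr /> only.

```php
<?php
// Hypothetical wrapper around the pattern above; the function name is illustrative.
function extract_teaser(string $html): string
{
    if (!preg_match('~^(.*?)<hr />((.+?)<hr />)?~is', $html, $m)) {
        return ''; // no <hr /> at all, so no teaser
    }
    // Groups 2 and 3 are only populated when a second <hr /> follows the first.
    return empty($m[2]) ? $m[1] : $m[3];
}

echo extract_teaser('Teaser first<hr />Body text'), "\n";               // scenario 1
echo extract_teaser('Intro<hr />Teaser in the middle<hr />Rest'), "\n"; // scenario 2
echo extract_teaser('No separator here'), "\n";                         // empty string
```

The result keeps any whitespace around the teaser, so calling trim() on it may be worthwhile in practice.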


<?php
 
$strs = array(
   'GET ME A <hr /> bla',
   'Bla bla<hr /> GET ME B <hr />'
);
 
foreach($strs as $str) {
 
    $a = preg_match_all('/(<hr \/>)?(?P<teaser>.*?)<hr \/>/', $str, $matches);
 
    var_dump($a, $matches);
 
}


Output

int(1)
array(4) {
  [0]=>
  array(1) {
    [0]=>
    string(15) "GET ME A <hr />"
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
  ["teaser"]=>
  array(1) {
    [0]=>
    string(9) "GET ME A "
  }
  [2]=>
  array(1) {
    [0]=>
    string(9) "GET ME A "
  }
}
int(2)
array(4) {
  [0]=>
  array(2) {
    [0]=>
    string(13) "Bla bla<hr />"
    [1]=>
    string(16) " GET ME B <hr />"
  }
  [1]=>
  array(2) {
    [0]=>
    string(0) ""
    [1]=>
    string(0) ""
  }
  ["teaser"]=>
  array(2) {
    [0]=>
    string(7) "Bla bla"
    [1]=>
    string(10) " GET ME B "
  }
  [2]=>
  array(2) {
    [0]=>
    string(7) "Bla bla"
    [1]=>
    string(10) " GET ME B "
  }
}
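
In the output above, the wanted text is the first "teaser" entry for the first string and the second entry for the second string, so taking the last entry covers both scenarios. The following is a sketch under that reading, assuming at most two <hr /> tags. The helper name last_teaser is hypothetical, and the s modifier and trim() call are additions not present in the original pattern.

```php
<?php
// Hypothetical helper: returns the text that precedes the final <hr />.
// Assumes the content holds at most two <hr /> tags.
function last_teaser(string $html): string
{
    // The s modifier (an addition) lets "." match newlines in multi-line content.
    $count = preg_match_all('/(<hr \/>)?(?P<teaser>.*?)<hr \/>/s', $html, $matches);
    if (!$count) {
        return ''; // no <hr /> found
    }
    // The last match holds the teaser in both scenarios.
    return trim($matches['teaser'][$count - 1]);
}

echo last_teaser('GET ME A <hr /> bla'), "\n";           // GET ME A
echo last_teaser('Bla bla<hr /> GET ME B <hr />'), "\n"; // GET ME B
```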


This tested function does the trick:

function get_teaser($text) {
    // First count how many <hr/> tags there are.
    $count = preg_match_all('%<hr\s*/?>%i', $text, $matches);
    if (!$count) return ''; // None? Return an empty string.
    switch ($count) {
    case 1: // Case I: From the start up to the only HR tag.
        preg_match('%^(.*?)<hr\s*/?>%si', $text, $matches);
        return $matches[1];
    case 2: // Case II: Stuff between two HR tags.
        preg_match('%<hr\s*/?>(.*?)<hr\s*/?>%si', $text, $matches);
        return $matches[1];
    default: // Case III: Three or more HR tags is an error.
        return 'Error! Too many <hr /> tags.';
    }
}

This also allows for various HR tag forms: e.g. <hr>, <hr/>, <hr />.
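
If the editor ever emits attributes on the tag (for example <hr class="teaser" />, which is an assumption and not something the question states), the pattern <hr\s*/?> would miss it. One possible variation is sketched below. It swaps the two preg_match calls for a single preg_split and uses a looser pattern; the function name get_teaser_loose is hypothetical.

```php
<?php
// Hypothetical variation: split on any <hr ...> tag, with or without attributes.
function get_teaser_loose(string $text): string
{
    // \b keeps tags such as <hrx> from matching; [^>]* allows "/", spaces, attributes.
    $parts = preg_split('%<hr\b[^>]*>%i', $text);
    switch (count($parts) - 1) { // number of HR tags found
    case 0:
        return ''; // no HR tag, no teaser
    case 1:
        return $parts[0]; // text before the only HR tag
    case 2:
        return $parts[1]; // text between the two HR tags
    default:
        return 'Error! Too many <hr /> tags.';
    }
}

echo get_teaser_loose('Teaser<hr>Body'), "\n";                               // Teaser
echo get_teaser_loose('Intro<hr class="teaser" />Teaser<HR/>Rest'), "\n";    // Teaser
```

Splitting gives the tag count and the surrounding segments in one pass, at the cost of copying the whole content into an array.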
