Regex help (extracting data between hr tags)
I'm trying to extract teaser text from TinyMCE content in a PHP CMS. The <hr /> tag is not used anywhere in my design, so I'd like to extract the text in the following scenarios, on the assumption that the content administrator will ONLY use <hr /> tags to define teaser text:
- Extract the content before an <hr /> tag (when the content administrator enters the teaser text at the beginning of the RTE and then uses <hr /> as the cutoff point).
- Extract the content between two <hr /> tags (when the content administrator enters the teaser text anywhere within the content and marks it with <hr /> tags on either side).
What regex should I use to cover the above?
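If exact-string matching is acceptable, both cases can also be handled without a regex. The sketch below is a non-regex alternative, not something from the question: it assumes the editor always writes the separator exactly as <hr />, and teaser_by_explode is a hypothetical name. It splits on the tag and picks the piece by count (two pieces means one tag, three pieces means two tags).

```php
<?php
// Hypothetical helper (non-regex alternative): assumes the separator is
// always written exactly as "<hr />".
function teaser_by_explode(string $html): string
{
    $parts = explode('<hr />', $html);
    switch (count($parts)) {
        case 2:  // one <hr />: the teaser is everything before it
            return $parts[0];
        case 3:  // two <hr /> tags: the teaser is the text between them
            return $parts[1];
        default: // no <hr />, or more than two: treat as "no teaser"
            return '';
    }
}

var_dump(teaser_by_explode('GET ME A <hr /> bla'));
var_dump(teaser_by_explode('Bla bla<hr /> GET ME B <hr />'));
```

This only recognises the one spelling; if the editor may emit <hr> or <hr/>, explode() could be swapped for preg_split() with a tolerant pattern.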
I'm not sure I understand your question correctly, but here's a try:
if (preg_match('~^(.*?)<hr />((.+?)<hr />)?~is', $test, $matches)) {
  // at least one <hr /> present
  if (empty($matches[2])) {
    // no second <hr />
    $teaser = $matches[1];
  } else {
    // there is a second <hr />
    $teaser = $matches[3];
  }
} else {
  // no teaser
  $teaser = "";
}
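To try the pattern above on more than one input, it can be wrapped in a function. This is a minimal sketch; teaser_from_pattern is a hypothetical name. The second call contains newlines, which the s modifier lets .+? cross, as multi-line editor content would require.

```php
<?php
// Hypothetical wrapper around the pattern above so it can be called repeatedly.
function teaser_from_pattern(string $test): string
{
    if (preg_match('~^(.*?)<hr />((.+?)<hr />)?~is', $test, $matches)) {
        // Group 2 only participates when a second <hr /> follows the first;
        // if it did not match, PHP leaves $matches[2] unset and empty() is true.
        return empty($matches[2]) ? $matches[1] : $matches[3];
    }
    return ''; // no <hr /> at all: no teaser
}

var_dump(teaser_from_pattern('GET ME A <hr /> bla'));
var_dump(teaser_from_pattern("Bla bla<hr />\nGET ME B\n<hr />"));
var_dump(teaser_from_pattern('No separator here'));
```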
<?php
 
$strs = array(
   'GET ME A <hr /> bla',
   'Bla bla<hr /> GET ME B <hr />'
);
 
foreach($strs as $str) {
 
    // The s modifier lets .*? also cross newlines in multi-line editor content.
    $a = preg_match_all('/(<hr \/>)?(?P<teaser>.*?)<hr \/>/s', $str, $matches);
 
    var_dump($a, $matches);
 
}
Output (from a run on Ideone):
int(1)
array(4) {
  [0]=>
  array(1) {
    [0]=>
    string(15) "GET ME A <hr />"
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
  ["teaser"]=>
  array(1) {
    [0]=>
    string(9) "GET ME A "
  }
  [2]=>
  array(1) {
    [0]=>
    string(9) "GET ME A "
  }
}
int(2)
array(4) {
  [0]=>
  array(2) {
    [0]=>
    string(13) "Bla bla<hr />"
    [1]=>
    string(16) " GET ME B <hr />"
  }
  [1]=>
  array(2) {
    [0]=>
    string(0) ""
    [1]=>
    string(0) ""
  }
  ["teaser"]=>
  array(2) {
    [0]=>
    string(7) "Bla bla"
    [1]=>
    string(10) " GET ME B "
  }
  [2]=>
  array(2) {
    [0]=>
    string(7) "Bla bla"
    [1]=>
    string(10) " GET ME B "
  }
}
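The output shows one match per <hr /> tag, so the caller still has to choose which teaser to use. For the two scenarios in the question, the wanted text is the last captured teaser. A minimal sketch under that assumption (last_teaser is a hypothetical name, and the s modifier is included so the pattern also works across newlines). It does not reject content with three or more tags; it would simply return the last segment.

```php
<?php
// Hypothetical helper built on the pattern above: with one <hr /> the only
// match is the text before it; with two, the second match is the text
// between them. In both cases the last captured teaser is the one wanted.
function last_teaser(string $str): string
{
    $n = preg_match_all('/(<hr \/>)?(?P<teaser>.*?)<hr \/>/s', $str, $matches);
    if (!$n) {
        return ''; // no <hr /> at all (or a regex error): no teaser
    }
    return $matches['teaser'][$n - 1];
}

var_dump(last_teaser('GET ME A <hr /> bla'));
var_dump(last_teaser('Bla bla<hr /> GET ME B <hr />'));
```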
This tested function does the trick:
function get_teaser($text) {
    // First count how many <hr/> tags there are.
    $count = preg_match_all('%<hr\s*/?>%i', $text, $matches);
    if (!$count) return ''; // None? return empty string.
    switch($count) {
    case 1: // Case I: from the start up to the only HR tag.
        preg_match('%^(.*?)<hr\s*/?>%si', $text, $matches);
        return $matches[1];
    case 2: // Case II: stuff between the two HR tags.
        preg_match('%<hr\s*/?>(.*?)<hr\s*/?>%si', $text, $matches);
        return $matches[1];
    default: // Case III: Three or more HR tags is an error.
        return 'Error! Too many <hr /> tags.';
    }
}
Because the patterns use \s*/? and the i modifier, this also accepts other HR tag forms, e.g. <hr>, <hr/>, <hr   /> and <HR />.
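A quick way to check which spellings the pattern %<hr\s*/?>%i accepts is to run it against a few sample tags. This is only a sketch, and the sample list is made up for illustration. Note that a tag carrying attributes, such as <hr class="x" />, is not matched.

```php
<?php
// Which tag spellings does the pattern used in get_teaser() accept?
// The sample tags are illustrative, not taken from real editor output.
$pattern = '%<hr\s*/?>%i';
$samples = array('<hr>', '<hr/>', '<hr   />', '<HR />', '<hr class="x" />');

foreach ($samples as $tag) {
    printf("%-18s %s\n", $tag, preg_match($pattern, $tag) ? 'matches' : 'no match');
}
```

If attributes had to be tolerated, a pattern along the lines of %<hr\b[^>]*>%i would be one option; that change is not part of the answer above.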
 