Regex help (extracting data between hr tags)
I'm trying to extract teaser text from TinyMCE content in a PHP CMS. The <hr /> tag is not used within my design, so I'd like to extract the text in the following scenarios, based on the assumption that the content administrator will ONLY use it to define teaser text:

- Extract the content before an <hr /> tag (when the content administrator enters the teaser text at the beginning of the RTE and then uses <hr /> as the cutoff point).
- Extract the content between two <hr /> tags (when the content administrator enters the teaser text anywhere within the content and marks it with <hr /> tags on either side).

What regex should I use to cover the above?
I'm not sure I get your question correctly but here's a try:
if (preg_match('~^(.*?)<hr />((.+?)<hr />)?~is', $test, $matches)) {
    // at least one <hr /> present
    if (empty($matches[2])) {
        // no second <hr />
        $teaser = $matches[1];
    } else {
        // there is a second <hr />
        $teaser = $matches[3];
    }
} else {
    // no teaser
    $teaser = "";
}
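As a minimal sketch, the snippet above could be wrapped in a function and run against both scenarios from the question. The function name extract_teaser and the trim() call are my own additions, not part of the answer. Note that the pattern only matches the literal form <hr />.

```php
<?php
// Hypothetical wrapper; the pattern is the one from the snippet above.
function extract_teaser(string $html): string
{
    if (preg_match('~^(.*?)<hr />((.+?)<hr />)?~is', $html, $m)) {
        // Group 2 is only filled when a second <hr /> follows the first.
        $teaser = empty($m[2]) ? $m[1] : $m[3];
        // trim() is an assumption: it drops whitespace around the teaser.
        return trim($teaser);
    }
    return ''; // no <hr /> at all
}

var_dump(extract_teaser('GET ME A <hr /> bla'));            // string(8) "GET ME A"
var_dump(extract_teaser('Bla bla<hr /> GET ME B <hr />'));  // string(8) "GET ME B"
```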
<?php
$strs = array(
    'GET ME A <hr /> bla',
    'Bla bla<hr /> GET ME B <hr />'
);
foreach ($strs as $str) {
    $a = preg_match_all('/(<hr \/>)?(?P<teaser>.*?)<hr \/>/', $str, $matches);
    var_dump($a, $matches);
}
Output
int(1)
array(4) {
  [0]=>
  array(1) {
    [0]=>
    string(15) "GET ME A <hr />"
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
  ["teaser"]=>
  array(1) {
    [0]=>
    string(9) "GET ME A "
  }
  [2]=>
  array(1) {
    [0]=>
    string(9) "GET ME A "
  }
}
int(2)
array(4) {
  [0]=>
  array(2) {
    [0]=>
    string(13) "Bla bla<hr />"
    [1]=>
    string(16) " GET ME B <hr />"
  }
  [1]=>
  array(2) {
    [0]=>
    string(0) ""
    [1]=>
    string(0) ""
  }
  ["teaser"]=>
  array(2) {
    [0]=>
    string(7) "Bla bla"
    [1]=>
    string(10) " GET ME B "
  }
  [2]=>
  array(2) {
    [0]=>
    string(7) "Bla bla"
    [1]=>
    string(10) " GET ME B "
  }
}
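The output above lists every chunk that ends in an <hr />, so the caller still has to pick one. A rough sketch of one way to do that: take the last chunk, which is the text before the tag when there is one <hr /> and the text between the tags when there are two. The function name last_teaser, the dropped optional group, the added s modifier, and the trim() call are all my assumptions, not part of the code above.

```php
<?php
// Hypothetical helper built on the preg_match_all() idea shown above.
function last_teaser(string $html): string
{
    // The "s" modifier (an addition) lets "." span newlines in the content.
    $count = preg_match_all('/(?P<teaser>.*?)<hr \/>/s', $html, $matches);
    if (!$count) {
        return ''; // no <hr /> found (or the match failed)
    }
    // The last chunk is the one that ends at the final <hr />.
    return trim($matches['teaser'][$count - 1]);
}

var_dump(last_teaser('GET ME A <hr /> bla'));            // string(8) "GET ME A"
var_dump(last_teaser('Bla bla<hr /> GET ME B <hr />'));  // string(8) "GET ME B"
```

With three or more <hr /> tags this returns the text just before the last one, which may or may not be what you want.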
This tested function does the trick:
function get_teaser($text) {
    // First count how many <hr/> tags there are.
    $count = preg_match_all('%<hr\s*/?>%i', $text, $matches);
    if (!$count) return ''; // None? Return empty string.
    switch ($count) {
        case 1: // Case I: From start up to the only HR tag.
            preg_match('%^(.*?)<hr\s*/?>%si', $text, $matches);
            return $matches[1];
        case 2: // Case II: Stuff between two HR tags.
            preg_match('%<hr\s*/?>(.*?)<hr\s*/?>%si', $text, $matches);
            return $matches[1];
        default: // Case III: Three or more HR tags is an error.
            return 'Error! Too many <hr /> tags.';
    }
}
This also allows for various HR tag forms, e.g. <hr>, <hr/>, and <hr />.
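The same three cases could also be handled by splitting on the tag instead of matching around it. This is only a sketch under my own assumptions: the function name teaser_by_split is made up, and returning null for three or more tags (rather than an error string) is a design choice of mine so the caller can tell an error apart from real teaser text.

```php
<?php
// Hypothetical alternative: split the content on any <hr>, <hr/> or <hr /> tag.
function teaser_by_split(string $html): ?string
{
    $parts = preg_split('%<hr\s*/?>%i', $html);
    switch (count($parts)) {
        case 1:
            return '';          // no HR tag: no teaser
        case 2:
            return $parts[0];   // one HR tag: text before it
        case 3:
            return $parts[1];   // two HR tags: text between them
        default:
            return null;        // three or more HR tags: ambiguous
    }
}

var_dump(teaser_by_split('GET ME A <hr> bla'));             // string(9) "GET ME A "
var_dump(teaser_by_split('Bla bla<hr/> GET ME B <hr />'));  // string(10) " GET ME B "
var_dump(teaser_by_split('a<hr>b<hr>c<hr>d'));              // NULL
```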