preg_match_all: Why would "this" match but "that" won't?

So, I'm basically trying to match anything inside (and including) object tags, with this:

<?php preg_match_all('/<object(.*)<\/object>/', $blah, $blahBlah); ?>

It finds a match for this:

<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="250" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=9048799&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="400" height="250" src="http://vimeo.com/moogaloop.swf?clip_id=9048799&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>

But it won't match this:

<object width="400" height="300"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=5630744&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=5630744&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed></object>

Any idea why? Thanks for any insight.


ETA: Since my approach may have been faulty to begin with, here's some background on what I'm trying to do.

This is for a Wordpress site. I am using a plugin that converts a shorttag into a full video embed code. The plugin was recently (thankfully) updated to make the code more valid.

The function I am trying to create is simply to find the first video object in a post, and grab it for use elsewhere on the site.

Here is the entire function (some of it will only make sense if you've worked with Wordpress):

<?php
function catch_that_video() {
  global $post, $posts;
  $the_video = '';
  ob_start();
  ob_end_clean();
  $output = preg_match_all('/<object(.*)<\/object>/', $post->post_content, $vid_matches);
  $the_video = $vid_matches [1] [0];
  if(empty($the_video)){ $the_video = 0; }
  return $the_video;
}
?>


The only thing that comes to mind is single vs multiple lines.

/<object(.*)<\/object>/m

That should match across multiple lines.

This manual page discusses the modifiers:

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

Update:

Upon further investigation, m is not the correct modifier (from the manual):

m (PCRE_MULTILINE) By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.

(Emphasis my own.)

The correct modifier would be s (PCRE_DOTALL), which allows the dot metacharacter . to match newlines. Without it, . matches any character except a newline.
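To make the difference between the modifiers concrete, here is a minimal sketch. The $html string is a made-up two-line input, not the asker's actual post content, so treat it as an illustration only.

```php
<?php
// Hypothetical input: an <object> block that spans two lines.
$html = "<object width=\"400\">\n<param name=\"movie\" value=\"x\" /></object>";

$pattern = '<object(.*)<\/object>';

// No modifier: "." stops at the newline, so nothing matches.
$plain = preg_match('/' . $pattern . '/', $html);

// m only changes how ^ and $ behave; the pattern uses neither, so still no match.
$multiline = preg_match('/' . $pattern . '/m', $html);

// s lets "." match the newline as well, so the block is found.
$dotall = preg_match('/' . $pattern . '/s', $html);

var_dump($plain, $multiline, $dotall); // int(0) int(0) int(1)
```

If the real post content has a line break anywhere between the opening and closing tags, this is the behaviour you would see.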

Moving on to the updated question, the regex itself matches both of those inputs, if those inputs are simple strings. I don't know what's causing the actual issue.

$input = '<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="250" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=9048799&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="400" height="250" src="http://vimeo.com/moogaloop.swf?clip_id=9048799&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>';

$input2 = '<object width="400" height="300"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=5630744&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=5630744&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00ADEF&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed></object>';

$matches = array();
preg_match_all('/<object(.*)<\/object>/', $input, $matches); 
echo '<br />$input<pre>';
var_dump($matches);
echo '</pre>';

$matches2 = array();
preg_match_all('/<object(.*)<\/object>/', $input2, $matches2); 
echo '<br />$input2<pre>';
var_dump($matches2);
echo '</pre>';

Moving on:

What are you trying to accomplish with these two lines?

ob_start();
ob_end_clean();

This opens a new output buffer and immediately kills it. (See the bit about stacking output buffers in the documentation.)
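A quick way to convince yourself of that is to watch the buffer nesting level around the two calls. This is only a sketch; the variable names are mine.

```php
<?php
$before = ob_get_level();   // nesting level before the two calls

ob_start();                 // pushes a new, empty buffer onto the stack
$during = ob_get_level();

ob_end_clean();             // discards that same buffer and pops it again
$after = ob_get_level();

// The level goes up by one and comes straight back down.
var_dump($during === $before + 1, $after === $before);
```

Nothing is captured or discarded apart from the empty buffer that was just opened, so the pair has no net effect on the function.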

Is there a reason to set this equal to 0 instead of, say, null?

if(empty($the_video)){ $the_video = 0; }

Personally, I would set it to null when declaring it and rely on not clobbering that if there are no matches. This is how I would write that function, assuming that $post is a WordPress global. (Personally, I would just pass that into the function, as I have a disdain for most globals.)

function catch_that_video() 
{
  global $post;

  $the_video = null;
  $vid_matches = array();

  if(preg_match('/<object.*<\/object>/', $post->post_content, $vid_matches))
  {
    $the_video = $vid_matches[0];
  }

  return $the_video;
}

I changed it to use preg_match instead of preg_match_all, since you're using only the first match. This can, of course, be modified to use preg_match_all, if necessary. Though, the appropriate regex will be a pain to create. (Adding the s modifier to the above regex in order to deal with multiple lines would grab everything from the first opening <object> tag to the last closing </object> tag. I don't even want to think about trying to come up with a regex to cover multiple lines and grab individual <object>...</object> blocks.)
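For what it's worth, one common way to get individual blocks is to combine the s modifier with a lazy quantifier (.*? instead of .*). The sketch below uses made-up post content, and it assumes <object> tags are never nested inside each other; if they can be, a regex like this will cut the outer block short and an HTML parser would be the safer tool.

```php
<?php
// Hypothetical post content containing two multi-line <object> blocks.
$content = "<p>intro</p>\n"
    . "<object width=\"400\">\n<param name=\"movie\" value=\"a.swf\" />\n</object>\n"
    . "<p>between</p>\n"
    . "<object width=\"300\">\n<param name=\"movie\" value=\"b.swf\" />\n</object>\n";

// Greedy .* with s: a single match from the first <object to the last </object>.
$greedy_count = preg_match_all('/<object\b.*<\/object>/s', $content, $greedy);

// Lazy .*? with s: each match stops at the nearest </object>.
$lazy_count = preg_match_all('/<object\b.*?<\/object>/s', $content, $lazy);

echo $greedy_count, "\n"; // 1
echo $lazy_count, "\n";   // 2
echo $lazy[0][0], "\n";   // the first block only
```

Since the function above only needs the first video, preg_match with the same lazy pattern would do; $vid_matches[0] then holds just the first block.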

However, this doesn't answer the original question of why the second object block isn't being matched. I would focus my investigation on finding the difference between the two strings as they are actually stored in the post. If the issue were a difference in line endings, I would use something like Vim on Linux, as that displays ^M in place of the \r in Windows line endings. What about HTML encoding of the string? If the tags are stored as entities (&lt;object ...&gt;), the pattern would not match either.
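If you would rather check from PHP than from an editor, something along these lines makes hidden characters and entity encoding visible. show_hidden_chars() is a hypothetical helper name, and the sample strings are invented; in practice you would feed in $post->post_content.

```php
<?php
// Hypothetical helper: render control characters (\r, \n, \t, ...) as visible escapes.
function show_hidden_chars(string $s): string
{
    return addcslashes($s, "\0..\37");
}

$unix    = "<object>\n</object>";
$windows = "<object>\r\n</object>";

echo show_hidden_chars($unix), "\n";    // <object>\n</object>
echo show_hidden_chars($windows), "\n"; // <object>\r\n</object>

// Content stored as HTML entities has no literal "<object", so the regex cannot match it.
$encoded = '&lt;object width="400"&gt;&lt;/object&gt;';
$decoded = html_entity_decode($encoded, ENT_QUOTES);

var_dump(preg_match('/<object(.*)<\/object>/', $encoded)); // int(0)
var_dump(preg_match('/<object(.*)<\/object>/', $decoded)); // int(1)
```

Running the failing post's content through the helper should show quickly whether a line break or an encoding difference is what separates it from the working one.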


This ..

 ob_start();
 ob_end_clean();

.. should look something like this ..


 ob_get_level() and ob_end_clean();
