
Searching a PHP String

I am struggling with PHP a bit.

I created an array and filled a few positions with some curl return data.

I don't see how to search each array position for <p><strong> and return every character from there up to </p>.

From a terminal I might do something like this:

grep -A 2 strong | sed -e 's/<p><strong>//' -e 's/<\/strong><br\/>//' -e 's/<br \/>//' -e 's/<\/p>//' -e 's/--//' -e 's/^[ \t]*//;s/[ \t]*$//'

But I am lost doing this in PHP.

Any advice?

Edit: I want the contents of every <p><strong> to the </p>

Edit 2: Here is the code I am trying:

$m=array();
preg_match_all('/<p><strong>(.*?)<\/p>/',$buffer,$m);
$sizeM = count($m);

for ( $counter2 = 0; $counter2 <= $sizeM; $counter2++)
{
    $displayString.= $m[$counter2];
}

And I am getting ArrayArrayArray... as my $displayString.

Edit 3: I am doing this:

$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL, $url);
curl_setopt($curl_handle, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.15) Gecko/20110303 Ubuntu/10.04 (lucid) Firefox/3.6.15");
curl_setopt($curl_handle, CURLOPT_HEADER, 0);
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,1);

$buffer = curl_exec($curl_handle);

curl_close($curl_handle);

$m=array();
preg_match_all('/<p>.*?<strong>(.*?)<\/p>/i',$buffer,$m);

foreach($m[1] as $mnum=>$match) {
    $displayString.='Match '.$mnum.' is: '.$match."\n";
}


In PHP, as in many other languages, it's preferable not to use string functions or regular expressions to match HTML: HTML is not a regular language, so this approach gets buggy quickly.

What you should be looking at is a DOM parser, which lets you iterate through the HTML as objects in the same way JavaScript accesses the DOM.

Take a look at the following native PHP library to get started: http://php.net/manual/en/class.domdocument.php

You can use it like so:

$xml = new DOMDocument();

// Load the URL's contents into the DOM
$xml->loadHTMLFile($url);

// Loop through each <a> tag in the DOM and print its href attribute
foreach ($xml->getElementsByTagName('a') as $link)
{
    echo $link->getAttribute('href') . "\n";
}

This finds all the links in the document.

Also, please see a post I created and the great answer from Gordon: How do you parse and process HTML/XML in PHP?


preg_match_all()

$m = array();
$displayString = ''; // initialise before appending with .=
preg_match_all('/<p>\s*<strong>([\s\S]*?)<\/p>/i', $string, $m);
foreach ($m[1] as $mnum => $match) {
    $displayString .= 'Match ' . $mnum . ' is: ' . $match . "\n";
}

$m now contains all matches. $m[0] holds the full matches and $m[1] holds the text captured by the parentheses.
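To make the shape of $m concrete, here is a minimal sketch. The $buffer contents are made-up sample input standing in for the curl response, and the s modifier is used in place of [\s\S] so that . can match newlines. Looping over $m itself yields arrays, which is the kind of mistake that prints "Array" when concatenated, so the loop goes over $m[1].

```php
<?php
// Hypothetical sample input standing in for the curl response.
$buffer = "<p><strong>First</strong> one</p>\n"
        . "<p>skip me</p>\n"
        . "<p><strong>Second</strong><br />\ntwo</p>";

// i = case-insensitive, s = let . match newlines so a paragraph may span lines.
preg_match_all('/<p>\s*<strong>(.*?)<\/p>/is', $buffer, $m);

// $m[0] is the list of full matches, $m[1] the list of first-group captures.
$displayString = '';
foreach ($m[1] as $mnum => $match) {
    $displayString .= 'Match ' . $mnum . ' is: ' . $match . "\n";
}

echo $displayString;
```

Note that each capture still contains the closing </strong> tag and any <br />, because the pattern only stops at </p>.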


As has been pointed out in other posts, if you are trying to process HTML you shouldn't use regular expressions.

To handle finding <p><strong> you could use DOMDocument:

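$doc = new DOMDocument();
$doc->loadHTML($html);
$pTags = $doc->getElementsByTagName('p');
$data = array();
foreach ($pTags as $pTag) {
  // Skip empty paragraphs, whose firstChild is null
  if ($pTag->firstChild !== null && $pTag->firstChild->nodeName === 'strong') {
    $data[] = $pTag->firstChild->nodeValue;
  }
}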

Or use XPath:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$matchingNodes = $xpath->query('//p/strong');

Or you may even be able to use expat.

These methods are much clearer, proven, flexible and more failsafe than using regular expressions.

My personal favorite for pulling data out of XML-style documents is XPath. Here is a good set of XPath examples: http://msdn.microsoft.com/en-us/library/ms256086.aspx
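As a rough illustration of how the XPath results might be consumed, the sketch below selects each <p> that has a <strong> child and collects its text. The $html string is invented sample data, and the //p[strong] expression is one possible choice rather than the only way to phrase the query.

```php
<?php
// Hypothetical sample markup; in practice this would be the fetched page.
$html = '<html><body>'
      . '<p><strong>Title</strong> Body text</p>'
      . '<p>No strong here</p>'
      . '</body></html>';

$doc = new DOMDocument();
// Real pages are rarely valid HTML, so collect libxml warnings quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = array();
// Select every <p> element that has a <strong> element as a direct child.
foreach ($xpath->query('//p[strong]') as $p) {
    // textContent is the paragraph's text with all tags removed.
    $texts[] = trim($p->textContent);
}

print_r($texts);
```

Because textContent drops the markup, no separate tag-stripping step is needed afterwards.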

Edit: Note that if you are trying to process very large XML/HTML documents, you will not want to use DOMDocument or XPath, as they can be slow for large documents. For these cases, go with an event-driven XML parser. We have had cases at work where parsing a large XML file with XPath took a few minutes and parsing the same file with an event-driven parser took just a few seconds.
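For a sense of what streaming parsing looks like, here is a small sketch using PHP's XMLReader. This is a pull-style streaming parser rather than a callback-based one, but it shares the key property of not building the whole tree in memory. It assumes the xmlreader extension is available and that the input is well-formed XML; the <items>/<item> structure is invented for the example.

```php
<?php
// Hypothetical well-formed XML; a large file would be opened with open() instead.
$xml = '<items><item>one</item><item>two</item></items>';

$reader = new XMLReader();
$reader->XML($xml);

$values = array();
// read() advances one node at a time, so memory use stays flat.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // readString() returns the text content of the current element.
        $values[] = $reader->readString();
    }
}
$reader->close();

print_r($values);
```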


Regular expressions will be your friend here. strpos, substr, and explode are also useful PHP functions.
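A minimal sketch of the plain string-function route, assuming the markers appear exactly as <p><strong> and </p> with no extra whitespace or attributes. The $buffer value is made-up sample input.

```php
<?php
// Hypothetical sample input standing in for one array position.
$buffer = 'junk<p><strong>hello1!</strong></p>junk'
        . '<p><strong>hello2!</strong> more</p>junk';

$open  = '<p><strong>';
$close = '</p>';
$found = array();
$offset = 0;

// Walk the string, taking everything between each opening and closing marker.
while (($start = strpos($buffer, $open, $offset)) !== false) {
    $start += strlen($open);
    $end = strpos($buffer, $close, $start);
    if ($end === false) {
        break; // unterminated paragraph: stop rather than guess
    }
    $found[] = substr($buffer, $start, $end - $start);
    $offset = $end + strlen($close);
}

print_r($found);
```

This breaks as soon as the markup varies (for example <p class="x">), which is why the other answers recommend a parser.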


Well, if the array positions aren't relevant to the result you're expecting, you could try merging the array into a single string and running a regex over it.

Here's the code

<?php

$data = array(
    'DONT MATCH THISDONT MATCH THIS<p><strong>hello1!</strong></p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
    'DONT MATCH THISDONT MATCH THIS<p><strong>hello2!</strong></p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
    'DONT MATCH THISDONT MATCH THIS<p><strong>hello3!</strong></p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
    '<p><strong>hello4!</strong></p>DONT MATCH THISDONT MATCH THIS<p><strong>hello5!</strong> test test</p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
    'DONT MATCH THISDONT MATCH THIS<p><strong>hello6!</strong></p>DONT MATCH THISDONT MATCH THISDONT MATCH THIS',
);

// implode() takes the separator first; the reversed order was removed in PHP 8
preg_match_all('/<p><strong>.*?<\/p>/', implode('', $data), $results);

print_r($results);


?>

Let me know if this works for you. Cheers!
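Since that pattern has no capture group, $results[0] holds the full matches with their tags still attached. A rough PHP counterpart to the sed clean-up in the question could use strip_tags and trim, as sketched here. The $results array is hard-coded sample data in the shape preg_match_all would produce.

```php
<?php
// Hypothetical matches, shaped like the $results[0] list from preg_match_all.
$results = array(
    array(
        '<p><strong>hello4!</strong></p>',
        '<p><strong>hello5!</strong> test test</p>',
    ),
);

// Remove all tags, then trim leading and trailing whitespace from each match.
$clean = array_map(function ($fragment) {
    return trim(strip_tags($fragment));
}, $results[0]);

print_r($clean);
```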
