preg_replace hell

I'm trying to use preg_replace to get some data from a remote page, but I'm having a bit of an issue when it comes to sorting out the pattern.

function getData($Url){
    $str = file_get_contents($Url);
    if(strlen($str)>0){
        preg_match("/\<span class=\"SectionHeader\"\>title\</span>/<br/>/\<div class=\"header2\"\>(.*)\</div\></span\>/",$str,$title);
        return $title[1];
    }
}

Here's the HTML as is before I ended up throwing a million slashes at it (looks like I forgot a part or two):

<span class="cell CellFullWidth"><span class="SectionHeader">mytitle</span><br/><div class="Center">Event Name</div></span>

Where Event Name is the data I want to return in my function.

Thanks a lot guys, this is a pain in the ass.


While I am inclined to agree with the commenters that this is not a pretty solution, here's my untested revision of your statement:

    preg_match('#\<span class="SectionHeader"\>title\</span\>/\<br/\>/\<div class="header2"\>(.*)\</div\>\</span\>#',$str,$title);

I changed the double-quoted string to a single-quoted one because you aren't using any of the variable-substitution features of double-quoted strings. This avoids having to backslash-escape the double quotes and removes any ambiguity about backslashes (which perhaps should have been doubled to produce the proper string; see the PHP manual on strings). I also changed the delimiter from slash / to hash # because of the number of slashes appearing in the match pattern, some of which were not backslash-escaped in your version.
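To illustrate the quoting and delimiter point, here is a small sketch comparing the two styles on the sample HTML from the question. The variable names ($html, $slashPattern, $hashPattern) are made up for the example, and the pattern is shortened to the div alone to keep the comparison readable.

```php
<?php
// Sample markup copied from the question.
$html = '<span class="cell CellFullWidth"><span class="SectionHeader">mytitle</span>'
      . '<br/><div class="Center">Event Name</div></span>';

// Double-quoted string with / delimiters:
// every " and every / inside the pattern has to be backslash-escaped.
$slashPattern = "/<div class=\"Center\">(.*?)<\/div>/";

// Single-quoted string with # delimiters:
// the same pattern needs no escaping at all.
$hashPattern = '#<div class="Center">(.*?)</div>#';

preg_match($slashPattern, $html, $a);
preg_match($hashPattern, $html, $b);

// Both calls capture the same text in group 1.
echo $a[1], "\n";
echo $b[1], "\n";
```

Both patterns match the same text; the second is simply easier to read and harder to get wrong.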


There are quite a few things wrong with your expression:

  • You're using / as the delimiter, but then use / unescaped in various places.
  • You're escaping < and > seemingly at random. They shouldn't be escaped at all.
  • You have some rogue /s around the <br/> for some reason.
  • The class name for the div is specified as header2 in the regex but is Center in the sample HTML.
  • The title is mytitle in the HTML but title in the regex.

With all of these corrected, you get:

preg_match('(<span class="SectionHeader">mytitle</span><br/><div class="Center">(.*)</div></span>)',$str,$title);

If you want to match any title instead of the specific title mytitle, just replace that with .*?.
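Putting the corrections together, the function from the question might look something like the sketch below. It assumes PHP 7.1 or later (for the nullable return type) and that the remote page uses exactly the markup shown in the question. The helper name extractEventName is made up for the example, and the title is matched with .*? as suggested above.

```php
<?php
// Hypothetical helper: pulls the event name out of markup shaped like the
// sample in the question. Returns null when the pattern does not match.
function extractEventName(string $html): ?string
{
    // # delimiters so the slashes in </span>, <br/> and </div> need no escaping.
    // .*? matches any title lazily; (.*?) captures the event name.
    $pattern = '#<span class="SectionHeader">.*?</span><br/>'
             . '<div class="Center">(.*?)</div></span>#s';

    if (preg_match($pattern, $html, $matches) === 1) {
        return trim($matches[1]);
    }
    return null;
}

// Same shape as the original getData(), but with explicit failure handling.
function getData(string $url): ?string
{
    $str = @file_get_contents($url);
    if ($str === false || $str === '') {
        return null; // download failed or the page was empty
    }
    return extractEventName($str);
}
```

Keeping the matching in its own function means the pattern can be checked against a saved copy of the HTML without fetching the remote page each time.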
