
Regex problem - when scraping HTML segment

I'm trying to use a regex to scrape the content between the anchors

"<h2>Highlights</h2>" and "</div><div class="FloatClear"></div><div id="SalesMarquee">" in the HTML segment below.

But the regex I tried returns nothing:

<h2>Highlights<\/h2>\t?\n?\s?\S?(.*?)<\/div>

I think it may have something to do with the whitespace (spaces and line breaks) in the HTML source...

Can any regex gurus give me an expression that grabs everything between two given HTML anchors, like the ones mentioned above, and that also copes with any whitespace in the HTML source?

BTW I can't use any PHP code as the Regex is for a script I purchased (there is just a text box I have to enter the regex into)...

Many thanks

HTML segment:

<div id="Highlights">

      <h2>Highlights</h2>

      <ul>

<li>1234</li>

<li>abc def asdasd asdasd</li>

<li>asdasda as asdasdasdas </li>

<li>asdasd asdasdas asdsad asdasd asa</li>

</ul>


     </div>

     <div class="FloatClear"></div>

     <div id="SalesMarquee">

      <div id="SalesMarqueeTemplate" style="display: none;">


Use an HTML DOM parser such as Simple HTML DOM Parser:

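// Load the library (assumes simple_html_dom.php sits next to this script)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// Find all links and print each href
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}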
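
The PHP snippet above lists links. For the segment in the question, the same parser-based idea would collect the <li> items inside the "Highlights" div instead. A minimal sketch of that idea, using Python's standard-library HTMLParser as a stand-in for Simple HTML DOM; the class name HighlightsParser and the trimmed copy of the segment are assumptions made for illustration:

```python
from html.parser import HTMLParser


class HighlightsParser(HTMLParser):
    """Collects the text of every <li> inside <div id="Highlights">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # 0 means we are outside the Highlights div
        self.in_item = False  # True while inside an <li> of interest
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside Highlights
            elif dict(attrs).get("id") == "Highlights":
                self.div_depth = 1           # entering the Highlights div
        elif tag == "li" and self.div_depth > 0:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data


# Copy of the HTML segment from the question.
html = """<div id="Highlights">

      <h2>Highlights</h2>

      <ul>

<li>1234</li>

<li>abc def asdasd asdasd</li>

<li>asdasda as asdasdasdas </li>

<li>asdasd asdasdas asdsad asdasd asa</li>

</ul>


     </div>

     <div class="FloatClear"></div>

     <div id="SalesMarquee">
"""

parser = HighlightsParser()
parser.feed(html)
parser.close()
highlights = [item.strip() for item in parser.items]
```

Because the parser tracks tags rather than characters, the blank lines and indentation in the source do not matter here.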


I agree with Naveed. Here is a similar post: "Robust and Mature HTML Parser for PHP".


The following PCRE regex should work:

/<h2>.*<\/h2>(.*)<\/div>/is

The last two characters are modifiers: i for ignore case and s for dot-all mode. Dot-all mode makes the dot match newlines as well.
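
The effect of dot-all mode can be checked against the asker's own pattern. The sketch below uses Python's re module as a stand-in for PCRE (re.DOTALL corresponds to the s modifier); the trimmed copy of the segment and the variable names are assumptions for illustration:

```python
import re

# Trimmed copy of the HTML segment from the question.
html = """<div id="Highlights">

      <h2>Highlights</h2>

      <ul>

<li>1234</li>

<li>abc def asdasd asdasd</li>

</ul>


     </div>

     <div class="FloatClear"></div>
"""

# The asker's pattern, unchanged.
pattern = r"<h2>Highlights<\/h2>\t?\n?\s?\S?(.*?)<\/div>"

# Without dot-all, "." stops at the first line break, so "(.*?)" can never
# reach the closing </div> several lines further down.
without_dotall = re.search(pattern, html)

# With dot-all, "." also matches line breaks and the capture succeeds.
with_dotall = re.search(pattern, html, re.DOTALL)

if with_dotall:
    print(with_dotall.group(1).strip())
```

Under these assumptions the first search finds nothing and the second captures the <ul> block, which suggests the line breaks in the source, rather than the pattern's anchors, are what caused the empty result.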

Edit: You'll probably want this regex instead:

/<h2>Highlights<\/h2>(.*)<\/div>.*<div class="FloatClear">/is
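
A sketch of the two-anchor idea applied to the segment from the question, again with Python's re module standing in for PCRE. Two details here are assumptions and not part of the answer above: the capture is made lazy, (.*?), so it stops at the first closing </div> that is followed by the FloatClear div, and the gap between the two anchors is matched with \s* instead of .* so that only whitespace may sit between them:

```python
import re

# Copy of the HTML segment from the question.
html = """<div id="Highlights">

      <h2>Highlights</h2>

      <ul>

<li>1234</li>

<li>abc def asdasd asdasd</li>

<li>asdasda as asdasdasdas </li>

<li>asdasd asdasdas asdsad asdasd asa</li>

</ul>


     </div>

     <div class="FloatClear"></div>

     <div id="SalesMarquee">
"""

# IGNORECASE and DOTALL correspond to the "i" and "s" modifiers.
pattern = re.compile(
    r'<h2>Highlights<\/h2>(.*?)<\/div>\s*<div class="FloatClear">',
    re.IGNORECASE | re.DOTALL,
)

match = pattern.search(html)
captured = match.group(1).strip() if match else ""

# Optional second pass: pull the individual list items out of the capture.
items = [item.strip() for item in re.findall(r"<li>(.*?)<\/li>", captured, re.DOTALL)]
```

On a full page with several "FloatClear" divs, a greedy (.*) could run on to the last one, which is the reason for the lazy variant in this sketch.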


Try adding an 'm' modifier (for 'multiline') to the regexes provided by hlindset:

/<h2>Highlights<\/h2>(.*)<\/div>.*<div class="FloatClear">/ism

Here it is in action:

  • http://www.rubular.com/r/td1IUBvg26

Documentation on all modifiers is available by googling "pcre pattern modifiers".
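
As a quick check of what each modifier does: in PCRE-style engines, m changes only where ^ and $ may match, while s is the one that lets the dot cross line breaks. In Ruby, which the Rubular link above runs on, m is the modifier that makes the dot match newlines, which may be why it helps there. A small sketch with Python's re module, where re.MULTILINE and re.DOTALL play the roles of PCRE's m and s; the sample text is made up for illustration:

```python
import re

text = "first line\nsecond line"

# MULTILINE only changes where ^ and $ are allowed to match.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# DOTALL is what lets "." cross a line break; MULTILINE does not.
dot_default = re.search(r"first.*second", text)
dot_multiline = re.search(r"first.*second", text, re.MULTILINE)
dot_dotall = re.search(r"first.*second", text, re.DOTALL)
```

Since the patterns in this thread contain no ^ or $, adding m to a PCRE pattern that already has s should leave the match unchanged.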
