HTML Parser to Get Content between Elements
I'm looking to parse data out of about 100 pages, each of which contains this string of HTML:
<span class="cell CellFullWidth"><span class="SectionHeader">EVENT</span><br/><div class="Center">Event Name</div></span>
I'm not very familiar with parsers, so I'm wondering what I should use to extract the "Event Name" from each page that I loop through, and how I should go about doing that.
I looked into Simple HTML DOM but I couldn't quite figure it out. Please help, thanks!
Assuming:
- All event names are in divs
- The containing div must have the class "Center"
- All divs with the class "Center" contain the name of an event
Here goes:
<?php
$content = '
<span class="cell CellFullWidth"><span class="SectionHeader">EVENT</span><br/><div class="Center">Event Name1</div></span>
<span class="cell CellFullWidth"><span class="SectionHeader">EVENT</span><br/><div class="Center">Event Name2</div></span>
';

// Parse the markup into a DOM tree.
$html = new DOMDocument();
$html->loadHTML($content);

// Collect the text of every <div> whose class attribute is exactly "Center".
$events = array(); // initialised up front so print_r() still works when nothing matches
$divs = $html->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('class') === 'Center') {
        $events[] = $div->nodeValue;
    }
}

print_r($events);
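If the class attribute might ever carry more than one class (say "Center Wide"), the exact string comparison above would miss it. Here is a rough sketch of the same idea using DOMXPath instead. The function name extractEventNames is made up for illustration, and I'm assuming the event name is the only text inside the div.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of
// every <div> that has "Center" among its classes.
function extractEventNames($markup)
{
    if (trim($markup) === '') {
        return array(); // nothing to parse
    }

    // Keep libxml's complaints about sloppy real-world HTML out of the output.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($markup);

    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so "Center" matches as a whole word only.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " Center ")]';

    $names = array();
    foreach ($xpath->query($query) as $div) {
        $names[] = trim($div->textContent);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $names;
}
```

You would call it as extractEventNames($content) with the $content string from the example above.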
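If all the text except the event name is always the same, you can do it with just a substring, since the start and end bits will always be the same length:
$event_name = substr($current_line, 98, -14);
That gives you what is left over when you remove the first 98 characters (everything up to and including <div class="Center">) and the last 14. Note that </div></span> is only 13 characters, so the -14 assumes $current_line still ends with one line-break character; on a trimmed line, use -13.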
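To avoid hard-coding the offsets, here is a small sketch that derives them from the fixed markup and checks that the line really has that shape before cutting. The function name eventNameFromLine is hypothetical, and it assumes each line holds exactly one such snippet and nothing else.

```php
<?php
// Hypothetical helper: strips the fixed markup around the event name.
// Returns null when the line does not have the expected shape.
function eventNameFromLine($line)
{
    $prefix = '<span class="cell CellFullWidth"><span class="SectionHeader">EVENT</span><br/><div class="Center">';
    $suffix = '</div></span>';

    // Drop surrounding whitespace, including any trailing line break.
    $line = trim($line);

    // The line must be long enough to hold both fixed parts plus a name.
    if (strlen($line) <= strlen($prefix) + strlen($suffix)) {
        return null;
    }

    // Both fixed parts must actually be present where we expect them.
    if (substr($line, 0, strlen($prefix)) !== $prefix
        || substr($line, -strlen($suffix)) !== $suffix) {
        return null;
    }

    // Everything between the prefix and the suffix is the event name.
    return substr($line, strlen($prefix), -strlen($suffix));
}
```

This is more fragile than a real parser: any change in attribute order or whitespace makes it return null, which is at least easier to notice than a silently wrong name.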
You could use PHP's DOM manipulation functions.
Basically, you'd create a new DOMDocument, load the page into it with DOMDocument::loadHTML() or DOMDocument::loadHTMLFile(), and then use $yourDomObject->getElementsByTagName() to get all the <span> elements.
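Since you have about 100 pages to loop through, the steps above could look roughly like this when the pages are saved as local files. The function name eventNamesFromFiles and the pages/*.html path are assumptions for illustration, and this version looks at the <div class="Center"> elements because that is where the event name sits in your snippet.

```php
<?php
// Hypothetical helper: maps each readable file path to the event names found in it.
function eventNamesFromFiles(array $paths)
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $result = array();
    foreach ($paths as $path) {
        if (!is_readable($path)) {
            continue; // skip missing or unreadable files
        }

        $doc = new DOMDocument();
        if (!$doc->loadHTMLFile($path)) {
            continue; // skip files that could not be parsed at all
        }

        $names = array();
        foreach ($doc->getElementsByTagName('div') as $div) {
            if ($div->getAttribute('class') === 'Center') {
                $names[] = trim($div->nodeValue);
            }
        }
        $result[$path] = $names;
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $result;
}

// Example call, assuming the pages were saved under a local "pages" directory:
// print_r(eventNamesFromFiles(glob('pages/*.html')));
```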