Parsing HTML to get all Option tags with PHP
I'm parsing an HTML page that contains a:
<select>
<option value="somevalue">Somedata</option>
</select>
And I need to get both somevalue and somedata out of there.
What's the easiest way to go about this? Note that somevalue and Somedata are always different (so to speak).
It is formed like:
<select name="attrib1" class="Input">
<option value="0"> </option>
<option value="140">140</option>
<option value="141">150</option>
<option value="142">160</option>
</select>
Please note, the name is ALWAYS attrib1!
Okay, since I can't see the full HTML, I'm not really sure if it's well formed, so I'll attempt to do this using more forgiving DOM functions. First off, I'm going to use this minimal html file as a sample:
test.html
<html>
<body>
<select name="attrib1" class="Input">
<option value="0"> </option>
<option value="140">140</option>
<option value="141">150</option>
<option value="142">160</option>
</select>
</body>
</html>
Now then, the first thing we need to do is create a DOM parser. We'll do this like so:
$doc = new DOMDocument();
$doc->loadHTMLFile("test.html");
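Since the page may not be well formed, here is a minimal sketch of loading sloppy markup without flooding the output with parser warnings. It assumes the HTML is already in a string (the inline sample with a stray closing tag is made up for illustration) and uses loadHTML() instead of loadHTMLFile(); libxml_use_internal_errors() and libxml_clear_errors() are standard PHP functions.

```php
<?php
// Made-up sample markup with a stray </span> to mimic sloppy real-world HTML.
$html = '<html><body><select name="attrib1" class="Input">'
      . '<option value="140">140</option>'
      . '<option value="141">150</option></span>'
      . '</select></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html); // loadHTMLFile("test.html") is the file-based equivalent

// Discard the collected errors and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$optionCount = $doc->getElementsByTagName('option')->length;
echo $loaded ? "parsed, found $optionCount option elements\n" : "could not parse\n";
```

If you want to know what the parser complained about, call libxml_get_errors() before clearing them.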
Okay, next we'll need to look at the requirements:
I'm parsing an HTML page that contains a:
<select> <option value="somevalue">Somedata</option> </select>
And I need to get both somevalue and somedata out of there.
You also mention:
Please note, the name is ALWAYS attrib1!
Based on these requirements, I'm going to select all option tags that are children of select elements with the name "attrib1". To do so, I'm going to use something called XPath. This is a very flexible way to select DOM elements based on specific conditions. Let's slowly build this out:
//*
select all elements, at any depth in the document
//select
select all elements that are select elements
//select[@name='attrib1']
select all select elements with the name of attrib1
//select[@name='attrib1']/option
select all option elements directly under all select elements with the name of attrib1
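To see how each step narrows the result, here is a small sketch that counts the matches for each expression. It uses the `//` prefix, which matches at any depth in the document. The second select named "other" is a hypothetical addition to the sample, there only to show what the name filter does.

```php
<?php
$html = <<<HTML
<html>
<body>
<select name="attrib1" class="Input">
<option value="0"> </option>
<option value="140">140</option>
<option value="141">150</option>
<option value="142">160</option>
</select>
<select name="other"><option value="x">X</option></select>
</body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Each expression is a little more specific than the one before it.
$steps = [
    "//select",
    "//select[@name='attrib1']",
    "//select[@name='attrib1']/option",
];

$counts = [];
foreach ($steps as $expression) {
    // query() returns a DOMNodeList; length is the number of matched nodes.
    $counts[$expression] = $xpath->query($expression)->length;
    echo $expression, " matches ", $counts[$expression], " node(s)\n";
}
```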
Now then, we need to do this lookup, so we use the XPath functions:
$xpath = new DOMXPath($doc);
$options = $xpath->query("//select[@name='attrib1']/option");
foreach ($options as $option) {
    // each $option is a DOMElement for one <option> tag
}
Now we need the value attribute, and the text inside. We'll first get the value attribute:
$optionValue = $option->getAttribute('value');
Then we get what's inside the option tag:
$optionContent = $option->nodeValue;
And once we put this all together:
$doc = new DOMDocument();
$doc->loadHTMLFile("test.html");

$xpath = new DOMXPath($doc);
// "//" searches the whole document, so the select may be nested at any depth
$options = $xpath->query("//select[@name='attrib1']/option");

foreach ($options as $option) {
    $optionValue = $option->getAttribute('value'); // the value="..." attribute
    $optionContent = $option->nodeValue;           // the text inside the tag
    echo "$optionValue and $optionContent\n";
}
We'll get the following output:
0 and
140 and 140
141 and 150
142 and 160
And there you have it.
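If you need the results as data rather than echoed text, the same steps can be wrapped in a function. This is only a sketch: extractOptions() is a hypothetical helper name, it takes the HTML as a string rather than a file, and it assumes the select name contains no single quote because the name is embedded directly in the XPath expression.

```php
<?php
/**
 * Return the options of the named <select> as a list of value/label pairs.
 * extractOptions() is a hypothetical helper, not part of PHP's DOM extension.
 */
function extractOptions(string $html, string $selectName): array
{
    // Keep parser warnings about sloppy markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $selectName contains no single quote.
    $options = $xpath->query("//select[@name='$selectName']/option");

    $result = [];
    foreach ($options as $option) {
        $result[] = [
            'value' => $option->getAttribute('value'),
            'label' => trim($option->nodeValue), // trim() drops surrounding whitespace
        ];
    }
    return $result;
}

$html = '<select name="attrib1" class="Input">'
      . '<option value="0"> </option>'
      . '<option value="140">140</option>'
      . '<option value="141">150</option>'
      . '<option value="142">160</option>'
      . '</select>';

$result = extractOptions($html, 'attrib1');
print_r($result);
```

A list of pairs is used instead of a value => label map because PHP would silently turn numeric string keys such as "140" into integers.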
Use the DOM extension: http://php.net/manual/en/book.dom.php
Please do not try to use regex.
HTML is not a regular language. Trying to parse it as one will seem to work at first glance, but it will definitely bite you in the ass later.
Answering your question:
The easiest way is to use regular expressions with the preg_match_all() function.
You have to create a regular expression that matches all option tags and extracts both values you need.
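A rough sketch of what that could look like, under fairly strict assumptions: every option tag has a double-quoted value as its only attribute, and there is no nested markup inside the tag. The pattern is illustrative only. It matches option tags in any select, not just attrib1, and it will break as soon as the markup deviates from these assumptions.

```php
<?php
$html = '<select name="attrib1" class="Input">
<option value="0"> </option>
<option value="140">140</option>
<option value="141">150</option>
<option value="142">160</option>
</select>';

// Group 1: the value attribute. Group 2: the text between the tags.
// Assumes double-quoted values and no other attributes on the option tag.
$pattern = '/<option\s+value="([^"]*)"\s*>(.*?)<\/option>/is';

// PREG_SET_ORDER yields one array per matched tag: [full match, value, text].
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[] = [$match[1], trim($match[2])];
    echo $match[1], " and ", trim($match[2]), "\n";
}
```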