How do I extract values from an HTML page stored as a string (fetched with cURL) in PHP?
I am using PHP with cURL to fetch an HTML page into a string. I then need to extract the following data and plot a graph from it.
The HTML I get back looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 25 March 2009), see www.w3.org" />
<title></title>
</head>
<body>
<table>
<tbody>
<tr>
<td>
<h3>Income</h3>
</td>
</tr>
<tr>
<td>Operating income</td>
<td class="numericalColumn">22,922.00</td>
<td class="numericalColumn">21,507.30</td>
<td class="numericalColumn">17,492.60</td>
<td class="numericalColumn">13,683.90</td>
<td class="numericalColumn">10,227.12</td>
</tr>
<tr>
<td>
<h3>Expenses</h3>
</td>
</tr>
<tr>
<td>Material consumed</td>
<td class="numericalColumn">4,029.40</td>
<td class="numericalColumn">3,442.60</td>
<td class="numericalColumn">2,952.30</td>
<td class="numericalColumn">1,889.00</td>
<td class="numericalColumn">1,367.67</td>
</tr>
<tr>
<td>Manufacturing expenses </td>
<td class="numericalColumn">2,213.20</td>
<td class="numericalColumn">1,841.80</td>
<td class="numericalColumn">299.80</td>
<td class="numericalColumn">120.50</td>
<td class="numericalColumn">1,020.70</td>
</tr>
<tr>
<td>Personnel expenses</td>
<td class="numericalColumn">9,062.80</td>
<td class="numericalColumn">9,249.80</td>
<td class="numericalColumn">7,409.10</td>
<td class="numericalColumn">5,768.20</td>
<td class="numericalColumn">4,279.03</td>
</tr>
<tr>
<td>Selling expenses</td>
<td class="numericalColumn">378.10</td>
<td class="numericalColumn">308.40</td>
<td class="numericalColumn">532.10</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">171.05</td>
</tr>
<tr>
<td>Adminstrative expenses</td>
<td class="numericalColumn">1,737.00</td>
<td class="numericalColumn">1,906.00</td>
<td class="numericalColumn">2,583.70</td>
<td class="numericalColumn">2,651.70</td>
<td class="numericalColumn">904.78</td>
</tr>
<tr>
<td>Expenses capitalised</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Cost of sales</td>
<td class="numericalColumn">17,420.50</td>
<td class="numericalColumn">16,748.60</td>
<td class="numericalColumn">13,777.00</td>
<td class="numericalColumn">10,429.40</td>
<td class="numericalColumn">7,743.22</td>
</tr>
<tr>
<td>Operating profit</td>
<td class="numericalColumn">5,501.50</td>
<td class="numericalColumn">4,758.70</td>
<td class="numericalColumn">3,715.60</td>
<td class="numericalColumn">3,254.50</td>
<td class="numericalColumn">2,483.90</td>
</tr>
<tr>
<td>Other recurring income</td>
<td class="numericalColumn">434.20</td>
<td class="numericalColumn">468.20</td>
<td class="numericalColumn">326.90</td>
<td class="numericalColumn">288.70</td>
<td class="numericalColumn">113.59</td>
</tr>
<tr>
<td>Adjusted PBDIT</td>
<td class="numericalColumn">5,935.70</td>
<td class="numericalColumn">5,226.90</td>
<td class="numericalColumn">4,042.50</td>
<td class="numericalColumn">3,543.20</td>
<td class="numericalColumn">2,597.49</td>
</tr>
<tr>
<td>Financial expenses</td>
<td class="numericalColumn">108.40</td>
<td class="numericalColumn">196.80</td>
<td class="numericalColumn">116.80</td>
<td class="numericalColumn">7.20</td>
<td class="numericalColumn">3.13</td>
</tr>
<tr>
<td>Depreciation </td>
<td class="numericalColumn">579.60</td>
<td class="numericalColumn">533.60</td>
<td class="numericalColumn">456.00</td>
<td class="numericalColumn">359.80</td>
<td class="numericalColumn">292.26</td>
</tr>
<tr>
<td>Other write offs</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Adjusted PBT</td>
<td class="numericalColumn">5,247.70</td>
<td class="numericalColumn">4,496.50</td>
<td class="numericalColumn">3,469.70</td>
<td class="numericalColumn">3,176.20</td>
<td class="numericalColumn">2,302.10</td>
</tr>
<tr>
<td>Tax charges </td>
<td class="numericalColumn">790.80</td>
<td class="numericalColumn">574.10</td>
<td class="numericalColumn">406.40</td>
<td class="numericalColumn">334.10</td>
<td class="numericalColumn">286.10</td>
</tr>
<tr>
<td>Adjusted PAT</td>
<td class="numericalColumn">4,456.90</td>
<td class="numericalColumn">3,922.40</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,016.00</td>
</tr>
<tr>
<td>Non recurring items</td>
<td class="numericalColumn">441.10</td>
<td class="numericalColumn">-948.60</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">38.33</td>
</tr>
<tr>
<td>Other non cash adjustments</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-33.85</td>
</tr>
<tr>
<td>Reported net profit</td>
<td class="numericalColumn">4,898.00</td>
<td class="numericalColumn">2,973.80</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,020.48</td>
</tr>
<tr>
<td>Earnigs before appropriation</td>
<td class="numericalColumn">4,898.00</td>
<td class="numericalColumn">2,973.80</td>
<td class="numericalColumn">3,063.30</td>
<td class="numericalColumn">2,842.10</td>
<td class="numericalColumn">2,020.48</td>
</tr>
<tr>
<td>Equity dividend</td>
<td class="numericalColumn">880.90</td>
<td class="numericalColumn">586.00</td>
<td class="numericalColumn">876.50</td>
<td class="numericalColumn">873.70</td>
<td class="numericalColumn">712.88</td>
</tr>
<tr>
<td>Preference dividend</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
<td class="numericalColumn">-</td>
</tr>
<tr>
<td>Dividend tax</td>
<td class="numericalColumn">128.30</td>
<td class="numericalColumn">99.60</td>
<td class="numericalColumn">148.90</td>
<td class="numericalColumn">126.80</td>
<td class="numericalColumn">99.98</td>
</tr>
<tr>
<td>Retained earnings</td>
<td class="numericalColumn">3,888.80</td>
<td class="numericalColumn">2,288.20</td>
<td class="numericalColumn">2,037.90</td>
<td class="numericalColumn">1,841.60</td>
<td class="numericalColumn">1,207.62</td>
</tr>
</tbody>
</table>
</body>
</html>
I want to extract each row label, such as "Manufacturing expenses", together with the values for all the years listed in that row. How do I go about this?
I found something like preg_match('#<tr><th>(.*)</th> <td><b>price</b></td></tr>#', $content, $match);
but that doesn't get the values I want.
If I understood your question correctly, you want something like this. I wrote it myself, so if you need any clarification I'd be happy to help.
Cheers!
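The snippet this answer refers to is not shown above. As an illustration only (not the answerer's code), here is a minimal sketch of one way to do it with PHP's bundled DOM extension (DOMDocument and DOMXPath). It assumes the fetched page is in a string and that every data row has a label cell followed by value cells, as in the sample. The function name extractRows is hypothetical.

```php
<?php
// Hypothetical helper: returns [label => [value strings...]] for each data row.
function extractRows(string $html): array
{
    // Collect parser warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        // Direct <td> children of this row
        $cells = $xpath->query('td', $tr);
        if ($cells->length < 2) {
            continue; // heading rows such as "Income" have a single cell
        }
        $label = trim($cells->item(0)->textContent);
        $values = [];
        for ($i = 1; $i < $cells->length; $i++) {
            $values[] = trim($cells->item($i)->textContent);
        }
        $rows[$label] = $values;
    }
    return $rows;
}

// Usage with a small fragment of the sample table
$sample = '<table><tr><td><h3>Expenses</h3></td></tr>'
    . '<tr><td>Manufacturing expenses </td>'
    . '<td class="numericalColumn">2,213.20</td>'
    . '<td class="numericalColumn">1,841.80</td></tr></table>';
print_r(extractRows($sample)['Manufacturing expenses']);
```

The values come back as strings exactly as they appear in the page (including "-" for empty cells), so they still need converting before plotting.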
You can use libraries like PHP Simple HTML DOM Parser to extract data from HTML/XHTML.
http://simplehtmldom.sourceforge.net/manual.htm
An example:
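include 'simple_html_dom.php'; // adjust to wherever you saved the library

$pageDom = str_get_html( $rawHtmlData );
foreach( $pageDom->find( 'td' ) as $tblElem )
{
    // Find the label cell of the row we are interested in
    if( FALSE !== stristr( $tblElem->innertext, 'Manufacturing expenses' ) )
    {
        // The yearly values are the numericalColumn cells of the same row
        foreach( $tblElem->parent()->find( 'td.numericalColumn' ) as $valueElem )
        {
            echo $valueElem->plaintext, "\n";
        }
    }
}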
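Since the goal is to plot a graph, the extracted cell text still has to be turned into numbers. A minimal sketch, assuming the format seen in the sample: commas as thousands separators, and "-" meaning no value was reported. The helper name toNumber is hypothetical.

```php
<?php
// Hypothetical helper: "2,213.20" -> 2213.2, "-948.60" -> -948.6, "-" -> null.
function toNumber(string $cell): ?float
{
    $cell = trim($cell);
    if ($cell === '' || $cell === '-') {
        return null; // no value reported for that year
    }
    // Drop thousands separators before casting
    return (float) str_replace(',', '', $cell);
}

// Usage: convert one row of cell strings into plottable values
$series = array_map('toNumber', ['378.10', '308.40', '532.10', '-', '171.05']);
var_dump($series);
```

Returning null rather than 0 for "-" lets the graphing code decide whether to skip the point or draw a gap.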