Regular Expression Not Working

Greetings everyone

I have this regular expression, which goes as follows:

$thread_views_exp = '~<td class="alt1" align="center">.*</td> <td class="alt2" align="center">(.*)</td> </tr>~isU';

The purpose is to get all the 'views' values (first column from the left) for this sample thread URL: http://www.swalif.net/softs/swalif45. Everything works fine except for the first value.

Sample Output:

Array
(
    [0] => 12 528
    [1] => 2,732
    [2] => 506
    [3] => 73
    [4] => 83
    [5] => 245
    [6] => 100
    [7] => 201
    [8] => 55
    [9] => 55
    [10] => 37
    [11] => 349
    [12] => 123
    [13] => 75
    [14] => 173
    [15] => 260
    [16] => 101
    [17] => 660
    [18] => 158
    [19] => 66
    [20] => 177
    [21] => 165
    [22] => 228
    [23] => 812
    [24] => 347
    [25] => 197
    [26] => 348
    [27] => 263
    [28] => 176
    [29] => 315
    [30] => 173
    [31] => 273
    [32] => 199
)

Thanks for your assistance. Imran


It seems to be a case of table cell greediness. My test also gave me an extraneous <td>. But there is a simple way to make the regex more stringent:

$rx = '~<td class="alt1" align="center">.*</td> <td class="alt2" align="center">([\d,]+)</td> </tr>~isU';

Here [\d,]+ is used in place of the (.*) capture group, so it only captures runs of digits and commas. The previous .* was eating up too much.
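
To see how the unrestricted group can pick up an extra cell, here is a minimal sketch. The markup in $html is invented for illustration (the real page source is not shown in this thread); it assumes the first row has one more cell after the alt2 cell than the other rows do.

```php
<?php
// Hypothetical rows: the real page source is not shown in the thread.
// Row 1 has an extra cell between the "alt2" cell and </tr>.
$html = '<tr> <td class="alt1" align="center">1</td> <td class="alt2" align="center">12</td> <td class="alt1" align="center">528</td> </tr> '
      . '<tr> <td class="alt1" align="center">7</td> <td class="alt2" align="center">2,732</td> </tr> '
      . '<tr> <td class="alt1" align="center">3</td> <td class="alt2" align="center">506</td> </tr>';

$loose  = '~<td class="alt1" align="center">.*</td> <td class="alt2" align="center">(.*)</td> </tr>~isU';
$strict = '~<td class="alt1" align="center">.*</td> <td class="alt2" align="center">([\d,]+)</td> </tr>~isU';

preg_match_all($loose, $html, $looseMatches);
preg_match_all($strict, $html, $strictMatches);

// With (.*) the first capture runs on until the first "</td> </tr>",
// so it swallows a closing tag and the extra cell.
print_r($looseMatches[1]);

// With ([\d,]+) the capture cannot cross a tag, so the odd row is no
// longer captured at all and only clean numbers remain.
print_r($strictMatches[1]);
```

In this made-up input, the first loose capture contains markup; passed through strip_tags() or viewed in a browser it would read as 12 528. That resembles the odd first value in the question, but it is only a guess about the real page. Note that the stricter pattern drops the irregular row here rather than repairing it.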

General tip: you might want to use [^<>]* instead of .* to safely match text content between HTML tags. Also consider \s+ instead of literal spaces between the tags.
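
A rough sketch of that tip, assuming markup where the whitespace between tags varies (the $html sample is hypothetical). Because [^<>]* cannot run past a tag boundary, the U and s modifiers are left out here.

```php
<?php
// Hypothetical markup: one row split across lines, one row on a single line.
$html = <<<'HTML'
<tr>
    <td class="alt1" align="center">7</td>
    <td class="alt2" align="center">2,732</td>
</tr>
<tr> <td class="alt1" align="center">3</td> <td class="alt2" align="center">506</td> </tr>
HTML;

// [^<>]* matches cell text but can never cross a tag;
// \s+ accepts any run of spaces, tabs or newlines between tags.
$rx = '~<td class="alt1" align="center">[^<>]*</td>\s+'
    . '<td class="alt2" align="center">([\d,]+)</td>\s+</tr>~i';

preg_match_all($rx, $html, $m);
print_r($m[1]);
```

Both rows match even though their whitespace differs. With literal single spaces in the pattern, the multi-line row would be missed.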


Maybe try:

~<td class="alt2" [^\<\>]+?>([\d,]+)</td>~isU

This assumes that the <td> cells you are interested in always have class="alt2".

And there's probably no need to escape the < and > signs, i.e.:

~<td class="alt2" [^<>]+?>([\d,]+)</td>~isU
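
As a quick check of this shorter pattern, here is a small sketch run against invented markup (the attribute order and values in $html are assumptions, not taken from the real page).

```php
<?php
// Hypothetical markup for illustration; attribute order and values are assumed.
$html = '<tr> <td class="alt2" align="center">Views</td> </tr> '
      . '<tr> <td class="alt1" align="center">7</td> <td class="alt2" align="center">2,732</td> </tr> '
      . '<tr> <td class="alt1" align="center">3</td> <td class="alt2" align="center">506</td> </tr>';

// Under the U modifier, "+" is lazy and "+?" is greedy.
// [^<>] keeps the attribute match inside the opening tag either way.
$rx = '~<td class="alt2" [^<>]+?>([\d,]+)</td>~isU';

preg_match_all($rx, $html, $m);

// The "Views" header cell is skipped because it contains no digits.
print_r($m[1]);
```

One caveat: the pattern requires a space and at least one more character after class="alt2", so a bare <td class="alt2"> with no further attributes would not match.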