What does this Regex string mean?

Posted: 2023-02-08 07:40

I'm trying to debug some PHP, but I'm not great with regex. Can someone please translate this for me (if it even is regex)?

public static function fetch($number)
    {
        $number = str_replace(" ", "", $number);
        $html = file_get_contents('http://w2.brreg.no/enhet/sok/detalj.jsp?orgnr=' . $number);
        preg_match_all('/\<td style="width.*\<b\>(.*)[: ]*\<\/b\>/msU', $html, $keys);
        preg_match_all('/\<\/b\>.*\<td.*\>(.*)\<\/td\>/msU', $html, $values);

        if (!$keys[1])
        {
            return null;
        }
        // ... (rest of fetch() not included in the question)
    }

I kept the PHP snippet for context, in case it helps. :D Thanks :)

I'm only translating the first one; the second one is similar.

/                  # regex delimiter
\<td style="width  # match <td style="width  (unnecessary escaping of < !)
.*                 # match anything (as few characters as possible, see below)
\<b\>              # match <b> (again, unnecessary escaping!)
(.*)               # match anything (lazily) and capture it
[: ]*              # match any number of colons or spaces
\<\/b\>            # match </b>
/msU               # regex delimiter; multiline option (unnecessary: the pattern has no ^ or $),
                   # dot-all option (dot matches newline)
                   # and ungreedy option (quantifiers are lazy by default).

EDIT: U is not the Unicode option, but the ungreedy option. My mistake. The regex isn't that bad after all :)
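To see what the first pattern captures and why the U flag matters, here is a minimal sketch. The two-row HTML is a hypothetical sample shaped like the rows the pattern targets. It is not the real page markup. The second call drops U to show what the greedy version does.

```php
<?php
// Hypothetical sample markup (NOT the real page): two label/value rows.
$html = '<table>'
    . '<tr><td style="width:30%"><b>Name: </b></td><td>Example AS</td></tr>'
    . '<tr><td style="width:30%"><b>Address: </b></td><td>Main Street 1</td></tr>'
    . '</table>';

// Original pattern: with /U every quantifier is lazy, so each match stops at the nearest </b>.
preg_match_all('/\<td style="width.*\<b\>(.*)[: ]*\<\/b\>/msU', $html, $keys);
var_dump($keys[1]); // the labels "Name" and "Address", without the trailing ": "

// Same pattern without /U: the greedy .* runs on to the LAST <b>, so only one match
// is found (the first row is swallowed) and the trailing ": " ends up in the capture.
preg_match_all('/\<td style="width.*\<b\>(.*)[: ]*\<\/b\>/ms', $html, $greedy);
var_dump($greedy[1]); // a single element: "Address: "
```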

I'd suggest using these regexes instead:

/<td style="width.*?<b>(.*?)[: ]*<\/b>/s
/<\/b>.*?<td.*?>(.*?)<\/td>/s
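A quick way to check that the suggested patterns behave like the originals is to run both pairs on the same input. This is a sketch on hypothetical sample markup only. On real pages the results could differ if the HTML contains surprises neither pattern expects.

```php
<?php
// Hypothetical sample markup (NOT the real page).
$html = '<table>'
    . '<tr><td style="width:30%"><b>Name: </b></td><td>Example AS</td></tr>'
    . '<tr><td style="width:30%"><b>Address: </b></td><td>Main Street 1</td></tr>'
    . '</table>';

// Original patterns from the question: /U makes quantifiers lazy, /m has no effect here.
preg_match_all('/\<td style="width.*\<b\>(.*)[: ]*\<\/b\>/msU', $html, $oldKeys);
preg_match_all('/\<\/b\>.*\<td.*\>(.*)\<\/td\>/msU', $html, $oldValues);

// Suggested patterns: explicit lazy quantifiers (*?), only /s kept, no needless escapes.
preg_match_all('/<td style="width.*?<b>(.*?)[: ]*<\/b>/s', $html, $newKeys);
preg_match_all('/<\/b>.*?<td.*?>(.*?)<\/td>/s', $html, $newValues);

// Both pairs should capture the same keys and the same values on this input.
var_dump($oldKeys[1] === $newKeys[1], $oldValues[1] === $newValues[1]);
```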

More or less, it returns the {extracted} part from <td style="width ..."><b>{extracted}: </b>

To help understand regular expressions, I recommend downloading Expresso (for Windows), a free (registration required) regular expression parser and testing tool.

I believe it's trying to match the following structure:

<td width=.....><b>key:</b></td><td>value</td>

It's parsing the string twice: once for keys, which are taken from the first column, and a second time for values, which are taken from the second column.
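The two passes produce two parallel lists, and the natural next step is to pair them up. The rest of fetch() isn't shown in the question, so pairing keys and values by position with array_combine() is an assumption here. The sample HTML is also hypothetical.

```php
<?php
// Hypothetical sample in the <td ...><b>key:</b></td><td>value</td> shape.
$html = '<table>'
    . '<tr><td style="width:30%"><b>Name: </b></td><td>Example AS</td></tr>'
    . '<tr><td style="width:30%"><b>Address: </b></td><td>Main Street 1</td></tr>'
    . '</table>';

// Pass 1: keys, i.e. the <b> labels in the first column (pattern from the question).
preg_match_all('/\<td style="width.*\<b\>(.*)[: ]*\<\/b\>/msU', $html, $keys);
// Pass 2: values, i.e. the contents of the cell after each </b> (pattern from the question).
preg_match_all('/\<\/b\>.*\<td.*\>(.*)\<\/td\>/msU', $html, $values);

// Assumption: the i-th key belongs to the i-th value. That only holds if both passes
// found the same number of rows, so check before combining.
$fields = count($keys[1]) === count($values[1])
    ? array_combine($keys[1], $values[1])
    : [];
var_dump($fields);
```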

If you want some advice: your regex may not work as expected. In your case, it's better to use XPath.

See this snippet:

$str = "
<html>
    <body>
        <table>
        <tr>
            <td style='width:500px'><b>foo : </b> bar</td>
            <td style='width:200;vertical-align:'><b>baz :</b> qux</td>
        </tr>
        </table>
    </body>
</html>
";

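$xml = simplexml_load_string($str);

$results = array();
// Every <td> that has a style attribute and a <b> child
foreach($xml->xpath('//td[@style][b]') as $row) {
    // Casting a SimpleXMLElement to string returns only its own text, not the <b> child's
    $value = trim((string)$row);
    // The <b> text is the key; strip surrounding spaces and colons
    $key = trim((string)$row->b, ' :');
    $results[$key] = $value;
}

var_dump($results);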

This will print:

array(2) {
  ["foo"]=>
  string(3) "bar"
  ["baz"]=>
  string(3) "qux"
}
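One caveat: simplexml_load_string() only accepts well-formed XML, and a page downloaded with file_get_contents() may not be well-formed. The sketch below applies the same XPath idea using DOMDocument::loadHTML() and DOMXPath instead, since libxml's HTML parser tolerates malformed markup. The sample markup is hypothetical and deliberately omits a closing </tr>, which SimpleXML would reject.

```php
<?php
// Hypothetical markup with a missing </tr>: fine for an HTML parser, invalid as XML.
$html = '<table><tr><td style="width:500px"><b>foo : </b> bar</td>'
      . '<td style="width:200"><b>baz :</b> qux</td></table>';

$doc = new DOMDocument();
// Collect libxml parse errors internally instead of emitting warnings.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];
// Same selection as the SimpleXML version: <td> elements with a style attribute and a <b> child.
foreach ($xpath->query('//td[@style][b]') as $td) {
    // Key: text of the <b> child, minus surrounding spaces and colons.
    $key = trim($xpath->evaluate('string(b)', $td), ' :');
    // Value: only the text nodes directly under <td>, so the <b> label is excluded.
    $value = '';
    foreach ($xpath->query('text()', $td) as $text) {
        $value .= $text->nodeValue;
    }
    $results[$key] = trim($value);
}
var_dump($results);
```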
