Wikipedia links regex in PHP
How can I extract only the words inside [[words]] into an array?
[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など
I tried \[\[.*]] but it didn't work. Maybe that is because .* only works for English strings?
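// The /u modifier makes PCRE treat the pattern and $str as UTF-8.
preg_match_all('/\[\[(.+?)\]\]/u', $str, $matches);
// $matches[0] holds the full [[...]] matches, $matches[1] the text inside the brackets.
var_dump($matches);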
You can encode the Unicode first:
[旭川市旭山動物園|旭山動物園]]などl]
You need to backslash-escape both sides; all the square brackets need to be escaped.
This worked in Python; it may need modification for PHP:
>>> re.compile('\[\[(.*?)\]\]')
<_sre.SRE_Pattern object at 0xb747ebf0>
>>> r=_
>>> r.search(text)
<_sre.SRE_Match object at 0xb7469560>
>>> r.findall(text)
['\xe6\x97\xad\xe5\xb7\x9d\xe5\xb8\x82|\xe6\x97\xad\xe5\xb7\x9d', '\xe3\x82\xa2\xe3\x82\xa4\xe3\x83\x8c', '\xe6\x97\xad\xe5\xb7\x9d\xe5\xb8\x82\xe6\x97\xad\xe5\xb1\xb1\xe5\x8b\x95\xe7\x89\xa9\xe5\x9c\x92|\xe6\x97\xad\xe5\xb1\xb1\xe5\x8b\x95\xe7\x89\xa9\xe5\x9c\x92']
Hmm, maybe I was wrong about having to escape the right square brackets. It turned out not to be necessary in Python.
One problem is that you're using the greedy wildcard: \[\[.*]] will match from the first [[ to the last ]], including any intervening ]].
Most regex engines now also include a nongreedy wildcard, typically *?, so \[\[.*?]] would just match one wikilink at a time.
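To make the greedy vs. nongreedy difference concrete, here is a minimal sketch in Python 3 run against the sample string from the question. The variable names are just for illustration, and the last step (dropping the display label after the "|") is an extra assumption about what the asker wants, not something the question states.

```python
import re

# Sample text from the question.
text = "[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など"

# Greedy: .* runs on to the last ]] in the string, so there is only one match.
greedy = re.findall(r"\[\[(.*)\]\]", text)

# Nongreedy: .*? stops at the first ]] it can, giving one match per wikilink.
lazy = re.findall(r"\[\[(.*?)\]\]", text)

# Optional extra step (an assumption): keep only the link target before "|".
targets = [link.split("|", 1)[0] for link in lazy]

print(len(greedy))  # 1
print(lazy)         # ['旭川市|旭川', 'アイヌ', '旭川市旭山動物園|旭山動物園']
print(targets)      # ['旭川市', 'アイヌ', '旭川市旭山動物園']
```

The same nongreedy pattern should carry over to PHP's preg_match_all, with the /u modifier added so the subject is handled as UTF-8.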