Regex in PHP: remove citations from wiki text
From the given sample text, I want to keep everything except the parts enclosed in [[ ]] and {{ }}.
Sample Text:
On 11 December 1988, aged just 15 years and 232 days, Tendulkar scored 100 not out in his debut [[first-class cricket|first-class]] match for [[Mumbai cricket team|Bombay]] against [[Gujarat cricket team|Gujarat]], making him the youngest Indian to score a century on first-class debut. He followed this by scoring a century in his first Deodhar and Duleep Trophy. {{cite web|url=http://www.espnstar.com/cricket/international-cricket/news/detail/item136972/Sachin-Tendulkar-factfile/|title=Sachin Tendulkar factfile |publisher=www.espnstar.com|accessdate=3 August 2009}} He was picked by the Mumbai captain [[Dilip Vengsarkar]] after seeing him negotiate [[Kapil Dev]] in the nets, and finished the season as Bombay's highest run-scorer.He scored 583 runs at an average of 67.77, and was the sixth highest run-scorer overall{{cite web|url=http://blogs.cricinfo.com/link_to_database/ARCHIVE/1980S/1988-89/IND_LOCAL/RANJI/STATS/IND_LOCAL_RJI_AVS_BAT_MOST_RUNS.html|title=1988–89 Ranji season – Most Runs|publisher=Cricinfo|accessdate=3 August 2009}} He also made an unbeaten century in the [[Irani Trophy]] final,{{cite web|url=http://cricketarchive.com/Archive/Scorecards/52/52008.html|title=Rest of India v Delhi in 1989/90 |publisher=Cricketarchive|accessdate=3 August 2009}} and was selected for the tour of Pakistan next year, after just one first class season.
I tried this:
$patterns = ("/^{{*/", "/*}}$/" );$replacements = "";
preg_replace($patterns, $replacements, $parts);
print_r($parts);
and this:
$parts = preg_replace("/\[(?:\\\\|\\\]|[^\]])*\]/", "", $ans_str);
and this too:
$pattern = ("/\[.*?\]/", "/\{.*?\}/");
$ans = preg_replace($pattern, "", $parts);
None of these attempts work. Please help, thanks.
This should do the trick
$str = "On 11 December 1988, ...";
$str = preg_replace('/\{\{.+\}\}/Us', '', $str);
var_dump($str);
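The U modifier switches the pattern to ungreedy mode, so each match stops at the first closing }} it can reach (otherwise all the citations, and the text between them, would be caught as one giant match).
EDIT: added the s modifier so that . also matches newlines inside a citation; see comments.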
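To make the difference concrete, here is a small sketch comparing the greedy and ungreedy versions of the same pattern. The input string is a made-up, shortened stand-in for the sample text, not the original data.

```php
<?php
// Hypothetical short input with two citation templates.
$text = "A{{cite one}} B{{cite two}} C";

// Greedy: .+ runs on to the LAST "}}", so the text between the templates is lost.
$greedy = preg_replace('/\{\{.+\}\}/s', '', $text);

// Ungreedy (U modifier): each match stops at the nearest "}}".
$ungreedy = preg_replace('/\{\{.+\}\}/Us', '', $text);

var_dump($greedy);   // string(3) "A C"
var_dump($ungreedy); // string(5) "A B C"
```

Writing `.+?` without the U modifier should behave the same way as `.+` with it.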
// remove `{{cite}}` tags
$str = preg_replace('/\s*\{\{[^}{]*+\}\}\s*/', ' ', $str);
// remove links--including rollover text--leaving link text
$str = preg_replace('/\[\[(?:[^][|]*+\|)?+([^][]*+)\]\]/', '$1', $str);
see demo on ideone.com
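As a rough sketch, the two replacements above could be wrapped in one helper and tried on a shortened input. The function name `stripWikiMarkup`, the abbreviated sample, and the example.com URL are assumptions made for illustration; only the two patterns come from the answer.

```php
<?php
// Hypothetical helper; the two patterns are the ones shown in the answer above.
function stripWikiMarkup(string $text): string
{
    // Drop non-nested {{...}} templates together with the whitespace around them.
    $text = preg_replace('/\s*\{\{[^}{]*+\}\}\s*/', ' ', $text);
    // Replace [[target|label]] or [[label]] with just the label.
    $text = preg_replace('/\[\[(?:[^][|]*+\|)?+([^][]*+)\]\]/', '$1', $text);
    return trim($text);
}

// Shortened, made-up sample in the same shape as the question's text.
$sample = 'debut [[first-class cricket|first-class]] match for [[Mumbai cricket team|Bombay]]. '
        . '{{cite web|url=http://example.com/|title=Example}} Picked by [[Dilip Vengsarkar]].';

echo stripWikiMarkup($sample), "\n";
// debut first-class match for Bombay. Picked by Dilip Vengsarkar.
```

Because the character classes exclude braces and brackets, this sketch does not handle nested templates or nested links; those would need a recursive pattern or a real wikitext parser.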
The following two lines did the trick:
$str = preg_replace('/\s*\{\{.*?\}\}\s*/', ' ', $str); // remove the curly braces and the text between them
$str = preg_replace('/[\[\]]/', '', $str); // remove only the square brackets; the text inside them (including any |) stays
Sorry it went wrong.