
preg_replace pattern for php

I'm having some problems with my patterns. I hope somebody can help me with this.

Given a string:

$string = Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism.<ref>Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6<br />''Blackwell Encyclopaedia of Political Thought'', Blackwell Publishing 1991 ISBN 0-631-17944-5, p. 11.</ref> Proudhon first characterised his goal as a "third form of society, the synthesis of communism and property."<ref>Pierre-Joseph Proudhon. ''What Is Property?'' Princeton, MA: Benjamin R. Tucker, 1876. p. 281.</ref> Another is <ref name=rupert/>

I want to remove the ref elements together with the text inside them (<ref name='something'>...</ref> or <ref>...</ref>), and also remove self-closing ref tags such as <ref name='sss' />.

After replacing, the final output should be:

Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism. Proudhon first characterised his goal as a "third form of society, the synthesis of communism and property." Another is

My code doesn't seem to work:

$pattern1[] = "/&lt;ref[^\/]*\/&gt;/is"; //remove <ref name=something/>  
$pattern1[] = "/&lt;ref[^\/]*&gt;(.*?)&lt;\/ref&gt;/s";  //remove ref <ref>some text here</ref>
$string = preg_replace($pattern1, "\n", $string);

Instead, it outputs:

Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism. ''Blackwell Encyclopaedia of Political Thought'', Blackwell Publishing 1991 ISBN 0-631-17944-5, p. 11.</ref> Proudhon first characterised his goal as a "third form of society, the synthesis of communism and property." Another is

I guess it is getting caught on the &lt;br /&gt;.


Not the most efficient, but very simple:

$text=strip_tags(str_replace(array('&lt;','&gt;'),array('<','>'),$text));

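See strip_tags in the PHP manual. Note that it removes only the tags themselves; the text between <ref> and </ref> is kept.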
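To see exactly what this one-liner leaves behind, here is a short check on a hypothetical, shortened sample in the question's HTML-encoded form. The sample string is an assumption for illustration; the calls are the ones used in the answer.

```php
<?php
// Hypothetical shortened sample, HTML-encoded like the question's string.
$text = 'anarchism.&lt;ref&gt;Avrich, Paul.&lt;/ref&gt; Another is &lt;ref name=rupert/&gt;';

// Decode &lt; and &gt; to real angle brackets, then strip the tags.
$result = strip_tags(str_replace(array('&lt;', '&gt;'), array('<', '>'), $text));

// The tags are gone, but the citation text that sat inside <ref>...</ref> remains.
echo $result, "\n";
```

So this approach suits the case where only the tags should go, not the citation text they wrap.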


The problem is that your first pattern also matches:

<ref>Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6<br />

because [^\/]* matches everything up to the / of the <br />, namely:

>Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6<br
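A minimal reproduction of this over-match, using a shortened, hypothetical version of the encoded string: preg_match with the asker's first pattern returns everything from &lt;ref up to the /&gt; of the &lt;br /&gt;, because nothing stops [^\/]* from consuming the &gt; that closes the opening tag.

```php
<?php
// Hypothetical shortened sample in the question's encoded form.
$str = 'anarchism.&lt;ref&gt;Avrich, p.6&lt;br /&gt;Blackwell, p. 11.&lt;/ref&gt; Proudhon';

// The asker's first pattern: [^\/]* runs on until the first "/",
// which here is the one inside "&lt;br /&gt;".
preg_match('/&lt;ref[^\/]*\/&gt;/is', $str, $m);

echo $m[0], "\n"; // &lt;ref&gt;Avrich, p.6&lt;br /&gt;
```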

The solution is to use /&lt;ref(?:[^\/&]|&(?!gt;))*\/&gt;/is to match the self-closing tags.

In this case we use (?:[^\/&]|&(?!gt;))* instead of [^\/]*.

The group (?:[^\/&]|&(?!gt;))* repeats two alternatives. The first, [^\/&], matches any character except / and &. The second, &(?!gt;), matches an & only if it is not followed by gt;, i.e. only if it is not the start of a &gt; entity. (?!gt;) is a negative lookahead assertion (see http://www.php.net/manual/en/regexp.reference.assertions.php): without consuming anything, it ensures that the next three characters are not gt;.

The second pattern is unchanged: its [^\/]* simply matches any character that is not a /.
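Applied to a shortened, hypothetical sample in the same encoded form, the corrected first pattern now skips the &lt;ref&gt;...&lt;/ref&gt; element entirely and matches only the self-closing tag. This is a sketch for checking the pattern, not a change to it.

```php
<?php
// Hypothetical sample: one full ref element containing a <br />, then a self-closing ref.
$str = 'x&lt;ref&gt;Avrich, p.6&lt;br /&gt;Blackwell&lt;/ref&gt; y &lt;ref name=rupert/&gt;';

// "&" is allowed only when it does not begin "&gt;", so a match
// can no longer run past the end of an opening tag.
preg_match_all('/&lt;ref(?:[^\/&]|&(?!gt;))*\/&gt;/is', $str, $m);

print_r($m[0]); // only the self-closing tag
```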

So the following code:

$str = "Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism.&lt;ref&gt;Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6&lt;br /&gt;''Blackwell Encyclopaedia of Political Thought'', Blackwell Publishing 1991 ISBN 0-631-17944-5, p. 11.&lt;/ref&gt; Proudhon first characterised his goal as a &quot;third form of society, the synthesis of communism and property.&quot;&lt;ref&gt;Pierre-Joseph Proudhon. ''What Is Property?'' Princeton, MA: Benjamin R. Tucker, 1876. p. 281.&lt;/ref&gt; Another is &lt;ref name=rupert/&gt;";
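// 1st pattern: self-closing tags such as &lt;ref name=rupert/&gt;
// 2nd pattern: full elements, &lt;ref ...&gt;...&lt;/ref&gt;
$match = array(
    "/&lt;ref(?:[^\/&]|&(?!gt;))*\/&gt;/is",
    "/&lt;ref[^\/]*&gt;(.*?)&lt;\/ref&gt;/s",
);
$str = preg_replace($match, '', $str);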
echo $str;

outputs

Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism. Proudhon first characterised his goal as a "third form of society, the synthesis of communism and property." Another is


It's not recommended to parse HTML with regex, but for this simple case, on a string that contains literal <ref> tags rather than &lt;ref&gt; entities, you could do:

<?php
// [^>]* keeps each tag match inside a single tag; /s lets .*? cross newlines.
$string = preg_replace('/<ref[^>]*\/>|<ref[^>]*>.*?<\/ref>/s', '', $string);
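To see why the characters inside a tag should be restricted, here is a small comparison on a shortened, hypothetical sample that is assumed to contain literal tags (for example after decoding the entities). A pattern that uses .*? inside the tag can run on to the /> of the <br />, which is the same failure as in the question; a variant that uses [^>]* cannot leave the tag it started in.

```php
<?php
// Hypothetical sample with literal tags.
$string = 'anarchism.<ref>Avrich, p.6<br />Blackwell, p. 11.</ref> Another is <ref name=rupert/>';

// .*? inside the tag finds the "/>" of the <br /> first, leaving a stray tail behind.
$loose = preg_replace('/<ref.*?\/>|<ref>.*?<\/ref>/', '', $string);

// [^>]* cannot cross a ">", so each alternative stays within one tag.
$tight = preg_replace('/<ref[^>]*\/>|<ref[^>]*>.*?<\/ref>/s', '', $string);

echo $loose, "\n"; // anarchism.Blackwell, p. 11.</ref> Another is
echo $tight, "\n"; // anarchism. Another is
```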


I've enclosed your original string in double quotes:

$string = "Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism.&lt;ref&gt;Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6&lt;br /&gt;''Blackwell Encyclopaedia of Political Thought'', Blackwell Publishing 1991 ISBN 0-631-17944-5, p. 11.&lt;/ref&gt; Proudhon first characterised his goal as a &quot;third form of society, the synthesis of communism and property.&quot;&lt;ref&gt;Pierre-Joseph Proudhon. ''What Is Property?'' Princeton, MA: Benjamin R. Tucker, 1876. p. 281.&lt;/ref&gt; Another is &lt;ref name=rupert/&gt;";

$pattern = '#&lt;ref.*?&gt;(.*?&lt;/ref&gt;)?#is';

print htmlspecialchars_decode(preg_replace($pattern, '', $string));

htmlspecialchars_decode is needed to convert &quot; back to double quotes; omit it if you are outputting to a device that does this for you, such as a browser.

Output:

Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism. Proudhon first characterised his goal as a "third form of society, the synthesis of communism and property." Another is
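A small illustration of the decoding step on a hypothetical fragment: htmlspecialchars_decode turns &quot; back into a double quote and &lt; / &gt; back into angle brackets.

```php
<?php
// Hypothetical fragment still containing HTML entities after preg_replace.
$fragment = 'a &quot;third form of society&quot; &lt;b&gt;';

// Converts only the special-character entities (&quot;, &lt;, &gt;, &amp;, ...).
$decoded = htmlspecialchars_decode($fragment);

echo $decoded, "\n"; // a "third form of society" <b>
```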

Notes:

I've swapped the usual / delimiter for #, which means that / can be used inside the pattern without escaping it.

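.* is greedy by default. Putting ? after the quantifier (.*?) makes that quantifier ungreedy (lazy). The U pattern modifier has a related but broader effect: it inverts the greediness of every quantifier in the pattern, so under U a plain .* is lazy and .*? becomes greedy.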

&lt;ref.*?&gt; matches &lt;ref followed by anything until the next &gt; is found.

.*? matches anything until the next &lt;/ref&gt;

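Wrapping .*?&lt;/ref&gt; in ()? makes that part optional (zero or one occurrence). This caters both for an opening tag with a matching closing tag and for a tag with no closing tag, such as a self-closing one. Be aware that if a self-closing tag appears before a complete element later in the string, the optional group runs on to that element's &lt;/ref&gt; and removes the text in between.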
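The notes above can be checked with a few short, hypothetical strings. This sketch first compares greedy .*, lazy .*? and the U modifier, then shows an input where the optional (.*?&lt;/ref&gt;)? group removes more than intended: a self-closing tag that comes before a complete &lt;ref&gt;...&lt;/ref&gt; element. The final pattern is only one possible way around that (an assumption, not part of the answer): it tries the self-closing form first and keeps that match inside a single tag.

```php
<?php
$s = 'a&lt;ref&gt;x&lt;/ref&gt;b&lt;ref&gt;y&lt;/ref&gt;c';

// Greedy .* runs to the LAST closing tag and swallows the "b" in between.
$greedy = preg_replace('#&lt;ref&gt;.*&lt;/ref&gt;#', '', $s);     // "ac"

// Lazy .*? stops at the FIRST closing tag, so each element is removed separately.
$lazy = preg_replace('#&lt;ref&gt;.*?&lt;/ref&gt;#', '', $s);      // "abc"

// The U modifier inverts greediness, so plain .* behaves lazily here.
$ungreedy = preg_replace('#&lt;ref&gt;.*&lt;/ref&gt;#U', '', $s);  // "abc"

// Edge case: the optional group runs from the self-closing tag to the later &lt;/ref&gt;.
$t = 'one&lt;ref name=a/&gt; two&lt;ref&gt;cite&lt;/ref&gt; three';
$optional = preg_replace('#&lt;ref.*?&gt;(.*?&lt;/ref&gt;)?#is', '', $t); // "one three"

// Hypothetical variant: match the self-closing form first, never crossing a &gt;.
$variant = preg_replace(
    '#&lt;ref(?:(?!&gt;).)*?/&gt;|&lt;ref.*?&gt;.*?&lt;/ref&gt;#is',
    '',
    $t
); // "one two three"
```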

If you also want to match an opening tag that has content after it but no closing tag, you can change the pattern to this:

$pattern = '#&lt;ref.*?&gt;(.*?&lt;/ref&gt;|.*)#is';
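A quick check of this variant on a hypothetical input whose last ref is never closed. The first alternative still stops at the nearest &lt;/ref&gt;; only when there is none does .* (with the s modifier, across newlines too) remove everything to the end of the string.

```php
<?php
// Hypothetical input: one closed ref element, then one that is never closed.
$s = 'Proudhon.&lt;ref&gt;Tucker, 1876.&lt;/ref&gt; Another is &lt;ref&gt;unfinished note';

$pattern = '#&lt;ref.*?&gt;(.*?&lt;/ref&gt;|.*)#is';
$out = preg_replace($pattern, '', $s);

echo $out, "\n"; // Proudhon. Another is
```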