A preg_replace puzzle: replacing zero or more of a char at the end of the subject

Posted: 2023-01-10 05:40

Say $d is a directory path and I want to ensure that it starts and ends with exactly one slash (/). It may initially have zero, one or more leading and/or trailing slashes.

I tried:

preg_replace('%^/*|/*$%', '/', $d);

which works for the leading slash but, to my surprise, yields two trailing slashes if $d has at least one trailing slash. If the subject is, e.g., 'foo///', then preg_replace() first matches the three trailing slashes and replaces them with one slash, and then it matches zero slashes at the end and replaces that empty match with another slash. (You can verify this by replacing the second argument with '[$0]'.) I find this rather counterintuitive.
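
The '[$0]' check can be run as a short script. This is only a sketch: the helper name show_matches is made up for illustration, and the pattern is the one from the question written with its closing % delimiter. Every match is wrapped in brackets, so an empty match shows up as "[]".

```php
<?php
// Hypothetical helper: wrap every match of the question's pattern in
// brackets so that empty matches become visible as "[]".
function show_matches(string $d): string
{
    return preg_replace('%^/*|/*$%', '[$0]', $d);
}

echo show_matches('foo///'), "\n"; // []foo[///][]
echo show_matches('foo'), "\n";    // []foo[]
```

The second bracket pair after "[///]" is the extra empty match that produces the doubled trailing slash.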

While there are many other ways to solve the underlying problem (and I implemented one), this became a PCRE puzzle for me: what (scalar) pattern in a single preg_replace does this job?

ADDITIONAL QUESTION (edit)

Can anyone explain why this pattern matches the way it does at the end of the string but does not behave similarly at the start?

$path = '/' . trim($path, '/') . '/';

This first removes all slashes at beginning or end and then adds single ones again.
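
As a small sketch of the same idea, the one-liner can be wrapped in a function. The name wrap_in_slashes is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical wrapper around the trim() one-liner above.
function wrap_in_slashes(string $path): string
{
    // trim() strips every leading and trailing '/', however many there are,
    // and then exactly one slash is added back on each side.
    return '/' . trim($path, '/') . '/';
}

echo wrap_in_slashes('foo'), "\n";          // /foo/
echo wrap_in_slashes('///foo/bar//'), "\n"; // /foo/bar/
```

An input that is empty or consists only of slashes comes out as '//', because nothing is left between the two added slashes.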

Given a regex like /* that can legitimately match zero characters, the regex engine has to make sure that it never matches more than once in the same spot, or it would get stuck in an infinite loop. Thus, if it does consume zero characters, the engine jumps forward one position before attempting another match. As far as I know, that's the only situation in which the regex engine does anything on its own initiative.

What you're seeing is the opposite situation: the regex consumes one or more characters, then on the next go-round it tries to start matching at the spot where it left off. Never mind that this particular regex can't match anything but the one character, and it already matched as many of those as it could; it still has the option of matching nothing, so that's what it does.

So, why doesn't your regex match twice at the beginning, like it does at the end? Because of the start anchor (^). If the subject starts with one or more slashes, it consumes them and then tries to match zero slashes, but it fails because it's not at the beginning of the string any more. And if there are no slashes at the beginning, the manual bump-along has the same effect.

At the end of the subject it's a different story. If there are no slashes there, it matches nothing, tries to bump along and fails; end of story. But if it does match one or more slashes, it consumes them and tries to match again--and succeeds because the $ anchor still matches.
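
The asymmetry between the two anchors can be made visible by listing the match offsets. The sketch below assumes that preg_match_all() walks the subject the same way preg_replace() does; the helper name match_offsets is invented for this example.

```php
<?php
// Hypothetical helper: list each match of the question's pattern
// as "offset:text".
function match_offsets(string $d): array
{
    preg_match_all('%^/*|/*$%', $d, $m, PREG_OFFSET_CAPTURE);
    $out = [];
    foreach ($m[0] as [$text, $offset]) {
        $out[] = $offset . ':' . $text;
    }
    return $out;
}

print_r(match_offsets('///foo///'));
// Three matches: '///' at offset 0, '///' at offset 6,
// and an empty match at offset 9.
```

There is no empty match at offset 3, because ^ cannot match there, while $ still matches at offset 9 after the trailing slashes have been consumed.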

So in general, if you want to prevent this kind of double match, you can either add a condition to the beginning of the match to prevent it, like the ^ anchor does for the first alternative:

preg_replace('%^/*|(?<!/)/*$%', '/', $d);

...or make sure that part of the regex has to consume at least one character:

preg_replace('%^/*|([^/])/*$%', '$1/', $d);
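
A quick way to check both variants is to run them over a few inputs. This is a sketch only; the array keys 'lookbehind' and 'consuming' are labels chosen for this example.

```php
<?php
// The two patterns above, each paired with its replacement string.
$variants = [
    'lookbehind' => ['%^/*|(?<!/)/*$%', '/'],
    'consuming'  => ['%^/*|([^/])/*$%', '$1/'],
];

foreach ($variants as $name => [$pattern, $replacement]) {
    foreach (['foo', '/foo', 'foo///', '//foo//'] as $d) {
        // Every line printed here should end in "=> /foo/".
        echo $name, ': ', $d, ' => ', preg_replace($pattern, $replacement, $d), "\n";
    }
}
```

In the second variant $1 is empty whenever the first alternative matches, so the same '$1/' replacement works for both ends.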

But in this case you have a much simpler option, as demonstrated by John Kugelman: just capture the part you want to keep and chuck the rest.

preg_replace('%^/*(.*?)/*$%', '/\1/', $d);
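
Wrapped in a function, the capture approach might look like the sketch below. The name wrap_by_capture is hypothetical.

```php
<?php
// Hypothetical wrapper around the capture-based pattern above.
function wrap_by_capture(string $d): string
{
    // Group 1 lazily takes everything between the leading and trailing
    // runs of slashes; the replacement re-adds exactly one on each side.
    return preg_replace('%^/*(.*?)/*$%', '/\1/', $d);
}

echo wrap_by_capture('//foo/bar///'), "\n"; // /foo/bar/
echo wrap_by_capture('foo'), "\n";          // /foo/
```

Here a single match consumes the whole subject, and the ^ anchor rules out a second, empty match at the end.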

It can be done in a single preg_replace:

preg_replace('/^\/{2,}|\/{2,}$|^([^\/])|([^\/])$/', '\2/\1', $d);
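
The four-alternative pattern can be tried on a few inputs as follows. This is a sketch, and the function name wrap_four_way is made up for illustration.

```php
<?php
// Hypothetical wrapper around the four-alternative pattern above.
function wrap_four_way(string $d): string
{
    // Group 1 is a non-slash first character, group 2 a non-slash last
    // character; whichever group did not take part is replaced by ''.
    return preg_replace('/^\/{2,}|\/{2,}$|^([^\/])|([^\/])$/', '\2/\1', $d);
}

echo wrap_four_way('foo'), "\n";      // /foo/
echo wrap_four_way('//foo///'), "\n"; // /foo/
echo wrap_four_way('/foo/'), "\n";    // /foo/
echo wrap_four_way('a'), "\n";        // /a
```

The last line shows an edge case: with a one-character subject, the only character is consumed by the third alternative, so nothing is left for the fourth one to match and no trailing slash is added.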

A small change to your pattern would be to separate out the two key concerns at the end of the string:

Replace multiple slashes with one slash
Replace no slashes with one slash

A pattern for that (and the existing part for matching at the start of the string) would look like:

#^/*|/+$|$(?<!/)#

A slightly less concise, but more precise, option would be to be very explicit about only matching zero or two-or-more slashes; the notion being, why replace one slash with one slash?

#^(?!/)|^/{2,}|/{2,}$|$(?<!/)#
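
To compare the two patterns, the sketch below runs both through preg_replace() with '/' as the replacement and also reports how many replacements were made. The helper name apply_pattern and the labels 'concise' and 'explicit' are assumptions of this example.

```php
<?php
// Labels chosen for this example; the patterns are the two shown above.
$patterns = [
    'concise'  => '#^/*|/+$|$(?<!/)#',
    'explicit' => '#^(?!/)|^/{2,}|/{2,}$|$(?<!/)#',
];

// Hypothetical helper: return the rewritten path and the number of
// replacements preg_replace() performed.
function apply_pattern(string $pattern, string $d): array
{
    $result = preg_replace($pattern, '/', $d, -1, $count);
    return [$result, $count];
}

foreach ($patterns as $name => $pattern) {
    foreach (['foo', 'foo///', '//foo', '/foo/'] as $d) {
        [$result, $count] = apply_pattern($pattern, $d);
        echo $name, ': ', $d, ' => ', $result, ' (', $count, " replacements)\n";
    }
}
```

Both patterns turn every input into '/foo/'. For the already normalized '/foo/', the concise pattern still performs two replacements (one slash for one slash at each end), while the explicit pattern performs none.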

Aside: nikic's suggestion to use trim (to remove leading/trailing slashes, then add your own) is a good one.
