Regular Expression: How to replace a string that does NOT start with something?

I need to replace a root relative URL with a different root relative URL:

/Images/filename.jpg

should be replaced with:

/new/images-dir/filename.jpg

I started by using PHP's str_replace function:

$newText = str_replace('/Images/', '/new/images-dir/', $text);

...but then I realized that it was replacing my absolute URLs that I don't want replaced:

http://sub.domain.com/something/Images/filename.jpg
#...is being replaced with...
http://sub.domain.com/something/new/images-dir/filename.jpg

So then I switched to using PHP's preg_replace function so I can use a regular expression to selectively replace only the root relative URLs and not the absolute URLs. However, I can't seem to figure out the syntax to do this:

$text = 'There is a root relative URL here: <img src="/Images/filename.jpg">'
      . 'and an absolute here: <img src="http://sub.domain.com/something/Images/filename.jpg">'
      . 'and one not in quotes: /Images/filename.jpg';
$newText = preg_replace('#/Images/#', '/new/images-dir/', $text);

How can I write my regular expression so that it ignores any absolute URLs and only replaces the root relative URLs?


After taking three edits to come up with a correct regex, I concluded that my first answer was best. PHP's string functions are better suited than regular expressions for this task:

Using str_replace():

function match($value)
{
   // Note: "match" is a reserved keyword as of PHP 8.0,
   // so the function needs a different name there.
   // The second condition is probably unnecessary,
   // unless your path argument is incorrectly formatted
   if( (strpos($value, "/") !== 0) || (stristr($value, "http:") !== FALSE) )
   {
      return $value;
   }
   return str_replace("/Images/", "/new/images-dir/", $value);
}

The advantage of str_replace() is readability.

A reader who doesn't understand regular expressions can still clearly see the criteria for matching: the input string must begin with '/' and must not contain "http:".

Furthermore, both the search key and the replacement string are clearly represented in plain text.

Using preg_replace():

function match($value)
{
   $pattern = "/^(\/((.+?)\/)*?)Images\//";

   // Assuming value is a root-relative path, everything
   // before "Images/" should be captured into back-reference 1;
   // The replacement string re-inserts it before "new/images-dir/"
   return preg_replace($pattern, "\\1new/images-dir/", $value);
}

The regular expression tries to match the following:

  1. Match the beginning of string with ^,
  2. followed by a forward slash to indicate root-relative URL,
  3. followed by zero-or-more lazily quantified repetitions of the group ((.+?)/). This group consists of one-or-more lazily quantified characters, and another forward-slash.
  4. Match subsequent string "Images" and final forward-slash.
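The four steps above can be written out with PCRE's x modifier, which allows whitespace and comments inside the pattern. This is only a sketch: the ~ delimiter, the non-capturing inner group, and the variable names are my own choices, not part of the original pattern.

```php
<?php
// Annotated variant of the pattern. The x modifier makes PCRE ignore
// whitespace and treat "#" as the start of a comment.
$pattern = '~
    ^               # 1. beginning of the string
    (               #    start of back-reference 1
      /             # 2. leading slash of a root-relative URL
      (?:.+?/)*?    # 3. zero or more lazily matched "segment/" groups
    )               #    end of back-reference 1
    Images/         # 4. literal "Images" and the trailing slash
~x';

// Re-insert back-reference 1 in front of the new directory name.
$result = preg_replace($pattern, '$1new/images-dir/', '/test/more/Images/file');
echo $result, "\n";
```

Only the outer group is captured here, so back-reference 1 still holds everything before "Images/".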

Both match() functions operate the same when tested as follows:

match("http://test/more/Images/file"); // Returns original argument
match("/test/more/Images/file");       // Returns with match replaced
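To run that comparison in one script, the two functions need distinct names. The sketch below uses the hypothetical names replaceWithStrFunctions() and replaceWithRegex(), and adds one extra sample path of my own.

```php
<?php
// Hypothetical names: both originals are called match(), which cannot be
// declared twice and is a reserved keyword as of PHP 8.0.
function replaceWithStrFunctions($value)
{
    // Leave anything that is not root-relative, or that contains "http:", untouched.
    if (strpos($value, '/') !== 0 || stripos($value, 'http:') !== false) {
        return $value;
    }
    return str_replace('/Images/', '/new/images-dir/', $value);
}

function replaceWithRegex($value)
{
    // Same pattern as the preg_replace() version above.
    return preg_replace('/^(\/((.+?)\/)*?)Images\//', '\\1new/images-dir/', $value);
}

$samples = array(
    'http://test/more/Images/file',
    '/test/more/Images/file',
    '/Images/file', // extra sample, not in the original test
);

foreach ($samples as $sample) {
    $a = replaceWithStrFunctions($sample);
    $b = replaceWithRegex($sample);
    echo $sample, ' => ', $a, ($a === $b ? ' (same)' : ' (regex gives: ' . $b . ')'), "\n";
}
```

The two versions can diverge when "/Images/" occurs more than once in a path: str_replace() replaces every occurrence, while the anchored pattern replaces only the first.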


According to the PHP documentation on Lookbehind assertions:

Lookbehind assertions start with (?<= for positive assertions and (?<! for negative assertions.

Using this syntax, I was able to get this to work:

$text = preg_replace('#(?<!http://sub\.domain\.com/something)/Images/#', '/new/images-dir/', $text);
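That lookbehind is tied to one host and path. A more general sketch is possible if you assume that, in an absolute URL, the character right before "/Images/" is always a word character, a dot, or a hyphen. That is an assumption about your data, not a rule about URLs. The sample text is the one from the question, with spaces added between the joined pieces.

```php
<?php
$text = 'There is a root relative URL here: <img src="/Images/filename.jpg">'
      . ' and an absolute here: <img src="http://sub.domain.com/something/Images/filename.jpg">'
      . ' and one not in quotes: /Images/filename.jpg';

// Replace "/Images/" only when the preceding character could not belong to a
// host name or path segment (assumed here to be word characters, "." and "-").
$pattern = '#(?<![\w.-])/Images/#';
$newText = preg_replace($pattern, '/new/images-dir/', $text);

echo $newText, "\n";
```

With this input, the quoted and unquoted root-relative URLs are rewritten and the absolute URL is left alone. A path segment ending in some other character, such as ")" or "~", would slip through, so widen the character class if your URLs need it.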


Root-relative links generally are within quotes, as you've shown. So match on the quote and put it back in the replacement.

$text = 'There is a root relative image here: <img src="/Images/filename.jpg">';
$newText = preg_replace('#"/Images/#', '"/new/images-dir/', $text);

Update

If you have two different cases, try two different and specific replaces rather than trying to engineer one perfect one. Let us know what the other case(s) are.

If you need to match more than that, then you are looking for a "negative lookbehind assertion", which makes sure that the "http://blah" part does not come before the match. The problem with lookbehind is that it needs a fixed-length pattern; something open-ended like .* is not allowed inside it. http://www.php.net/manual/en/regexp.reference.assertions.php

Something like this might work, if your absolute links mostly point to .net, .com, .org or .cc domains and the Images part is at the root. Note that the lookbehind has to be negative, (?<! rather than (?<=, and the dots have to be escaped:

$text = 'There is a root relative image here: <img src="/Images/filename.jpg">';
$newText = preg_replace('#(?<!\.net|\.com|\.org|\.cc)/Images/#', '/new/images-dir/', $text);