
Testing for "EndsWith" efficiently with a Regex

I need to build a Regex (.NET syntax) to determine if a string ends with a specific value. Specifically, I need to test whether a file has a specific extension (or one of a set of extensions).

The code I'm trying to fix was using:

.*\.(png|jpg|gif)$

which is hideously slow for failed matches in my scenario (presumably due to the backtracking).

Simply removing the .* at the beginning (which is fine, since the API only tests for matches and doesn't extract anything) makes the regex much more efficient.

It still feels pretty inefficient, though. Am I missing something obvious here?

Unfortunately, I don't control the API in question so I need a regex to do this even though I wouldn't normally consider regex to be the right tool for the job.

I also did some tests using RegexOptions.RightToLeft and found that I could squeeze a little more performance out of my test case with ^.*\.(png|jpg|gif)$. However, I can't find a way to specify the RightToLeft option within the regex string itself, so I don't think I can use it.
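For reference, here is a minimal sketch of the kind of RightToLeft test described above. It only applies when you construct the Regex object yourself, which is exactly what the API in the question does not allow. The class and method names (RightToLeftExample, HasImageExtension) are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class RightToLeftExample
{
    // RightToLeft is passed through the constructor here; the question
    // could not find a way to request it from inside the pattern string.
    private static readonly Regex ImageExtension =
        new Regex(@"^.*\.(png|jpg|gif)$", RegexOptions.RightToLeft);

    // Hypothetical helper: true when fileName ends in .png, .jpg or .gif.
    public static bool HasImageExtension(string fileName)
    {
        return ImageExtension.IsMatch(fileName);
    }
}
```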


I don't have access to C# so I can't try this... but you should be able to avoid too much backtracking by forcing the engine to find the end of the string first, then matching the extensions:

$(?<=\.(gif|png|jpg))

I'm not sure of the effect the look-behind has on performance, though.
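Since the answer above was not tried in C#, here is a small sketch of how the look-behind pattern could be exercised. The class and method names are hypothetical, and it says nothing about performance, which would need to be measured on real input.

```csharp
using System.Text.RegularExpressions;

public static class LookBehindExample
{
    // The pattern from the answer: anchor at the end of the input first,
    // then look backwards for one of the extensions.
    private const string Pattern = @"$(?<=\.(gif|png|jpg))";

    // Hypothetical helper wrapping the static Regex.IsMatch call.
    public static bool HasImageExtension(string fileName)
    {
        return Regex.IsMatch(fileName, Pattern);
    }
}
```

Because only the pattern string changes, this form can be handed to an API that accepts nothing but a regex.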


Really, you could also just drop Regex altogether and use String.EndsWith, with the following:

var extensions = new String[] { ".png", ".jpg", ".gif" };
extensions.Any(ext => "something".EndsWith(ext));

In my experience, simple string functions usually end up faster for cases like this than a cleverly tuned regex, in runtime and/or development time, unless you are comfortable with Regex and know what makes one efficient.


Make it look specifically for a period instead of any character preceding the extension:

\.(png|jpg|gif)$

This makes it safer (it won't match x.xgif), and it won't have to do any backtracking at all until it finds a period (as opposed to backtracking on every character).
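A quick way to check the behavior described above is a sketch like the following. The names are hypothetical, and the sample file names are only there to show which inputs match.

```csharp
using System.Text.RegularExpressions;

public static class SuffixPatternExample
{
    // No leading .* : the pattern requires a literal period directly
    // before the extension, followed by the end of the input.
    private const string Pattern = @"\.(png|jpg|gif)$";

    // Hypothetical helper wrapping the static Regex.IsMatch call.
    public static bool HasImageExtension(string fileName)
    {
        return Regex.IsMatch(fileName, Pattern);
    }
}
```

With this pattern, x.gif matches, while x.xgif and archive.png.bak do not.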


If you can change the code, why can't you use something else? You don't control the API, right, but you are changing it anyway. This I really don't understand.

Anyway, why not simply:

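// Accepted extensions, lower-case and without the leading period.
var AcceptedExtensions = new List<string>() { "txt", "html", "htm" };
// Take everything after the last period. If there is no period, LastIndexOf
// returns -1, so the whole file name is compared against the list.
var extension = filename.Substring(filename.LastIndexOf(".") + 1).ToLowerInvariant();
return AcceptedExtensions.Contains(extension);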

The IEnumerable AcceptedExtensions would be loaded from some config, the same way you load your jpg|gif|.... Or it would be a constant, whatever. You just don't need to recreate it each time you are going to use it (I doubt that this would be a bottleneck though).


You probably don't need a regular expression for this... but going with the original question:

Make sure you're using RegexOptions.Compiled to pre-compile the regular expression, and then reuse your Regex object. This avoids setting up the Regex every time you use it, which can speed things up a lot.
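One common way to do that is to keep the compiled Regex in a static readonly field, roughly as sketched below. This assumes you are the one constructing the Regex, and the class and method names are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class CompiledRegexExample
{
    // Built once when the class is first used, then shared by every call,
    // so the compilation cost is not paid again for each file name.
    private static readonly Regex ImageExtension =
        new Regex(@"\.(png|jpg|gif)$", RegexOptions.Compiled);

    // Hypothetical helper: reuses the cached Regex instance.
    public static bool HasImageExtension(string fileName)
    {
        return ImageExtension.IsMatch(fileName);
    }
}
```

Whether compilation pays off depends on how many times the pattern is evaluated, so it is worth timing both variants on realistic input.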
