Why is compiled Regex performance slower than interpreted Regex?
I ran into this article: Performance: Compiled vs. Interpreted Regular Expressions. I modified the sample code to compile 1000 Regexes and then run each 500 times to take advantage of precompilation; however, even in that case the interpreted Regexes run 4 times faster! This means the RegexOptions.Compiled option is completely useless; actually, it's even worse, since it's slower!
Update: The big difference was due to JIT. After accounting for JIT, the compiled regex in the following code still performs a little slower, which doesn't make sense to me, but @Jim in the answers provided a much cleaner version that works as expected.
Can anyone explain why this is the case?
Code, taken & modified from the blog post:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace RegExTester
{
    class Program
    {
        static void Main(string[] args)
        {
            // First run: includes JIT compilation of the methods involved.
            DateTime startTime = DateTime.Now;
            for (int i = 0; i < 1000; i++)
            {
                CheckForMatches("some random text with email address, address@domain200.com" + i.ToString());
            }
            double msTaken = DateTime.Now.Subtract(startTime).TotalMilliseconds;
            Console.WriteLine("Full Run: " + msTaken);

            // Second, identical run: JIT costs have already been paid.
            startTime = DateTime.Now;
            for (int i = 0; i < 1000; i++)
            {
                CheckForMatches("some random text with email address, address@domain200.com" + i.ToString());
            }
            msTaken = DateTime.Now.Subtract(startTime).TotalMilliseconds;
            Console.WriteLine("Full Run: " + msTaken);
            Console.ReadLine();
        }

        private static List<Regex> _expressions;
        private static object _SyncRoot = new object();

        private static List<Regex> GetExpressions()
        {
            if (_expressions != null)
                return _expressions;
            lock (_SyncRoot)
            {
                if (_expressions == null)
                {
                    DateTime startTime = DateTime.Now;
                    List<Regex> tempExpressions = new List<Regex>();
                    string regExPattern =
                        @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@{0}$";
                    for (int i = 0; i < 2000; i++)
                    {
                        tempExpressions.Add(new Regex(
                            string.Format(regExPattern,
                                Regex.Escape("domain" + i.ToString() + "." +
                                    (i % 3 == 0 ? ".com" : ".net"))),
                            RegexOptions.IgnoreCase)); // add | RegexOptions.Compiled to test compiled regexes
                    }
                    _expressions = new List<Regex>(tempExpressions);
                    DateTime endTime = DateTime.Now;
                    double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
                    Console.WriteLine("Init:" + msTaken);
                }
            }
            return _expressions;
        }

        static List<Regex> expressions = GetExpressions();

        private static void CheckForMatches(string text)
        {
            DateTime startTime = DateTime.Now;
            foreach (Regex e in expressions)
            {
                bool isMatch = e.IsMatch(text);
            }
            DateTime endTime = DateTime.Now;
            //double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
            //Console.WriteLine("Run: " + msTaken);
        }
    }
}
Compiled regular expressions match faster when used as intended. As others have pointed out, the idea is to compile them once and use them many times. The construction and initialization time are amortized out over those many runs.
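To make the amortization concrete, here is a small worked inequality. The symbols are introduced only for illustration: C is the one-time cost to construct a regex (including any compilation), t is the cost of one match, and the subscripts c and i mark the compiled and interpreted versions. Over N matches the total costs are:

$$
T_c(N) = C_c + N\,t_c, \qquad T_i(N) = C_i + N\,t_i
$$

The compiled regex wins when $T_c(N) < T_i(N)$, which, assuming it really is faster per match ($t_i > t_c$), rearranges to:

$$
C_c + N\,t_c < C_i + N\,t_i \iff N > \frac{C_c - C_i}{t_i - t_c}
$$

With a large up-front cost $C_c$ and a modest per-match saving $t_i - t_c$, N has to be large before compilation pays off; a benchmark that builds many regexes and matches each only a few hundred times may never reach that point.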
I created a much simpler test that will show you that compiled regular expressions are unquestionably faster than non-compiled ones.
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    const int NumIterations = 1000;
    const string TestString = "some random text with email address, address@domain200.com";
    const string Pattern = "^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@domain0\\.\\.com$";
    private static Regex NormalRegex = new Regex(Pattern, RegexOptions.IgnoreCase);
    private static Regex CompiledRegex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    private static Regex DummyRegex = new Regex("^.$");

    static void Main(string[] args)
    {
        var DoTest = new Action<string, Regex, int>((s, r, count) =>
        {
            Console.Write("Testing {0} ... ", s);
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < count; ++i)
            {
                bool isMatch = r.IsMatch(TestString + i.ToString());
            }
            sw.Stop();
            Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds);
        });

        // Make sure that DoTest is JITed
        DoTest("Dummy", DummyRegex, 1);
        DoTest("Normal first time", NormalRegex, 1);
        DoTest("Normal Regex", NormalRegex, NumIterations);
        DoTest("Compiled first time", CompiledRegex, 1);
        DoTest("Compiled", CompiledRegex, NumIterations);
        Console.WriteLine();
        Console.Write("Done. Press Enter:");
        Console.ReadLine();
    }
}
Setting NumIterations to 500 gives me this:
Testing Dummy ... 0 ms
Testing Normal first time ... 0 ms
Testing Normal Regex ... 1 ms
Testing Compiled first time ... 13 ms
Testing Compiled ... 1 ms
With 5 million iterations, I get:
Testing Dummy ... 0 ms
Testing Normal first time ... 0 ms
Testing Normal Regex ... 17,232 ms
Testing Compiled first time ... 17 ms
Testing Compiled ... 15,299 ms
Here you see that the compiled regular expression is at least 10% faster than the non-compiled version.
It's interesting to note that if you remove the RegexOptions.IgnoreCase option from your regular expression, the results from 5 million iterations are even more striking:
Testing Dummy ... 0 ms
Testing Normal first time ... 0 ms
Testing Normal Regex ... 12,869 ms
Testing Compiled first time ... 14 ms
Testing Compiled ... 8,332 ms
Here, the compiled regular expression is 35% faster than the non-compiled regular expression.
In my opinion, the blog post you reference is simply a flawed test.
http://www.codinghorror.com/blog/2005/03/to-compile-or-not-to-compile.html
Compiled helps only if you instantiate it once and reuse it multiple times. If you're creating a compiled regex in the for loop, then it will obviously perform worse. Can you show us your sample code?
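A minimal sketch of the difference described here, assuming a simple illustrative pattern (the pattern, the class name `ReuseDemo`, its method names and the iteration count are hypothetical, not from the question). The first method builds a new compiled Regex on every call, so every call pays the compilation cost again; the second builds one instance when the type is initialized and reuses it.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ReuseDemo
{
    // Illustrative pattern only; not the one from the question.
    const string Pattern = @"^[a-z0-9]+@example\.com$";

    // Built once, when the type is initialized, and reused for every match.
    static readonly Regex SharedCompiled =
        new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Anti-pattern: constructs (and compiles) a brand-new Regex on every call.
    public static bool MatchCreatingEachTime(string input)
    {
        var regex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
        return regex.IsMatch(input);
    }

    // Intended use: compile once, match many times.
    public static bool MatchReusing(string input)
    {
        return SharedCompiled.IsMatch(input);
    }

    static void Main()
    {
        const int iterations = 200;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            MatchCreatingEachTime("user" + i + "@example.com");
        Console.WriteLine("New compiled Regex per call: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            MatchReusing("user" + i + "@example.com");
        Console.WriteLine("One shared compiled Regex:   {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Both methods return the same results; only where the construction cost lands differs. Absolute timings depend on the runtime and machine, so none are claimed here.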
The problem with this benchmark is that compiled Regexes have the overhead of creating a whole new assembly and loading it into the AppDomain.
The scenario compiled Regex was designed for (I believe; I didn't design them) is having tens of Regexes executed millions of times, not thousands of Regexes executed thousands of times. If you're not going to execute a Regex in the realm of a million times, you probably won't even make up for the time to JIT compile it.
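One rough way to see the up-front cost described here is to construct a couple of thousand distinct regexes (mirroring the question's 2000) with and without RegexOptions.Compiled and time only the construction plus one first match of each. This is a sketch under assumptions: the class name `ConstructionCost`, the `Build` helper and the per-index pattern are hypothetical, and the numbers it prints will vary by runtime and machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ConstructionCost
{
    // Builds `count` distinct regexes and calls IsMatch once on each, so the
    // timing includes construction plus first-use cost, but no steady-state matching.
    public static List<Regex> Build(int count, RegexOptions options)
    {
        var list = new List<Regex>(count);
        for (int i = 0; i < count; i++)
        {
            // Hypothetical pattern: one distinct domain per regex.
            var regex = new Regex(@"^[a-z0-9]+@domain" + i + @"\.com$", options);
            regex.IsMatch("warmup");
            list.Add(regex);
        }
        return list;
    }

    static void Main()
    {
        const int count = 2000;

        // Untimed warm-up so the JIT cost of Build itself is not charged to either run.
        Build(1, RegexOptions.IgnoreCase);
        Build(1, RegexOptions.IgnoreCase | RegexOptions.Compiled);

        var sw = Stopwatch.StartNew();
        Build(count, RegexOptions.IgnoreCase);
        Console.WriteLine("Interpreted: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        Build(count, RegexOptions.IgnoreCase | RegexOptions.Compiled);
        Console.WriteLine("Compiled:    {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Each compiled regex carries its own generated code, so this cost scales with the number of distinct regexes, while the per-match saving only accumulates with the number of matches.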
This is almost certainly an indication that your benchmark code is written incorrectly, rather than that compiled regexes are slower than interpreted ones. A lot of work went into making compiled regexes performant.
Now that we have the code, we can look at a few specific things that need updating:
- This code doesn't account for the JIT cost of the method. It should run the code once to get the JIT cost out of the way, and then run it again and measure.
- Why is lock used at all? It's completely unnecessary.
- Benchmarks should use Stopwatch, not DateTime.
- To get a good comparison between compiled and non-compiled, you should test the performance of a single compiled Regex and a single non-compiled Regex matching N times, not N of each matching at most once per regex.
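The suggestions in this list can be combined into a small harness. This is a sketch under assumptions: the class name `Benchmark`, the helpers `CountMatches` and `Measure`, the input strings and the iteration count are all illustrative. The regexes live in `static readonly` fields, so the runtime's thread-safe type initialization replaces the hand-written lock; each regex is run once before timing to take JIT out of the measurement; and one compiled and one interpreted Regex are each matched N times, timed with Stopwatch.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class Benchmark
{
    const string Pattern = @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@domain200\.com$";

    // Static readonly fields are initialized once by the type initializer,
    // which the runtime runs thread-safely, so no explicit lock is needed.
    static readonly Regex InterpretedRegex = new Regex(Pattern, RegexOptions.IgnoreCase);
    static readonly Regex CompiledRegex =
        new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Matches `iterations` inputs against one regex and returns the match count,
    // which keeps the work observable.
    public static int CountMatches(Regex regex, int iterations)
    {
        int matches = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (regex.IsMatch("address" + i + "@domain200.com"))
                matches++;
        }
        return matches;
    }

    // Runs once untimed (JIT and first-use costs), then times a full run.
    static long Measure(Regex regex, int iterations)
    {
        CountMatches(regex, 1);
        var sw = Stopwatch.StartNew();
        CountMatches(regex, iterations);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int iterations = 1000000;
        Console.WriteLine("Interpreted: {0} ms", Measure(InterpretedRegex, iterations));
        Console.WriteLine("Compiled:    {0} ms", Measure(CompiledRegex, iterations));
    }
}
```

Using inputs that actually match (unlike the question's test strings, which contain spaces before the @ and so never match) exercises the full pattern rather than only the early-failure path.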