XSD regular expression pattern in .NET causes application to hang
Processing time doubles each time the "Y" moves one position to the right. Can anybody tell me why, and how to solve this problem?
I have many long IDs stored in a database that can't be changed, so I can't restrict the length much.
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;
namespace TestRegex
{
class Program
{
static void Main(string[] args)
{
DateTime start = DateTime.Now;
/******************************************
* ID to validate
******************************************/
//string id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"; // Ok: Fast
string id = "xxxxxxxxxxxxxxxxxxxxxYxxxxxxx"; // Invalid: Slow
//string id = "xxxxxxxxxxxxxxxxxxxxxxYxxxxxx"; // Invalid: Slower
//string id = "xxxxxxxxxxxxxxxxxxxxxxxYxxxxx"; // Invalid: Very slow
//string id = "xxxxxxxxxxxxxxxxxxxxxxxxYxxxx"; // Invalid: Very very slow
/******************************************
* XML to validate
******************************************/
XmlDocument doc = new XmlDocument();
doc.LoadXml("<root id='" + id + "'></root>");
/******************************************
* XSD validator
******************************************/
string xsl =
@"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
elementFormDefault='unqualified'
attributeFormDefault='unqualified'>
<xs:simpleType name='id'>
<xs:restriction base='xs:string'>
<xs:pattern value='^([a-z_]+[0-9]*)+' />
</xs:restriction>
</xs:simpleType>
<xs:element name='root'>
<xs:complexType>
<xs:attribute name='id' use='required' type='id' />
</xs:complexType>
</xs:element>
</xs:schema>
";
/******************************************
* Adds XSD to XML and validates it
******************************************/
XmlTextReader reader = new XmlTextReader(
new MemoryStream(ASCIIEncoding.Default.GetBytes(xsl)));
XmlSchema schema = XmlSchema.Read(reader, new ValidationEventHandler(Validate));
doc.Schemas.Add(schema);
doc.Validate(new ValidationEventHandler(Validate));
/******************************************
* Performance results
******************************************/
Console.WriteLine(id.Length + " = " + (DateTime.Now - start).TotalSeconds);
Console.Read();
}
private static void Validate(object o, ValidationEventArgs args)
{
if (args.Exception != null)
{
Console.WriteLine(args.Exception);
}
}
}
}
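This looks like a case of catastrophic backtracking. The nested quantifiers in ([a-z_]+[0-9]*)+ let the engine split a run of letters between iterations of the outer group in many different ways. When the match fails at the "Y", the engine tries every one of those splits before giving up. Each extra character before the "Y" roughly doubles the number of splits, which matches the doubling you measured.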
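To see the effect without the XSD machinery, here is a minimal sketch that runs the same pattern through System.Text.RegularExpressions directly. It assumes the validator matches the whole value, so the pattern is anchored at both ends; BacktrackingDemo, SlowPattern and TryMatch are names made up for this example. A match timeout is set so the loop cannot hang. Timings depend on the runtime version, so none are quoted here.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Assumption: the validator anchors the pattern at both ends.
    public const string SlowPattern = "^([a-z_]+[0-9]*)+$";

    // Returns true/false for a completed match, or null if the timeout expired.
    public static bool? TryMatch(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            return new Regex(pattern, RegexOptions.None, timeout).IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // Move the "Y" further right and watch how long the failed match takes.
        for (int prefix = 16; prefix <= 24; prefix += 2)
        {
            string id = new string('x', prefix) + "Y" + "xxxx";
            Stopwatch sw = Stopwatch.StartNew();
            bool? result = TryMatch(SlowPattern, id, TimeSpan.FromSeconds(2));
            sw.Stop();
            string outcome = result.HasValue ? result.Value.ToString() : "timed out";
            Console.WriteLine($"{prefix} x's before Y: {outcome} in {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

If the same blow-up shows up here, the cost comes from the pattern itself rather than from the schema validation code.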
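Your regex seems overly complex. If I'm reading it correctly, it accepts lowercase letters, underscores, and digits, as long as the first character isn't a digit. You can rewrite it as:
^[a-z_]\w*
Note that \w also accepts uppercase letters, so use [a-z_0-9] in its place if the IDs must stay lowercase.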
Solved!
The regex ^([a-z_][a-z_0-9]*)
has the same behavior and is much faster.
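For anyone who wants to check the "same behavior" claim, here is a small sketch that compares the two patterns on a handful of short inputs. It assumes whole-value matching (hence the added anchors), and PatternComparison, Nested, Flat and Agree are names invented for this example. A spot check like this is not a proof, but the reasoning is simple: both patterns need a first character from [a-z_] and then allow any mix of [a-z_0-9].

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    private static readonly TimeSpan Timeout = TimeSpan.FromSeconds(2);

    // Original pattern with nested quantifiers (anchored, see the assumption above).
    public static readonly Regex Nested =
        new Regex("^([a-z_]+[0-9]*)+$", RegexOptions.None, Timeout);

    // Flat rewrite: one leading letter or underscore, then letters, underscores or digits.
    public static readonly Regex Flat =
        new Regex("^[a-z_][a-z_0-9]*$", RegexOptions.None, Timeout);

    // True when both patterns give the same verdict for the input.
    public static bool Agree(string input)
    {
        return Nested.IsMatch(input) == Flat.IsMatch(input);
    }

    public static void Main()
    {
        // Short inputs only, so the nested pattern cannot blow up here.
        string[] samples = { "abc", "_id42", "a1b2c3", "x", "", "9abc", "abcY", "ab-cd", "Abc" };
        foreach (string s in samples)
        {
            Console.WriteLine($"'{s}': flat={Flat.IsMatch(s)}, agree={Agree(s)}");
        }

        // The flat pattern has a single quantifier, so a long invalid ID fails in one pass.
        string longInvalid = new string('x', 24) + "Yxxxx";
        Console.WriteLine($"long invalid id accepted: {Flat.IsMatch(longInvalid)}");
    }
}
```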