Regex to match multiple strings with positive look behind
So I have been trying to combine the answers to these two questions:
C# split string but keep split chars\separators
Regex to match multiple strings
Essentially, I'd like to be able to split a string around certain strings and have the splitting strings in the output array of Regex.Split() as well. Here is what I have tried so far:
// ** I'd also like to have UNION ALL but not sure how to add that
private const string CompoundSelectRegEx = @"(?<=[\b(UNION|INTERSECT|EXCEPT)\b])";
string sql = "SELECT TOP 5 * FROM Persons UNION SELECT TOP 5 * FROM Persons INTERSECT SELECT TOP 5 * FROM Persons EXCEPT SELECT TOP 5 * FROM Persons";
string[] strings = Regex.Split(sql, CompoundSelectRegEx);
The problem is that it starts matching individual characters like E and U so I get an incorrect array of strings.
I'd also like to match around UNION ALL, but since that's not a single word but a string, I wasn't sure how to add it to the above regex. If someone could point me in the right direction there as well, that would be great!
Thanks!
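If you want to split on those words and include them in the results, simply alternate on them and place them in a capturing group: Regex.Split() includes any captured text in the returned array, so there's no need for look-arounds. Your attempt matches single characters because the square brackets turn the whole expression into a character class. This pattern should fit your needs: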
string pattern = @"\b(UNION(?:\sALL)?|INTERSECT|EXCEPT)\b";
The (?:\sALL)? part makes the word ALL optional: (?:...) is a non-capturing group, meaning match but don't capture the specified pattern, and the trailing ? makes the group optional. If you want to trim the results, you could add a \s* at the start and the end of the pattern.
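As a rough illustration of how this behaves, here is a minimal sketch that runs the pattern through Regex.Split() with \s* added on both sides for trimming. The class name CompoundSelectSplitter and the Split helper are hypothetical names chosen for this example, and the input is the question's SQL with the first UNION changed to UNION ALL so the optional group is exercised.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, named for this example only.
public static class CompoundSelectSplitter
{
    // The answer's pattern, with \s* on both sides so the pieces come back trimmed.
    // The outer (...) is a capturing group, so Regex.Split keeps the matched keyword.
    private const string Pattern = @"\s*\b(UNION(?:\sALL)?|INTERSECT|EXCEPT)\b\s*";

    // Hypothetical helper: splits on the set operators and keeps them in the result.
    public static string[] Split(string sql)
    {
        return Regex.Split(sql, Pattern);
    }

    public static void Main()
    {
        string sql = "SELECT TOP 5 * FROM Persons UNION ALL SELECT TOP 5 * FROM Persons "
                   + "INTERSECT SELECT TOP 5 * FROM Persons EXCEPT SELECT TOP 5 * FROM Persons";

        // Prints seven pieces: the four SELECT statements, with
        // UNION ALL, INTERSECT and EXCEPT between them.
        foreach (string part in Split(sql))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

Note that the match is case-sensitive as written; passing RegexOptions.IgnoreCase to Regex.Split() would be one way to also accept lowercase keywords.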
Be aware that this might work for simple SQL statements, but once you start dealing with nested queries the above approach will probably break down. At that point a regex might not be the best solution and you should develop a parser instead.