Finding the index of the outer FROM clause in a SQL statement string
I need to find the index of the actual outer 'from' in a SQL string. Here are some examples of SQL strings I would be looking for the actual FROM:
select * from table --easy
select *, (select top 1 col1 from anothertable) as col1 from table
select * from table, (select col1 from anothertable) as anothertable
select * from table where col1=(select top 1 col1 from anothertable)
I'm sure there are many more valid SQL statements that utilize sub selects. I think what I need is a regex or开发者_如何学编程 parser that knows the difference between the most outer 'from' and skips any sub 'froms'. Of course I may not be considering other pitfalls to finding the outer 'from' so any input would be appreciated.
You can use Microsoft.Data.Schema.ScriptDom
for this. A rough around the edges proof of concept application is below.
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.Data.Schema.ScriptDom;
using Microsoft.Data.Schema.ScriptDom.Sql;
namespace ConsoleApplication1
{
internal class Program
{
public static void Main(string[] args)
{
string[] queries =
{
@"select * from table1;",
/*Test multiline*/
@"select *,
(select top 1 col1 from anothertable) as col1 from table1;"
,
@"select * from table1, (select col1 from anothertable) as anothertable;",
@"select * from table1 where col1=(select top 1 col1 from anothertable)",
/*Test invalid syntax ("table" is a reserved word)*/
@"select * from table where col1=(select top 1 col1 from anothertable)"
};
foreach (string query in queries)
{
Parse(query);
}
Console.ReadKey();
}
private static void Parse(string query)
{
Console.WriteLine(@"------------------------------------------");
Console.WriteLine(@"Parsing statement ""{0}""", query);
var parser = new TSql100Parser(false);
IList<ParseError> errors;
IScriptFragment result = parser.Parse(new StringReader(query), out errors);
if (errors.Count > 0)
{
Console.WriteLine(@"Errors encountered: ""{0}""", errors[0].Message);
return;
}
TSqlStatement statement = ((TSqlScript) result).Batches[0].Statements[0];
if (statement is SelectStatement)
{
TableSource tableSource = (((QuerySpecification)((SelectStatement)statement).QueryExpression).FromClauses[0]);
Console.WriteLine(@"Top level FROM clause at Line {0}, Column {1} (Character Offset {2})",
tableSource.StartLine, tableSource.StartColumn, tableSource.StartOffset);
Console.WriteLine();
Console.WriteLine();
}
}
}
}
Output
------------------------------------------
Parsing statement "select * from table1;"
Top level FROM clause at Line 1, Column 15 (Character Offset 14)
------------------------------------------
Parsing statement "select *,
(select top 1 col1 from anothertable) as col1 from
table1;"
Top level FROM clause at Line 2, Column 82 (Character Offset 93)
------------------------------------------
Parsing statement "select * from table1, (select col1 from anothertable) as ano
thertable;"
Top level FROM clause at Line 1, Column 15 (Character Offset 14)
------------------------------------------
Parsing statement "select * from table1 where col1=(select top 1 col1 from anoth
ertable)"
Top level FROM clause at Line 1, Column 15 (Character Offset 14)
------------------------------------------
Parsing statement "select * from table where col1=(select top 1 col1 from anothe
rtable)"
Errors encountered: "Incorrect syntax near table."
Assuming that the "from" that you want to find is the only one outside of any brackets, you could go through the string letter by letter and count the brackets (+1 when opening, -1 when closing). When you find a "from" while your bracket-counter is 0, it should be the correct one.
Take into account, that from may occure in the select clause as fieldname, alias or name of an SP, as well. E.g select from select from TABLE is valid, if TABLE has field from.
精彩评论