开发者

How best to find hard-coded English language strings in SQL Server stored procedures?

We're working on making our application 100% localizable, and we're mostly there. However, we occasionally find an English string still hard-coded in stored procedures. (We're on SQL Server 2005 by the way.) We have thousands of stored procedures, so going through them by hand is impractical. I'm trying to think of the most accurate means of automating a search.

Now, I know there's no means to search for "English" strings - but searching for strings bounded by single quotes and perhaps 20+ characters long should flush MOST of them out. Good enough for our purposes now. But I'm anticipating a lot of false-positives in the comments of the stored procedures, too.

So how would you approach this? Would SMO let me tease apart the SQL in a stored procedure from the comments in it? Do I have to use OBJECT_DEFINITIO开发者_如何学PythonN() and start hacking out some terrifying regular expressions?

Much appreciated in advance, folks.


Another thought: Microsoft provides, with Visual Studio, an assembly that can parse SQL. I've used it, and it's fairly simple to use. You might be able to use it to parse the text of your stored procedures; it can return a list of the various tokens in your statements, including the type of the token. So, it should be able to help you differentiate between what is a string of text you might be interested in vs. what is part of a comment and can be ignored. There are more details here: http://blogs.msdn.com/b/gertd/archive/2008/08/21/getting-to-the-crown-jewels.aspx.

Basically, from .NET, you'd open a connection to your database and query syscomments for your stored procedures' text. You'd loop through each procedure and parse it using these parser. Then you'd use the Sql100ScriptGenerator to get the tokens out of the parsed text, loop through the tokens and look for tokens whose types are either ASCII or Unicode string literals. For those strings, check their length to see if it's 20+, and if it is, flag the strings and the procs as needing further review.

I played around with it a bit, and here is a very raw example to illustrate the basic principle:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using Microsoft.Data.Schema;
using Microsoft.Data.Schema.ScriptDom;
using Microsoft.Data.Schema.ScriptDom.Sql;

namespace FindHardCodedStrings
{
    class Program
    {
        static void Main(string[] args)
        {
            using (SqlConnection conn = new SqlConnection())
            {
                SqlConnectionStringBuilder bldr = new SqlConnectionStringBuilder();
                bldr.DataSource = "localhost\\sqlexpress";
                bldr.InitialCatalog = "msdb";
                bldr.IntegratedSecurity = true;

                conn.ConnectionString = bldr.ConnectionString;

                SqlCommand cmd = conn.CreateCommand();
                cmd.CommandType = System.Data.CommandType.Text;
                cmd.CommandText = "select [text] from syscomments";

                SqlDataAdapter da = new SqlDataAdapter(cmd);
                DataSet ds = new DataSet();
                da.Fill(ds);

                TSql100Parser parser = new TSql100Parser(false);
                Sql100ScriptGenerator gen = new Sql100ScriptGenerator();
                gen.Options.SqlVersion = SqlVersion.Sql100;

                foreach (DataRow proc in ds.Tables[0].Rows)
                {
                    string txt = proc[0].ToString();
                    using (System.IO.TextReader sr = new System.IO.StringReader(txt))
                    {
                        IList<ParseError> errs;
                        IScriptFragment frag = parser.Parse(sr, out errs);

                        if (null == frag)
                            continue;

                        IList<TSqlParserToken> tokens = gen.GenerateTokens((TSqlFragment)frag);

                        foreach (TSqlParserToken token in tokens)
                        {
                            if (token.TokenType == TSqlTokenType.UnicodeStringLiteral || token.TokenType == TSqlTokenType.AsciiStringLiteral)
                            {
                                if (token.Text.Length >= 20)
                                    Console.WriteLine("String found: " + token.Text);
                            }
                        }
                    }
                }

            }
        }
    }
}


We resolved a problem stemming from this by creating SQL Servers in different language/region settings and running our sp's, noting which ones broke.

This may not be the most elegant solution but we were able to do it quickly because of the limited different locality settings we support.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜