
Most efficient way to query multiple identical tables in separate databases

I have a server (SQL Server 2005) with multiple archive databases (1 per quarter stretching back 8 years) that are all structurally identical.

I frequently need to query back over a certain date range that spans n databases; usually n is small (1-3), but it's possible I need to query the whole set.

Any thoughts on the most efficient way to do this, both from a code-cleanliness and a performance perspective?

Current solutions are rather ad hoc. There is a collection of views that span all (or just the most recent) databases; other solutions generate dynamic SQL that figures out which DB(s) contain the data sought.

Obviously the ideal solution would be to partition the tables, but I can't do this because it's a 3rd-party supplied database.

Dave

EDIT: I can't combine the databases since they are 3rd-party controlled. The total data size is about 50 GB, so not huge; the largest tables contain around 1.5m rows per quarter.

EDIT2: A data warehouse is definitely the right solution long term (it's in the plan) but I can't do this today :(


One way to do this: use sp_msForEachDb.

-- Round 1 -------

Call this system procedure with a varchar parameter. (It's actually a LOT messier than this; check the code in the master database if you want to know what it's really doing.) The parameter must be a chunk of dynamic code -- for example:

DECLARE @DemoParameter varchar(1000)
SET @DemoParameter = 'SELECT MyCol from MyTable where CreatedOn between ''Jan 1, 1980'' and ''Dec 21, 2012'''
EXECUTE sp_msForEachDb @DemoParameter

This would run the query against every database on the SQL instance, returning one result set per database -- except for those databases that don't have the necessary table(s), which would throw an error (particularly the system databases). This leads us to...

-- Round 2 ---------

Within the dynamic code, as the databases are iterated over, all instances of the question mark ? will be replaced with the name of the database currently being processed. You can use this to filter which databases are to be processed, and which aren't. Note also that the "current" database will not be changed by the routine; you have to do that yourself. This gives us code like:

SET @DemoParameter = '
IF ''?'' like  ''%Foo%''
 BEGIN
    USE ?
    SELECT MyCol from MyTable where CreatedOn between ''Jan 1, 1980'' and ''Dec 21, 2012''
 END
'

This would run the query only against those databases whose names contain the characters "foo". Possibly you could check for the presence of the table within each database; other methods suggest themselves.
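For instance, instead of matching on the database name, you could skip any database that doesn't contain the table at all. A sketch (untested, reusing the same hypothetical MyTable/MyCol names as above):

```sql
EXECUTE sp_msForEachDb '
USE [?]
-- OBJECT_ID resolves in the context of [?] after the USE, so this
-- silently skips databases (including system databases) without the table
IF OBJECT_ID(''dbo.MyTable'') IS NOT NULL
 BEGIN
    SELECT MyCol from MyTable where CreatedOn between ''Jan 1, 1980'' and ''Dec 21, 2012''
 END
'
```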

This will shotgun back one result set for each database, which doesn't help too much if you need them all in one neat and orderly data set, and that gets us to...

-- Round 3 ------------

Briefly: create a temp table, and populate it from within the dynamic query. As I show below, you can include the name of the database, and even the server name -- very useful when you're questing for lost data across dozens of databases spread across a handful of servers.

Create (or clear) the temp table:

IF object_id('tempdb.dbo.##Foo') is null
    CREATE TABLE ##Foo
     (
       ServerName         varchar(100)  not null
      ,DBName             varchar(100)  not null

      --  Add your own columns here
      ,MyCol              int  not null
     )

ELSE
    --Option: Delete this line to not clear on each run
    TRUNCATE TABLE ##Foo

Run the code (this is my main template, you can easily work @DemoParameter back in there):

EXECUTE sp_msForEachDB '
IF ''?'' like  ''%Foo%''
 BEGIN
    USE ?

    INSERT ##Foo
     select @@servername, db_name()
       ,MyCol
      from MyTable
 END
'

...and that should produce a single temp table with your data. Test this out -- I wrote this without actually testing the code, and typos will slip in. (#temp tables should work as well as ##temp; I generally do this for ad-hoc system support issues.)


Here's something that's going to do it!

Declare
    @Database varchar(8000),
    @Sql varchar(8000)
BEGIN
    Declare DBName Cursor LOCAL FAST_FORWARD For
        Select name FROM sys.databases where name like 'Your_DB_Names%'

    Open DBName
    WHILE (1=1)
    Begin
        Fetch Next From DBName into @Database

        if @@Fetch_status = -1 Break
        if @@Fetch_status = -2 Continue

        Set @Sql = 'use ' + @Database
        Print @Sql
        Execute (@Sql)

        SELECT * FROM TABLE -- Your query here
    End
    Close DBName
    Deallocate DBName
END


I've done this often, and let me tell you: keeping separate databases is a PAIN IN THE ASS. It forces you to scatter logic like this all over the place -- it breaks the encapsulation that a database provides in the first place.

What you are looking at is a data warehouse. You should look into consolidating all of your databases into one, and making it read-only. Then you take nightly/hourly incremental backups of your live data, and restore that against your warehouse. Then your warehouse is always up to date, and you run your reporting against that, instead of the live data.

This has the added benefit of keeping your reports from killing your live production databases, and I'd guess upwards of 90% of business needs don't require 100%-accurate, up-to-the-minute numbers.

Do the hard stuff once - create a warehouse. :-)

EDIT

Something I've done in the past is to create a view over the tables I use, going through linked servers if the databases were on other machines:

Create view view_table as 
    select * from activedb.dbo.table
    union all  -- UNION ALL: the archives are disjoint, and plain UNION forces a costly de-duplication
    select * from db1.dbo.table 
    union all
    select * from db2.dbo.table 

Hideous, performance-wise, but it solves the problem neatly. You still only have the one-time setup issue (creating a view per table you wish to query), and a centralized place to modify to keep your database list up to date for ongoing maintenance, as opposed to keeping N reports up to date.
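One refinement, if the schema allows it (it may not, with a 3rd-party database): when each archive table carries a trusted CHECK constraint bounding its date column and the view uses UNION ALL, SQL Server treats the view as a partitioned view and can skip member tables whose range can't match the query. A sketch, with hypothetical names (archive_view standing in for a view built like the one above):

```sql
-- Assumes each quarterly table already has (or can be given) a CHECK
-- constraint bounding CreatedOn to its quarter, e.g. in one archive DB:
--   ALTER TABLE dbo.[table] ADD CONSTRAINT CK_Quarter
--       CHECK (CreatedOn >= '20120101' AND CreatedOn < '20120401')
-- With such constraints in place, a range query against the combined
-- view only touches the member tables that can contain matching rows:
SELECT MyCol
  FROM archive_view
 WHERE CreatedOn >= '20120101' AND CreatedOn < '20120215'
```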


Danielle's answer worked for me, with a slight change, below. We have dozens of development databases on our servers for all our clients, in groups named with regular prefixes and suffixes. Using a cursor makes it possible to look at all the records in all the databases of a certain group. I had to make the change, though, because "execute" wasn't working for the "use" command (a USE run via EXECUTE only changes context for that dynamic batch), so I just built the entire command with the database name.

Declare
    @Database varchar(8000),
    @Sql varchar(8000)
BEGIN
    Declare DBName Cursor LOCAL FAST_FORWARD For
        Select name FROM sys.databases where name like 'MyPrefix%MySuffix'

    Open DBName
    WHILE (1=1)
    Begin
        Fetch Next From DBName into @Database

        if @@Fetch_status = -1 Break
        if @@Fetch_status = -2 Continue

        -- note the schema qualifier between database and table name
        set @Sql = 'select * from ' + @Database + '.dbo.MyTable'
        print @sql
        execute (@sql)
    End
    Close DBName
    Deallocate DBName
END


Depending on the size of the databases, it might actually be better to consolidate them into one database and properly index them.

You can write your own SSIS package and schedule it to consolidate data periodically (daily / hourly / etc).
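The periodic consolidation doesn't have to be SSIS; a minimal sketch of the same idea in plain T-SQL, runnable from a scheduled SQL Agent job (all names hypothetical -- a consolidated ArchiveAll database, with MAX(CreatedOn) as the watermark):

```sql
-- Copy rows newer than the current watermark from one quarterly
-- database into the consolidated store. Repeat per source database,
-- or wrap in sp_msForEachDb as shown earlier.
INSERT INTO ArchiveAll.dbo.MyTable (MyCol, CreatedOn)
SELECT s.MyCol, s.CreatedOn
  FROM Archive_2012Q4.dbo.MyTable AS s
 WHERE s.CreatedOn > (SELECT ISNULL(MAX(CreatedOn), '19000101')
                        FROM ArchiveAll.dbo.MyTable)
```

This assumes rows are only ever appended with increasing CreatedOn values; if archive rows can be updated in place, a keyed comparison or incremental-restore approach (as described above) is safer.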
