Most efficient way to query multiple identical tables in separate databases
I have a server (SQL Server 2005) with multiple archive databases (1 per quarter stretching back 8 years) that are all structurally identical.
I frequently need to query back over a date range that spans n databases; usually n is small (1-3), but it's possible I'll need to query the whole set.
Any thoughts on the most efficient way to do this, both from a code-cleanliness and a performance perspective?
Current solutions are rather ad hoc: there is a collection of views that span either all databases or just the most recent one, and other approaches generate dynamic SQL that figures out which DB(s) contain the data sought.
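For illustration, the dynamic-SQL flavour ends up looking roughly like the sketch below (the Archive_YYYYQn naming convention, dbo.MyTable and the CreatedOn column are placeholders, not the real 3rd-party schema):
DECLARE @From datetime, @To datetime, @Sql nvarchar(max)
SET @From = '20110701'
SET @To   = '20120331'
-- Stitch one UNION ALL query together over the matching archive databases;
-- a real version would also drop the quarters that don't overlap @From/@To.
SELECT @Sql = ISNULL(@Sql + ' UNION ALL ', '')
    + 'SELECT * FROM ' + QUOTENAME(name) + '.dbo.MyTable WHERE CreatedOn BETWEEN @From AND @To'
FROM sys.databases
WHERE name LIKE 'Archive[_]20[0-9][0-9]Q[1-4]'
IF @Sql IS NOT NULL
    EXEC sp_executesql @Sql, N'@From datetime, @To datetime', @From, @To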
Obviously the ideal solution would be to partition the tables, but I can't do this because it's a 3rd-party-supplied database.
Dave
EDIT: I can't combine the databases since they are 3rd-party controlled. The total data size is about 50 GB, so not huge; the largest tables contain around 1.5m rows per quarter.
EDIT2: A data warehouse is definitely the right solution long term (it's in the plan) but I can't do this today :(
One way to do this: use sp_msForEachDb.
-- Round 1 -------
Call this system procedure with a varchar parameter. (It's actually a LOT messier than this; check the code in the master database if you want to know what it's really doing.) The parameter must be a chunk of dynamic code -- for example,
DECLARE @DemoParameter varchar(1000)
SET @DemoParameter = 'SELECT MyCol from MyTable where CreatedOn between ''Jan 1, 1980'' and ''Dec 21, 2012'''
EXECUTE sp_msForEachDb @DemoParameter
This would run the query against every database on the SQL instance, returning one set per database -- except for those databases that didn't have the necessary table(s), which would throw an error (particularly the system databases). This leads us to...
-- Round 2 ---------
Within the dynamic code, as the databases are iterated over, every instance of the question mark ? is replaced with the name of the database currently being processed. You can use this to filter which databases are processed and which aren't. Note also that the "current" database will not be changed by the routine; you have to do that yourself. This gives us code like:
SET @DemoParameter = '
IF ''?'' like ''%Foo%''
BEGIN
    USE ?
    SELECT MyCol from MyTable where CreatedOn between ''Jan 1, 1980'' and ''Dec 21, 2012''
END
'
This would run the query only against those databases whose names contain the characters "foo". Possibly you could check for the presence of the table within each database; other methods suggest themselves.
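For example, a sketch of that table-presence check, probing OBJECT_ID with the three-part name (MyTable is still the stand-in table name):
SET @DemoParameter = '
IF OBJECT_ID(''?.dbo.MyTable'', ''U'') IS NOT NULL
BEGIN
    USE ?
    SELECT MyCol FROM MyTable WHERE CreatedOn BETWEEN ''Jan 1, 1980'' AND ''Dec 21, 2012''
END
'
EXECUTE sp_msForEachDb @DemoParameter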
This will shotgun back one dataset for each database, which doesn't help too much if you need them all in one neat and orderly data set, and that gets us to...
-- Round 3 ------------
Briefly: create a temp table, and populate it from within the dynamic query. As I show below, you can include the name of the database, and even the server name -- very useful when you're questing for lost data across dozens of databases spread across a handful of servers.
Create (or clear) the temp table:
IF object_id('tempdb.dbo.##Foo') is null
    CREATE TABLE ##Foo
    (
        ServerName varchar(100) not null
        ,DBName varchar(100) not null
        -- Add your own columns here
        ,MyCol int not null
    )
ELSE
    --Option: remove this ELSE branch if you don't want to clear the table on each run
    TRUNCATE TABLE ##Foo
Run the code (this is my main template, you can easily work @DemoParameter back in there):
EXECUTE sp_msForEachDB '
IF ''?'' like ''%Foo%''
BEGIN
    USE ?
    INSERT ##Foo
    select @@servername, db_name()
        ,MyCol
    from MyTable
END
'
...and that should produce a single temp table with your data. Test this out; I wrote this without actually testing the code, and typos will slip in. (#temp tables should work as well as ##temp; I generally do this for ad-hoc system support issues.)
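Once it has run, everything comes back in one orderly set:
SELECT ServerName, DBName, MyCol
FROM ##Foo
ORDER BY ServerName, DBName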
Here's something that's going to do it!
Declare
    @Database varchar(8000),
    @Sql varchar(8000)
BEGIN
    Declare DBName Cursor LOCAL FAST_FORWARD
    For Select name
        FROM sys.databases
        where name like 'Your_DB_Names%'
    Open DBName
    WHILE (1=1)
    Begin
        Fetch Next From DBName into @Database
        if @@Fetch_status = -1 Break
        if @@Fetch_status = -2 Continue
        Set @Sql = 'use ' + @Database
        Print @Sql
        Execute (@Sql)
        SELECT * FROM MyTable -- Your query here
    End
    Close DBName
    Deallocate DBName
END
I've done this often, and let me tell you, keeping separate databases is a PAIN IN THE ASS. It forces you to do all sorts of logic like this all over the place - it sort of breaks the encapsulation that is a database in the first place.
What you are looking at is a data warehouse. You should look into consolidating all of your databases into one, and making it read-only. Then you take nightly/hourly incremental backups of your live data, and restore that against your warehouse. Then your warehouse is always up to date, and you run your reporting against that, instead of the live data.
This has the upshot of keeping your reports from killing your live production databases, and I'd guess upwards of 90% of business needs don't require 100% accurate just-in-time numbers.
Do the hard stuff once - create a warehouse. :-)
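For the restore half of that, a minimal sketch -- the database name, logical file names and paths are placeholders, and WITH STANDBY keeps the warehouse copy readable between refreshes:
-- Restore the latest full backup over the warehouse copy; repeat with the
-- differential/log backups (each WITH STANDBY) to roll it forward.
RESTORE DATABASE ArchiveWarehouse
FROM DISK = N'D:\Backups\LiveDB_Full.bak'
WITH REPLACE,
     MOVE 'LiveDB_Data' TO N'D:\Warehouse\ArchiveWarehouse.mdf',
     MOVE 'LiveDB_Log'  TO N'D:\Warehouse\ArchiveWarehouse.ldf',
     STANDBY = N'D:\Warehouse\ArchiveWarehouse_undo.dat'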
EDIT
Something I've done in the past is to create a view over the tables I use, using linked servers if the DBs were on other machines:
Create view view_table as
select * from activedb.dbo.MyTable
union
select * from db1.dbo.MyTable
union
select * from db2.dbo.MyTable
Hideous, performance-wise, but it solves the problem neatly. Then you only have the one-time setup issue (creating a view per table you wish to query), plus a centralized place to modify to keep your database list up to date for ongoing maintenance, as opposed to keeping N reports up to date.
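When some of the archive DBs sit on another instance, the same idea works with four-part names over a linked server (server/database/table names below are placeholders); UNION ALL also skips the duplicate-elimination sort that makes plain UNION so expensive:
Create view view_table_remote as
select * from activedb.dbo.MyTable
union all
select * from [ArchiveServer1].db1.dbo.MyTable  -- [ArchiveServer1] is a linked server
union all
select * from [ArchiveServer2].db2.dbo.MyTable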
Danielle's answer worked for me, with a slight change below. We have dozens of development databases on our servers for all our clients, in groups named with regular prefixes and suffixes. Using a cursor makes it possible to look at all the records in all the databases of a certain group. I had to make the change, though, because "execute" wasn't working for the "use" command (a USE issued via EXECUTE doesn't affect the outer batch), so I built the entire command around the database name instead.
Declare
    @Database varchar(8000),
    @Sql varchar(8000)
BEGIN
    Declare DBName Cursor LOCAL FAST_FORWARD
    For Select name
        FROM sys.databases
        where name like 'MyPrefix%MySuffix'
    Open DBName
    WHILE (1=1)
    Begin
        Fetch Next From DBName into @Database
        if @@Fetch_status = -1 Break
        if @@Fetch_status = -2 Continue
        set @Sql = 'select * from ' + @Database + '.dbo.MyTable'
        print @sql
        execute (@sql)
    End
    Close DBName
    Deallocate DBName
END
Depending on the size of the databases, it might actually be better to consolidate them into one database and properly index them.
You can write your own SSIS package and schedule it to consolidate data periodically (daily / hourly / etc).