How do I bring data together from multiple databases?
BACKGROUND:
I should preface this by saying I'm not trying to get someone to do my work for me. I feel like I'm at a bit of a crossroads: there are multiple ways to reach my goal, but I'm not sure which ones are 'standard', or whether my relatively limited knowledge is making me miss something.
I've got a system that has been evolving for six months now, and since January 2011 the DB schema has been fairly stable. (I was never sure whether creating a database for each month, to match the monthly accounting cycles, was a major mistake, but I just didn't have the know-how to do otherwise.)
NOW:
My boss is asking me to create year-to-date reports consisting of records from all the monthly databases.
WHAT I STARTED DOING:
I put together a metadata schema and populated it with enough information to write an application that performs ETL operations.
Here's what it looks like:
USE [DAMain1]
GO
CREATE TABLE AccountingPeriod (
    Id INT PRIMARY KEY NOT NULL,
    Name VARCHAR(255) NOT NULL UNIQUE,
    DateStart DATE NOT NULL,
    DateStop DATE NOT NULL
)
GO
INSERT INTO AccountingPeriod VALUES
     (1, 'Jan11', '2011-01-01', '2011-01-31')
    ,(2, 'Feb11', '2011-02-01', '2011-02-28')
    ,(3, 'Mar11', '2011-03-01', '2011-03-31')
    ,(4, 'Apr11', '2011-04-01', '2011-04-30')
    ,(5, 'May11', '2011-05-01', '2011-05-31')
GO
CREATE TABLE [DBServer] (
    Id INT PRIMARY KEY NOT NULL,
    Name VARCHAR(255) NOT NULL UNIQUE
)
GO
INSERT INTO DBServer VALUES
    (1, 'Aaron.directagents.local')
GO
CREATE TABLE [DBInstance] (
     Id INT PRIMARY KEY NOT NULL
    ,DBServerId INT NOT NULL REFERENCES DBServer(Id)
    ,SchemaName VARCHAR(255) NOT NULL
    ,CatalogName VARCHAR(255) NOT NULL
    ,ConnectionString VARCHAR(2000) NOT NULL
)
GO
INSERT INTO DBInstance VALUES
     (1, 1, 'dbo', 'DADatabaseR2', 'Data Source=aaron\sqlexpress;Initial Catalog=DADatabaseR2;Integrated Security=True')
    ,(2, 1, 'dbo', 'DADatabaseR3', 'Data Source=aaron\sqlexpress;Initial Catalog=DADatabaseR3;Integrated Security=True')
    ,(3, 1, 'dbo', 'DADatabaseMarch11', 'Data Source=aaron\sqlexpress;Initial Catalog=DADatabaseMarch11;Integrated Security=True')
    ,(4, 1, 'dbo', 'DADatabaseApr11', 'Data Source=aaron\sqlexpress;Initial Catalog=DADatabaseApr11;Integrated Security=True')
GO
CREATE TABLE DADB (
    Id INT PRIMARY KEY NOT NULL,
    Name VARCHAR(255) NOT NULL UNIQUE,
    AccountingPeriodId INT NOT NULL REFERENCES AccountingPeriod(Id),
    DBInstanceId INT NOT NULL REFERENCES DBInstance(Id)
)
GO
INSERT INTO DADB VALUES
     (1, 'Direct Agents Database for January 2011', 1, 1)
    ,(2, 'Direct Agents Database for February 2011', 2, 2)
    ,(3, 'Direct Agents Database for March 2011', 3, 3)
    ,(4, 'Direct Agents Database for April 2011', 4, 4)
GO
CREATE VIEW DADBs AS
SELECT
     DA.Name [Database]
    ,AP.Name [Accounting Period]
    ,AP.DateStart [Start]
    ,AP.DateStop [Stop]
    ,DS.Name [Server]
    ,DI.SchemaName
    ,DI.CatalogName
    ,DI.ConnectionString [Connection]
FROM
    DADB DA
    INNER JOIN AccountingPeriod AP ON DA.AccountingPeriodId = AP.Id
    INNER JOIN DBInstance DI ON DA.DBInstanceId = DI.Id
    INNER JOIN DBServer DS ON DI.DBServerId = DS.Id
GO
SELECT * FROM DADBs
GO
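To make the metadata idea concrete, here is a minimal sketch of how the rows in `DADB` could drive one generated UNION ALL query instead of a hand-maintained list of databases. It uses Python's built-in `sqlite3` as a stand-in for SQL Server: each monthly database is an attached in-memory database, and the `LineItem` table with its `Id`/`Amount` columns is hypothetical, since the monthly schema isn't shown. Against SQL Server the generated branches would instead use three-part names built from `CatalogName` and `SchemaName` (e.g. `DADatabaseR2.dbo.LineItem`).

```python
import sqlite3


def build_union_query(catalogs, table="LineItem"):
    """Build one UNION ALL query over the same table in every catalog.

    Each branch tags its rows with the catalog they came from, so a report
    can still tell the months apart. Catalog names are pasted into the SQL
    text, so they must come from trusted metadata, never from user input.
    """
    branches = [
        f"SELECT '{cat}' AS SourceCatalog, Id, Amount FROM {cat}.{table}"
        for cat in catalogs
    ]
    return "\nUNION ALL\n".join(branches)


def demo():
    # Autocommit mode, so ATTACH is never issued inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)

    # Stand-in for the DADB metadata table: only the catalog name is used here.
    conn.execute("CREATE TABLE DADB (Id INTEGER PRIMARY KEY, CatalogName TEXT NOT NULL)")
    conn.executemany("INSERT INTO DADB VALUES (?, ?)",
                     [(1, "DADatabaseR2"), (2, "DADatabaseR3")])
    catalogs = [r[0] for r in conn.execute("SELECT CatalogName FROM DADB ORDER BY Id")]

    # One attached in-memory database per catalog mimics the separate monthly databases.
    for cat in catalogs:
        conn.execute(f"ATTACH DATABASE ':memory:' AS {cat}")
        # Hypothetical monthly line-item table.
        conn.execute(f"CREATE TABLE {cat}.LineItem (Id INTEGER PRIMARY KEY, Amount REAL NOT NULL)")
    conn.executemany("INSERT INTO DADatabaseR2.LineItem VALUES (?, ?)", [(1, 10.0), (2, 5.0)])
    conn.executemany("INSERT INTO DADatabaseR3.LineItem VALUES (?, ?)", [(1, 7.5)])

    # Aggregate across all months with a single generated query.
    union_sql = build_union_query(catalogs)
    totals = conn.execute(
        f"SELECT SourceCatalog, SUM(Amount) FROM ({union_sql}) "
        "GROUP BY SourceCatalog ORDER BY SourceCatalog"
    ).fetchall()
    conn.close()
    return totals


if __name__ == "__main__":
    print(demo())  # [('DADatabaseR2', 15.0), ('DADatabaseR3', 7.5)]
```

The design trade-off is that nothing is copied: the query always reads the live monthly databases, but every new month means regenerating the query from the metadata.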
PROBLEM:
I don't know if this is a reasonable/normal way of going about it. I have enough time to ramp up on one thing, but I can't figure out on my own what path to go down.
QUESTION: Given that I need to pull line-item data and aggregate it across multiple databases as I explained, are there alternatives to defining metadata tables that drive custom ETL solutions? (For my purposes a C# app and an SSIS project are equivalent, but I'm interested to know whether one might use Analysis Services or Reporting Services here.)
Bad database designs often rear their heads in reporting. As you have discovered, keeping each month's data in a separate database has created a reporting nightmare. Imagine what would happen if the accounting cycle dates changed. The better solution would be to consolidate the data into a single database and determine the accounting cycle from attributes of the entries (date entered, date posted, etc.).
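Here is a minimal sketch of that consolidated design, again with Python's `sqlite3` standing in for SQL Server. A single `LineItem` table (hypothetical name and columns) holds every entry, and the accounting period is looked up from each entry's posting date against `AccountingPeriod`, so a year-to-date report is one query. If the period boundaries change, only the `AccountingPeriod` rows are edited; no data has to move between databases.

```python
import sqlite3


def ytd_by_period(conn, year_start, as_of):
    """Year-to-date totals per accounting period.

    The period is derived from each entry's posting date, so no row has to
    record which monthly database it belongs to.
    """
    return conn.execute(
        """
        SELECT ap.Name, SUM(li.Amount)
        FROM LineItem AS li
        JOIN AccountingPeriod AS ap
          ON li.DatePosted BETWEEN ap.DateStart AND ap.DateStop
        WHERE li.DatePosted BETWEEN ? AND ?
        GROUP BY ap.Id, ap.Name
        ORDER BY ap.Id
        """,
        (year_start, as_of),
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Same shape as the AccountingPeriod table in the question; ISO date
    # strings compare correctly as text in SQLite.
    conn.execute("""CREATE TABLE AccountingPeriod (
        Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE,
        DateStart TEXT NOT NULL, DateStop TEXT NOT NULL)""")
    conn.executemany("INSERT INTO AccountingPeriod VALUES (?, ?, ?, ?)", [
        (1, "Jan11", "2011-01-01", "2011-01-31"),
        (2, "Feb11", "2011-02-01", "2011-02-28"),
        (3, "Mar11", "2011-03-01", "2011-03-31"),
    ])
    # Hypothetical consolidated table: every entry from every month in one place.
    conn.execute("""CREATE TABLE LineItem (
        Id INTEGER PRIMARY KEY, DatePosted TEXT NOT NULL, Amount REAL NOT NULL)""")
    conn.executemany("INSERT INTO LineItem (DatePosted, Amount) VALUES (?, ?)", [
        ("2011-01-15", 100.0), ("2011-01-31", 50.0),
        ("2011-02-10", 25.0), ("2011-03-05", 10.0),
    ])
    # Year to date as of the end of February, so the March entry is excluded.
    result = ytd_by_period(conn, "2011-01-01", "2011-02-28")
    conn.close()
    return result


if __name__ == "__main__":
    print(demo())  # [('Jan11', 150.0), ('Feb11', 25.0)]
```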
In the interim, given what you have, I'd say the best solution is to create a consolidated database and fill it using SSIS from the other databases until you can update the middle tiers or UIs to use the consolidated design.
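The interim load amounts to a loop: for each monthly database listed in the metadata, copy its rows into the consolidated table, tagged with their accounting period. The sketch below shows that loop with Python's `sqlite3` standing in for both SQL Server and SSIS; the `LineItem` and `ConsolidatedLineItem` tables and the `MONTHLY_DBS` catalog-to-period mapping are hypothetical. Each period is reloaded by deleting its rows and re-inserting them in one transaction, so the load can be rerun safely while the monthly databases are still receiving data.

```python
import sqlite3

# Hypothetical mapping, as it might be read from the DADB metadata:
# catalog name -> AccountingPeriodId.
MONTHLY_DBS = {"DADatabaseR2": 1, "DADatabaseR3": 2}


def load_period(conn, catalog, period_id):
    """Replace one period's rows in the consolidated table with a fresh copy.

    `catalog` is pasted into the SQL text, so it must come from trusted metadata.
    """
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute("DELETE FROM ConsolidatedLineItem WHERE AccountingPeriodId = ?",
                     (period_id,))
        conn.execute(
            "INSERT INTO ConsolidatedLineItem (AccountingPeriodId, SourceId, Amount) "
            f"SELECT ?, Id, Amount FROM {catalog}.LineItem",
            (period_id,),
        )


def demo():
    conn = sqlite3.connect(":memory:")
    # Attach the monthly databases before any data changes: SQLite refuses
    # ATTACH inside an open transaction.
    for cat in MONTHLY_DBS:
        conn.execute(f"ATTACH DATABASE ':memory:' AS {cat}")
        conn.execute(f"CREATE TABLE {cat}.LineItem (Id INTEGER PRIMARY KEY, Amount REAL NOT NULL)")
    conn.execute("""CREATE TABLE ConsolidatedLineItem (
        AccountingPeriodId INTEGER NOT NULL, SourceId INTEGER NOT NULL,
        Amount REAL NOT NULL, PRIMARY KEY (AccountingPeriodId, SourceId))""")
    conn.executemany("INSERT INTO DADatabaseR2.LineItem VALUES (?, ?)", [(1, 10.0), (2, 5.0)])
    conn.executemany("INSERT INTO DADatabaseR3.LineItem VALUES (?, ?)", [(1, 7.5)])
    conn.commit()

    # Running the whole load twice shows it is repeatable: no duplicates appear.
    for _ in range(2):
        for cat, period_id in MONTHLY_DBS.items():
            load_period(conn, cat, period_id)

    rows = conn.execute(
        "SELECT AccountingPeriodId, COUNT(*), SUM(Amount) FROM ConsolidatedLineItem "
        "GROUP BY AccountingPeriodId ORDER BY AccountingPeriodId"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())  # [(1, 2, 15.0), (2, 1, 7.5)]
```

In SSIS, a similar shape could be built with a Foreach Loop Container iterating over the metadata rows and running one data flow per monthly database.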