Project data from flat table to Adjacency List
I am designing a system to record and report on daily measurement data. The data consists of a category identifier, the date/time, and the measurement data (of which there can be up to 500 pieces either as float or int).
The categories are visualized as a tree structure where data is associated with a node as well as with a leaf. The raw data comes in as a CSV in the following format:1/6/2001 15:55, /Node1/Node2/Node3, 121, 34, 452, 651, 167
1/6/2001 15:55, /Node1/Node2/Node3/LeafA, 12, 34, 45, 65, 67
1/6/2001 15:55, /Node1/Node4/Node5/LeafB, 21, 32, 43, 54, 65
I am planning on using Adjacency List (see Database Structure for Tree Data Structure) for the tree structure. I am also planning to have a second table just for the measurement data and the date/time. This way, once the tree structure is generated the first time, it can be referenced over and over again by the measurement data table. Also, having a small Adjacency List table makes the system much more readable :). In the Category table below, Name would be a node or leaf name (e.g. Node1 or LeafA) and FullName would be the entire branch path (e.g. Node1/Node2/Node3/LeafA). Not sure I need both, but I think they will come in handy so I don’t have to recreate the FullName when needed.
CREATE TABLE [dbo].[Category](
[CatId] [int] IDENTITY(1,1) NOT NULL,
[ParentCatId] [int] NULL,
[Name] [nvarchar](30) NOT NULL,
[FullName] [nvarchar](MAX) NOT NULL
CONSTRAINT [PK_Category] PRIMARY KEY CLUSTERED
(
[CatId] ASC
) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[MeasurementData](
[CatId] [int] NOT NULL,
[DateCollected] [datetime] NOT NULL,
[foo] [int] NOT NULL,
[bar] [float] NOT开发者_StackOverflow中文版 NULL,
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MeasurementData] WITH CHECK ADD CONSTRAINT [FK_ MeasurementData _Category] FOREIGN KEY([CatId])
REFERENCES [dbo].[Category] ([CatId])
GO
ALTER TABLE [dbo].[MeasurementData] CHECK CONSTRAINT [FK_ MeasurementData _Category]
GO
To load the data into the system, I was thinking of using BCP to load the CSV into a flat table (into SQL Server 2008) and then project the flat table to the hierarchical table structure.
Q1: Should I attempt this projection using T-SQL or C# (C# app outside of SQL Server)? Q2: Anyone have an existing algorithm to quickly find (or create and return) the correct leaf given the category identifier above?FYI, I’m also in the process of wrapping my head around the recursive query syntax using the WITH keyword followed by a common table expression - for when i need to do some recursive programming.
https://stackoverflow.com/questions/tagged/common-table-expression http://media.pragprog.com/titles/bksqla/trees.pdfThanks in advance
Your table structure may be a little iffy.
The example input data you provide suggests that the entire set of measures applies to the entire node list. If this is true, then you are better off hashing the node list string, getting something like this:
TABLE: Category
HashId NodeList
====== ===================
289383 node1\node2\....
139829 node6\node7\....
The foreign key from MeasurementData is now on the HashId.
This answers your Q1: generate the hash in C# while passing the data, generating two output files that are BCP ready for the Category table and the MeasurementData tables.
Since this is some kind of data warehouse, don't be afraid to generate other copies of the data that are optimized for retrieval by other methods, so by all means make a second representation of the categories, in a CategoryDetails table something like this:
TABLE CategoryDetails
HashId NodeName ParentNodeName
====== ========= =================
289383 node1
289383 node2 node1
etc, etc,
As for how to use Common Table Expressions, I also had some trouble wrapping my head around them, but once I figure it out I wrote a blog entry: http://database-programmer.blogspot.com/2010/11/recursive-queries-with-common-table.html
精彩评论