SQL Server 2005 database design - many-to-many relationships with hierarchy
Note
I have completely re-written my original post to better explain the issue I am trying to understand. I have tried to generalise the problem as much as possible.
Also, my thanks to the original people who responded. Hopefully this post makes things a little clearer.
Context
In short, I am struggling to understand the best way to design a small scale database to handle (what I perceive to be) multiple many-to-many relationships.
Imagine the following scenario for a company organisational structure:
        Textile Division                       Marketing Division
                |                                       |
       --------------------                    --------------------
       |                  |                    |                  |
    HR Dept         Finance Dept            HR Dept         Finance Dept
       |                  |                    |                  |
   ----------         ----------           ----------         ----------
   |        |         |        |           |        |         |        |
Payroll  Hiring     Audit     Tax       Payroll  Hiring     Audit  Accounts
   |        |         |        |           |        |         |        |
  Emps    Emps      Emps     Emps        Emps     Emps      Emps     Emps
NB: Emps denotes a list of employees that work in that area.
When I first started with this issue I made four separate tables:
- Divisions -> Textile, Marketing (PK = DivisionID)
- Departments -> HR, Finance (PK = DeptID)
- Functions -> Payroll, Hiring, Audit, Tax, Accounts (PK = FunctionID)
- Employees -> List of all Employees (PK = EmployeeID)
The problem as I see it is that there are multiple many-to-many relationships, i.e. a department can appear in many divisions and a function can appear in many departments.
Question
Given the database structure above, suppose I wanted to do the following:
- Get all employees who work in the Payroll function of the Marketing Division
To do this I need to be able to differentiate between the two Payroll functions (one per division), but I am not sure how this can be done.
I understand that I could build a 'Link / Junction' table between Departments and Functions so that I can retrieve which Functions are in which Departments. However, I would still need to differentiate the Division they belong to.
Research Effort
As you can see I am an abecedarian when it comes to database design. I have spent the last two days researching this issue, traversing nested set models, adjacency models, reading that this issue is known not to be NP complete etc. I am sure there is a simple solution?
Based on the updated post, and making some (fairly obvious) assumptions based on the names used, I come up with the following. There are four entities:
- Divisions
- Departments
- Functions
- Employees
There are many relationships between these entities. Few of them are hierarchical, most are simple associations:
- Option A1: There is a master list of functions. Every department can perform (or do) one or more functions, and a function might be performed by more than one department.
- Option A2: Functions are “owned” by departments. No function can be performed by two or more departments. (This appears to be the case, as the HR Dept has Payroll and Hiring, and the Finance Dept has Audit, Tax, and Accounts.)
- Functions are performed by departments for (on behalf of) divisions. (HR Dept does Payroll and Hiring for both Textile and Marketing divisions; Finance Dept does Audit and Tax, but not Accounts, for Textile division, and Audit and Accounts, but not Tax, for Marketing division.) Perhaps a bit more precisely, departments perform selected functions for selected divisions that they are associated with, and that association is defined by their performance of that function.
- Beyond performing the work of functions, there appears to be no relationship between departments and divisions. There is no hierarchical relationship between them, as one does not “own” or contain the other.
This leads to these roughly sketched out tables:
-- Division -----
DivisionId (primary key)
-- Department ---
DepartmentId (primary key)
-- Function ----- (assumes option A2)
FunctionId (primary key)
DepartmentId (foreign key, references Department)
-- DivisionFunctions ----
DivisionId (First column of compound primary key)
FunctionId (Second column of compound primary key)
(You could optionally include a surrogate key to uniquely identify each row, but DivisionId + FunctionId would work.)
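As a rough, runnable sketch of the tables above (option A2), the following uses Python's built-in sqlite3 module as a stand-in for SQL Server. The Name columns, the sample rows, and the helper functions_for are assumptions added for illustration; the table is written as "Function" in quotes because FUNCTION is a reserved word in SQL Server.

```python
import sqlite3

# SQLite stands in for SQL Server; Name columns and sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Division   (DivisionId   INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Department (DepartmentId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
-- Option A2: each function is owned by exactly one department
CREATE TABLE "Function" (
    FunctionId   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL UNIQUE,
    DepartmentId INTEGER NOT NULL REFERENCES Department(DepartmentId)
);
-- Which functions are performed for which divisions
CREATE TABLE DivisionFunctions (
    DivisionId INTEGER NOT NULL REFERENCES Division(DivisionId),
    FunctionId INTEGER NOT NULL REFERENCES "Function"(FunctionId),
    PRIMARY KEY (DivisionId, FunctionId)
);
INSERT INTO Division   VALUES (1, 'Textile'), (2, 'Marketing');
INSERT INTO Department VALUES (1, 'HR'), (2, 'Finance');
INSERT INTO "Function" VALUES (1, 'Payroll', 1), (2, 'Hiring', 1),
                              (3, 'Audit', 2), (4, 'Tax', 2), (5, 'Accounts', 2);
INSERT INTO DivisionFunctions VALUES (1, 1), (1, 2), (1, 3), (1, 4),
                                     (2, 1), (2, 2), (2, 3), (2, 5);
""")

def functions_for(division_name):
    """Return (department, function) pairs performed for one division."""
    return conn.execute("""
        SELECT dep.Name, f.Name
        FROM DivisionFunctions df
        JOIN Division   d   ON d.DivisionId     = df.DivisionId
        JOIN "Function" f   ON f.FunctionId     = df.FunctionId
        JOIN Department dep ON dep.DepartmentId = f.DepartmentId
        WHERE d.Name = ?
        ORDER BY dep.Name, f.Name
    """, (division_name,)).fetchall()

print(functions_for("Marketing"))
```

With this sample data the Marketing division gets Accounts and Audit from Finance, and Hiring and Payroll from HR, matching the diagram in the question.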
There isn’t enough material here to fully describe how "employees" fit into the model. Given that employees do the work of functions: can an employee do the work of more than one function, or do they only do the one? Does an employee do the work of the function regardless of the division(s) it is being done for, or are they assigned to do the work for one or more divisions? Two obvious options here, though more complex variants are possible:
- Option B1: Employees do the work of one or more functions within departments, and perform that work for all divisions that require that function of that department.
- Option B2: Employees are assigned to perform a specific function for a specific division.
Given these, tables might look like:
-- Employee ----- (assumes option B1)
EmployeeId (primary key)
DepartmentId (foreign key, references Department)
-- EmployeeFunction ----- (assumes option B1)
EmployeeId (First column of compound primary key)
FunctionId (Second column of compound primary key)
... and thus all employees that can perform a function will perform it for all divisions requiring it. Or,
-- Employee ----- (assumes option B2)
EmployeeId (primary key)
DepartmentId (foreign key, references Department)
-- EmployeeAssignment ----- (assumes option B2)
EmployeeId (foreign key, references Employee)
DivisionId (first of two-column foreign key referencing DivisionFunctions)
FunctionId (second of two-column foreign key referencing DivisionFunctions)
(Or, instead of DivisionId and FunctionId, include the optional surrogate key from DivisionFunctions.) ... and thus employees are assigned individually to functions to be performed by the department for a division.
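Under option B2, the query from the question ("all employees who work in the Payroll function of the Marketing Division") becomes a plain join. The sketch below is one possible reading, again using Python's sqlite3 in place of SQL Server. The Name columns, the employee names, the primary key chosen for EmployeeAssignment, and the helper employees_in are all assumptions, not part of the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Division   (DivisionId   INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Department (DepartmentId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE "Function" (
    FunctionId   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    DepartmentId INTEGER NOT NULL REFERENCES Department(DepartmentId)
);
CREATE TABLE DivisionFunctions (
    DivisionId INTEGER NOT NULL REFERENCES Division(DivisionId),
    FunctionId INTEGER NOT NULL REFERENCES "Function"(FunctionId),
    PRIMARY KEY (DivisionId, FunctionId)
);
CREATE TABLE Employee (
    EmployeeId   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    DepartmentId INTEGER NOT NULL REFERENCES Department(DepartmentId)
);
-- Option B2: an employee is assigned to a (division, function) pair.
-- The three-column primary key is an assumption.
CREATE TABLE EmployeeAssignment (
    EmployeeId INTEGER NOT NULL REFERENCES Employee(EmployeeId),
    DivisionId INTEGER NOT NULL,
    FunctionId INTEGER NOT NULL,
    PRIMARY KEY (EmployeeId, DivisionId, FunctionId),
    FOREIGN KEY (DivisionId, FunctionId)
        REFERENCES DivisionFunctions (DivisionId, FunctionId)
);
INSERT INTO Division   VALUES (1, 'Textile'), (2, 'Marketing');
INSERT INTO Department VALUES (1, 'HR');
INSERT INTO "Function" VALUES (1, 'Payroll', 1), (2, 'Hiring', 1);
INSERT INTO DivisionFunctions VALUES (1, 1), (1, 2), (2, 1), (2, 2);
-- Hypothetical employees
INSERT INTO Employee VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 1);
INSERT INTO EmployeeAssignment VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

def employees_in(division, function):
    """Names of employees assigned to one function for one division."""
    rows = conn.execute("""
        SELECT e.Name
        FROM EmployeeAssignment ea
        JOIN Employee   e ON e.EmployeeId = ea.EmployeeId
        JOIN Division   d ON d.DivisionId = ea.DivisionId
        JOIN "Function" f ON f.FunctionId = ea.FunctionId
        WHERE d.Name = ? AND f.Name = ?
        ORDER BY e.Name
    """, (division, function)).fetchall()
    return [name for (name,) in rows]

print(employees_in("Marketing", "Payroll"))  # prints ['Bob']
```

Because the assignment row carries the DivisionId directly, the two Payroll groups are told apart without needing two separate Payroll rows in the function table.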
But that still leaves a lot of “what if/when” questions: Do employees “belong to” departments? Can employees belong to (work for) multiple departments? Perhaps employees belong to divisions? Do you track what functions an employee can do, even if they are not currently doing it? Similarly, do you track what department an employee works for, even if they are currently “between functions”? If an employee can perform functions A and B, and a division requires both these functions, might an employee be assigned to only perform A and not B for that division?
There’s more requirements research to be done here, but I’d like to think this is a good start.
Well, you wouldn't put it all into one table. You need to read up on normalizing data and joins. (And never store anything in a comma-delimited list.)
No database worth its salt would have the slightest problem handling a million records; that is a tiny database.
You need tables for functions, courses, locations, people, organization and possibly some joining tables to accommodate many to many relationships. But none of this is hard or even beyond very basic design. I recommend that before you do anything, you get a book on your chosen database and read up on the basics.
As you are "abecedarian" :), one thing to do before any attempt to feel at home with database design is to read about normalization and to completely understand all the normal forms up to 5NF.
If you want to model that
1. departments are in divisions
2. functions are performed in departments
3. employees perform functions
and that not all functions are performed in all of the departments, nor all the departments are in all divisions then you have to store that fact somewhere.
While doing logical design, give your tables descriptive names, so some departments are in divisions
departments_in_divisions
candidate key: department, division
then you have some functions in some departments
functions_departments_divisions
candidate key: function, department, division
references: (department, division) in departments_in_divisions
then employees have some functions from some departments and divisions
employees_function_department_division
candidate key: employee, function, department, division
references: (function, department, division) in functions_departments_divisions
After (or before this) you have 3 more entities functions, departments and divisions which would list all the possible departments, divisions and functions that would also be referenced by the above tables (this might not be completely normalized).
Also the names of the entities (tables) can become something more appropriate to you (only you can know the full semantics of the model of your data). Especially if you notice that you need to assign other attributes (fields) to them.
The values for departments, divisions and functions are their names; there are no artificial ids yet in the above analysis. You can introduce them in the next step (after the logical modelling comes physical modelling), or you can keep the natural keys. If you go with artificial keys, that can cut down the usage of composite keys to at most 2 columns, but it does obfuscate the relationships and the meaning of the facts that you are storing in your tables. (For example, functionID can be an ID of a function name or an ID of a function that is performed in a certain division/department combination. It is not clear which it is, and the two are not interchangeable; sort of like the difference between an instance and a class.)
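The chain of composite keys described above can be sketched with natural keys as follows. This is only an illustration under assumptions: it uses Python's sqlite3 rather than SQL Server, the sample rows and the helper try_assign are invented, and a column called function would need bracket quoting in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE departments_in_divisions (
    department TEXT NOT NULL,
    division   TEXT NOT NULL,
    PRIMARY KEY (department, division)
);
CREATE TABLE functions_departments_divisions (
    function   TEXT NOT NULL,
    department TEXT NOT NULL,
    division   TEXT NOT NULL,
    PRIMARY KEY (function, department, division),
    FOREIGN KEY (department, division)
        REFERENCES departments_in_divisions (department, division)
);
CREATE TABLE employees_function_department_division (
    employee   TEXT NOT NULL,
    function   TEXT NOT NULL,
    department TEXT NOT NULL,
    division   TEXT NOT NULL,
    PRIMARY KEY (employee, function, department, division),
    FOREIGN KEY (function, department, division)
        REFERENCES functions_departments_divisions (function, department, division)
);
INSERT INTO departments_in_divisions VALUES
    ('Finance', 'Textile'), ('Finance', 'Marketing');
INSERT INTO functions_departments_divisions VALUES
    ('Tax', 'Finance', 'Textile'), ('Accounts', 'Finance', 'Marketing');
""")

def try_assign(employee, function, department, division):
    """Insert an assignment; return False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employees_function_department_division "
                "VALUES (?, ?, ?, ?)",
                (employee, function, department, division),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# 'Dana' is a hypothetical employee.
print(try_assign("Dana", "Tax", "Finance", "Textile"))    # Tax exists in Textile Finance
print(try_assign("Dana", "Tax", "Finance", "Marketing"))  # rejected: no Tax in Marketing
```

The point of the composite foreign keys is that the database itself refuses an employee row for a function/department/division combination that has not been declared one level up.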
You need a simple star relationship. The Position (fact table) has just the IDs of the related master tables (Department, Division, etc.). This allows any combination of the master tables to be used.
The master tables can have a simple hierarchy built into each of them as needed, and can relate to each other as needed. But the detail of this does not affect the queries against Position.
You can make the IDs in Position nullable for optional relationships.
You could add StartDate and EndDate columns to Position to track changes over time.
A simple example of this is:
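One way such a Position table might look is sketched below, using Python's sqlite3 in place of SQL Server. Everything beyond the Position, StartDate and EndDate names is an assumption: the master-table columns, which IDs are nullable, the sample rows, and the helper current_staff are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Master tables
CREATE TABLE Division   (DivisionId   INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Department (DepartmentId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE "Function" (FunctionId   INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Employee   (EmployeeId   INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Fact table: one row per position held by an employee
CREATE TABLE Position (
    PositionId   INTEGER PRIMARY KEY,
    EmployeeId   INTEGER NOT NULL REFERENCES Employee(EmployeeId),
    DivisionId   INTEGER NOT NULL REFERENCES Division(DivisionId),
    DepartmentId INTEGER NOT NULL REFERENCES Department(DepartmentId),
    FunctionId   INTEGER REFERENCES "Function"(FunctionId),  -- optional
    StartDate    TEXT NOT NULL,
    EndDate      TEXT                                        -- NULL = current
);
INSERT INTO Division   VALUES (1, 'Textile'), (2, 'Marketing');
INSERT INTO Department VALUES (1, 'HR'), (2, 'Finance');
INSERT INTO "Function" VALUES (1, 'Payroll'), (2, 'Hiring');
INSERT INTO Employee   VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Position VALUES
    (1, 1, 1, 1, 1, '2020-01-01', NULL),          -- Ann: Textile HR Payroll
    (2, 2, 2, 1, 1, '2019-06-01', '2021-03-31'),  -- Bob: left Marketing Payroll
    (3, 3, 2, 1, 1, '2021-04-01', NULL);          -- Cy: Marketing HR Payroll
""")

def current_staff(division, function):
    """Employees currently holding a position in the given division/function."""
    rows = conn.execute("""
        SELECT e.Name
        FROM Position p
        JOIN Employee   e ON e.EmployeeId = p.EmployeeId
        JOIN Division   d ON d.DivisionId = p.DivisionId
        JOIN "Function" f ON f.FunctionId = p.FunctionId
        WHERE d.Name = ? AND f.Name = ? AND p.EndDate IS NULL
        ORDER BY e.Name
    """, (division, function)).fetchall()
    return [name for (name,) in rows]

print(current_staff("Marketing", "Payroll"))  # prints ['Cy']
```

Note that nothing in this layout stops a row from pairing a function with a department that does not perform it; that trade-off comes with allowing any combination of master tables.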
Try giving each entity a table of its own, e.g.
//Table Structure
location
    locationId
    name
division
    divisionId
    name
    locationId (fk => location)
department
    departmentId
    name
    divisionId (fk => division)
function
    functionId
    name
    departmentId (fk => department)
jobrole
    jobroleId
    name
    functionId
course
    courseID
    name
jobrole_course_requirement
    jobroleID
    courseID
employee
    employeeID
    name
employee_jobRole
    employeeID
    jobRoleId
employee_course_attendance
    employee_course_attendanceID
    employeeID
    courseID
    dateAttended
And then some sample selects
-- Get course requirements for an employee
select course.name
from employee_jobRole
join jobrole_course_requirement
    on jobrole_course_requirement.jobroleID = employee_jobRole.jobRoleId
join course
    on course.courseID = jobrole_course_requirement.courseID
where employee_jobRole.employeeID = 123
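Applied to the original question, the same hierarchy can be walked from employee up to division. The following is a minimal sketch using Python's sqlite3 as a stand-in for SQL Server. The location, course and attendance tables are left out, and the sample rows, the job role name, and the helper employees_in_function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE division (divisionId INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE department (
    departmentId INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    divisionId   INTEGER NOT NULL REFERENCES division(divisionId)
);
CREATE TABLE "function" (
    functionId   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    departmentId INTEGER NOT NULL REFERENCES department(departmentId)
);
CREATE TABLE jobrole (
    jobroleId  INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    functionId INTEGER NOT NULL REFERENCES "function"(functionId)
);
CREATE TABLE employee (employeeID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee_jobRole (
    employeeID INTEGER NOT NULL REFERENCES employee(employeeID),
    jobRoleId  INTEGER NOT NULL REFERENCES jobrole(jobroleId),
    PRIMARY KEY (employeeID, jobRoleId)
);
-- Each division gets its own HR department and its own Payroll function row
INSERT INTO division   VALUES (1, 'Textile'), (2, 'Marketing');
INSERT INTO department VALUES (1, 'HR', 1), (2, 'HR', 2);
INSERT INTO "function" VALUES (1, 'Payroll', 1), (2, 'Payroll', 2);
INSERT INTO jobrole    VALUES (1, 'Payroll Clerk', 1), (2, 'Payroll Clerk', 2);
INSERT INTO employee   VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO employee_jobRole VALUES (1, 1), (2, 2);
""")

def employees_in_function(division_name, function_name):
    """Employees whose job role sits under the given function and division."""
    rows = conn.execute("""
        SELECT DISTINCT employee.name
        FROM employee
        JOIN employee_jobRole ON employee_jobRole.employeeID = employee.employeeID
        JOIN jobrole    ON jobrole.jobroleId       = employee_jobRole.jobRoleId
        JOIN "function" ON "function".functionId   = jobrole.functionId
        JOIN department ON department.departmentId = "function".departmentId
        JOIN division   ON division.divisionId     = department.divisionId
        WHERE division.name = ? AND "function".name = ?
        ORDER BY employee.name
    """, (division_name, function_name)).fetchall()
    return [name for (name,) in rows]

print(employees_in_function("Marketing", "Payroll"))  # prints ['Bob']
```

In this layout the two Payroll functions are simply two rows with different functionId values, so the division is recovered by following the foreign keys upward.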
Usually when I am setting up a db, I work out what entities I need and how they are related to each other (i.e. many-to-one, one-to-one, ...), which you seem to have done. Next I figure out what each entity will need. For example, Location may have: locationid, address, ... Then, for Divisions, assuming that there is one location for many divisions, the division entity could have a divisionid, a locationid, and the information each division needs. So basically, if it's a one-to-many relationship, like one location to many divisions, you can just put the id of the location in the division table. However, if it is a many-to-many relationship, it is probably better to have an intermediary table connecting the two, so you do not need duplicate records with only an id changing.
Perhaps (probably) you should consider the HR department of the Textile division as a different department than the HR department of the Marketing division.