开发者

Example of a database table design with a questionable repeating group

I am having a discussion with someone about the following table that is used to link client-specific items:

Table LINK:

Client (int) 
Item1 (int) 
Item2 (int)

This is the disputed design. All three fields refer to other tables. The two Item fields refer to the same other table. These are not real field names, so don't bother with discussing naming conventions (the "1" and "2" are, however, really part of the field name). I am arguing that this design is bad on 1NF-violation grounds, while the other person is arguing that even though this seems distasteful, all other options are worse for our specific use-case.

Notes:

  • The vast majority of cases will only require linking two items with each other;
  • N:1 groups are however allowed; in such a case, the same Item1 is repeated on multiple lines with different Item2 values;
  • There are also a very small number of cases where some Item2 values (in an existing Item1-Item2 links) are themselves linked to other Items, and in these cases these values occur in the Item1 column, with the other linked value in the Item2 column; all linked items correspond to one group, and must be retrieved as such.

My claims:

  • This violates 1NF: Item1 and Item2 are foreign keys for the same table, and as such constitute a repeating group (the other party disagrees on the defn of repeating group);
  • For searches on Item, this means that two indexes are required instead of one, for example in a table that uses a GroupID field instead;
  • This makes queries looking for a specific Item in this table more complex, because a restriction clause must examine both Item1 and Item2 fields.
  • The retrieval for the case where chains of Item links occur will be more complex.

The other side claims:

  • The most viable alternative is a table with a single Item field, and an additional GroupID field;
  • The simpler, more common two-item link case now becomes more complex;
  • There could be concurrency issues in obtaining GroupID slots, and that needs to be managed
  • Managing the GroupID concurrency issues probably requires a second table with GroupIDs in a field with a uniqueness constraint
  • You now have to perform a join, at least some of the time, especially if an ORM is used. The join is less efficient than using a single table as in current design.

I would like to hear some opinions about this. I have read the other posts on SO about database design, and especially 1NF, but they don't deal as specifically with my case above as I would have liked. I have also come to understand, based on much research online, that so-called standards like 1NF can be defined in many different ways by different people. I have tried to be as clear as possible about both arguments, and not to bias one or the other.

EDIT 1:


What are Item1 and Item2? Are they distinct entities? Then the design seems fine to me.

For example, you might want to fill a database with solutions to the traveling salesman problem. You have a table City(cityId, latitude, longitude), and a table Path(pathId, salesmanId). Now a path where the salesman visits n+1 cities would be represented by n entries in PathSegment(pathId, segmentId, fromCityId, toCityId). Here, although fromCityId and toCityId are foreign keys that reference the same table City, they describe different attributes of the PathSegment entity, hence this does not violate NF1.

Edit:

So you want to store trees, actually, only your trees are mostly just linked lists, and most of those are linked lists with just two nodes, right? And apparently your coworker wants to do this as an adjacency list, so a tree like

1-2-3
\-4

becomes

(1,2)
(2,3)
(1,4)

There's nothing wrong with that, but it's not the only way to store a tree in a database. For a good summary of alternatives, see here.

In your case, the advantage of using an adjacency list is that most of your trees have only two nodes, so most of them end up being one row in the table, keeping that simple. Also, questions about the immediate neighbours are easy. "What's the invoice for this payment?" becomes

select item1 from link where item2 = :paymentID

which is neat, too. There are drawbacks, though. The order of child nodes often matters, and the list doesn't help you here, so you have to store that either as a separate column or as something like timestamps in the tables your foreign keys are referring to). Also, reconstructing an entire branch becomes a recursive task, and not all database systems can do that. So if your application often has to retrieve a message-board-like overview of the invoice history, it might require some application-side logic that turns the list of adjacent nodes into a tree on the client and works on that. If that becomes too cumbersome, you might want to consider a nested sets representation, see here.

What's best for your problem? Depends on several things: size and shape of your trees (if they are really mostly short linked lists, adjacency list is good), frequency of inserts and updates (if frequent, adjacency list is good, because its inserts are cheap), frequency and complexity of queries (if frequent and complex, nested sets is good, because its selects are simple and fast). So for a message board, i'd go with nested sets (or even Tropashkos nested intervals for speed and extra coolness), but for a simple request-response (and sometimes some more response) table, i'd probably use an adjacency list.


Just having two foreign keys pointing to the same table isn't by default a "violation." You might have a Person table with FatherID and MotherID fields both pointing back to the Person table. This isn't a repeating group, as they're semantically different attributes. Your first claim—as written and free of any other context—is false.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜