Multi-tiered / Hierarchical SQL : How does Reddit do it? Which is the most efficient way? And what databases make it simpler?
I've been reading up a bit on how multi-tiered commenting systems are built:
http://ar开发者_C百科ticles.sitepoint.com/article/hierarchical-data-database/2
I understand the two methods talked about in that article. In fact I went down the recursive path myself, and I can see how the "Modified Preorder Tree Traversal" method is very useful as well, but I have a few questions:
On a side note, how is SQL pronounced? I'm getting the feeling I've been wrong for the past several years by saying 'sequel' instead of 's - q - l', although "My Sequel" rolls easier off the tongue than "My S Q L"!
MPTT is easier to fetch (a single SQL query), but more expensive to update. Simply delegate the update to a background process (that's what queue managers are for). Also note that most of that update is a single SQL UPDATE command. It might take long to process, but a smart RDBM could make the transaction visible (in cache) to new (read-only) queries before it's committed to disk.
I'd bet it uses MPTT, but not only doing the 'hard' update in background but also quite likely do a simple rendering to in-memory cache. This way, the posting user can see his post immediately, without having to wait until updating so many rows. Also, SSDs do help in getting high transaction rates.
that's called Adjacency Model (or sometimes adjacency list), it's a more obvious way to do it, and simpler to update (doesn't modify existing records) but FAR more inefficient to read. You have to do a recursive walk of the tree, with an SQL query at each node. That's what kills you: the number of small queries.
PostgreSQL has recursive SELECTs, which do in the server what you envision in PHP. It's better than PHP because it's closer to the data; but it still has the same (huge) number of random-access disk seeks.
You should have a closer look at the links in Further reading they give in the end. The Four ways to work with hierarchical data article on evolt linked there provides another way to approach this problem (the Flat table). Since that approach is extremely easy to implement for a threaded discussion board, I wouldn't be surprised if reddit uses it (or a variation on the theme).
I do like MPTT (aka nested set) though, and have used it for hierarchies that are (almost) static.
精彩评论