开发者

Except joining 20 tables any other options for writting / reading data?

I am creating a user profile. All the fields are lookup based from their own tables like favourite movies, favourite music, favourite food, sports, etc... These are not text fields but auto suggest fields from a system maintained list. The reason these are all in their own table is they have more colunms unique to them.

So to read the data at user input it is ok but after that two problems:

1) Writting data: Since this is a M:M relationship i will need 20 different tables?

2) Reading data at profile load time: I need to join all these 20 tables to get the user's data?

What other option do i have to store all these user details? my only concern is performance since this is a social site. 20 joins is not good. But i am not sure about other techniques. I am using开发者_如何学编程 mysql and php.

The only other option i can think of is to store the data in an array in the db although i dont know how well searches will work with that.


  1. Install your favorite dbms.
  2. Create a user table, and two or three user favorites tables.
  3. Write a little program to generate and load a million random users.
  4. Write a little program to generate and load 10 million favorite movies (or whatever) for those million users.
  5. Run some queries.

If speed is a problem, post the schema with the "database-design" and "query-optimization" tags, and include a link to this question.


Later . . . Bored. So I did the test myself. I don't have time to do 20 joins, but 5 left joins, on a table of a million users and 50+ million rows in each of the joined tables returns in about 400 milliseconds. (PostgreSQL 9.0.2) Back to work now . . .


And still later . . . Still bored. I added more tables, more data, more left outer joins. Depending on the data for a particular email address, more joins can be still faster. (Would you have guessed that?) The last test I ran, selecting a few dozen random email addresses from "users",

           run time (milliseconds)
--
median      40 
maximum    222
minimum      0.4 ("Four tenths of a millisecond", not a typo.)

If I'm still bored later, I'll bang together a program to

  • execute a few hundred queries based on random email addresses, and
  • record the execution time (although I'm not sure that's possible)


20 joins is not good

Who says? I wouldn't worry about the number of joins in your queries unless you see that it actually becomes a problem. Relational databases are designed to work around, well, relating tables to one another.

Now, as to whether or not you actually need 20 joins is another question entirely. You should post a sample query and some relevant portion of your database design and we can tell you if anything looks amiss.


You could store user profiles in a non-relational data store like MongoDB.

Not only will this avoid the mess of joins, but it makes it easier to change your user profile schema on the fly, and avoid writing a lot of data access code.


Depending on your performance requirements, 20 joins might or might not be a problem. But if you want sub-second response under load, then it could indeed be good to avoid this. But if this only occurs when a user logs in, and you expect no more than a few logins per second, and you don't have other heavy db load, etc., then the performance might be quite tolerable.

I'd be surprised if you couldn't combine some of those. I'd think that many of the profile attributes could be represented in a common structure like PersonId, TraitType, string1, string2, int1, int2, date1, date2.

If, in your code, you are doing anything like an OO representation of your data then you could map the traittypes to classes that fully represent the type, so your program could would not need to work at level of abstraction that this table design represents.

  • Elroy


Is it necessary to load ALL the data at profile load? It seems to me that your reference table NAMES represent some sort of affordances on the Profile itself which, when activated by the user, execute the query for items from that table, no?

I am not the expert web programmer, so I could have this all wrong. But it seems like the profile loads and presents the user with what amounts to a summary/navigation interface, right? With some buttons/glyphs what have you by which the user navigates and/or requests more information.

On profile load, the top-level profile info appears, along with (much over-simplified) some buttons, possibly under a header widget, "My Favorites". Buttons might exist for "PLaces", "Food/Drink", "Music", Etc. When the user activates one of THESE, a query executes against that particular table (and any relevant joins) to return the data specific to "Places" for example.

Maybe it is different with Web (and I will be learning soon enough), but I try to request data when it makes sense, and at a pont when the user might expect a brief access time. A button click is usually a point at which the user will expect a slight lag in response.


One way to cut down on the number of joins is to store the data that is common to all 20 types in a single table. The relationship of this table to the 20 specialized tables follows the gen-spec design pattern. Lookup "generalization specialization relational modeling" to see how to implement the gen-spec pattern in tables.

If you do this, you will only reference the specialized tables when you need them.

I'm not sure of your usage patterns in your case, so I can't say if this advice applies to your situation. But it's worth looking into.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜