SQL Design Pattern: how do I store multiple unique ids from different sites in mashup?

2022-12-19 05:54 Q&A author:

I'm building a mash-up to store meta-data on items from multiple REST API datasources. I'd like to be able to generate typical feeds (most recent, top rated, most viewed, etc) based on data summarized across all the different datasources, and also add tags (i.e. many-to-many relationships).

My problem is that each datasource has a different way to issue unique ids through their REST API. I need suggestions on the best pattern to use for my MySQL datamodel.

My current solution is to use one table for all items with a composite key, but the joins are long and CakePHP doesn't handle composite keys natively:

datasource_id SMALLINT,
datasource_item_id VARCHAR(36), -- some datasources issue alpha keys

Q: Is it OK, or even better, to add an auto-increment primary key to my table and translate all my internal joins/indexes from external UIDs to my internal UIDs? For example:

id int(10) unsigned NOT NULL auto_increment,

Q: Are enums an efficient datatype for storing datasource_id (should have maybe 10 different datasources)?

Q: Are there other approaches that yield better, more scalable results in the long run?

Mostly I can only confirm the solutions you've already considered.

Since the storage type used in the table schema doesn't have to be the same as the type of the data (which is why SQLite 2 was untyped and SQLite 3 has so few types), my first impulse is the same as your current solution.

Another school of thought holds that arbitrary IDs (i.e. those not based on attributes of whatever you're modeling) should be kept internal to your own database. That suggests the second solution you mention: add an id column. One reason for this position is that you don't want your tables to depend on someone else's internals, though that's less of a concern here. Since CakePHP doesn't support composite keys, this seems the most viable option.
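A minimal sketch of the surrogate-key layout, written in Python against an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example runs without a server; the DDL would need small changes for MySQL). The tags and items_tags tables and the internal_id helper are hypothetical names, not something from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal surrogate key
    datasource_id      INTEGER NOT NULL,
    datasource_item_id VARCHAR(36) NOT NULL,
    UNIQUE (datasource_id, datasource_item_id)   -- external identity stays unique
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE items_tags (                        -- hypothetical join table
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    item_id INTEGER NOT NULL REFERENCES items(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    UNIQUE (item_id, tag_id)
);
""")


def internal_id(conn, datasource_id, external_id):
    """Translate an external (datasource, id) pair to the internal id, inserting if new."""
    # INSERT OR IGNORE is SQLite syntax; it skips the insert when the pair already exists.
    conn.execute(
        "INSERT OR IGNORE INTO items (datasource_id, datasource_item_id) VALUES (?, ?)",
        (datasource_id, external_id),
    )
    row = conn.execute(
        "SELECT id FROM items WHERE datasource_id = ? AND datasource_item_id = ?",
        (datasource_id, external_id),
    ).fetchone()
    return row[0]


item = internal_id(conn, 2, "a1b2-c3d4")
same_item = internal_id(conn, 2, "a1b2-c3d4")   # second call finds the existing row
other_item = internal_id(conn, 1, "12345")

conn.execute("INSERT INTO tags (name) VALUES ('music')")
conn.execute("INSERT INTO items_tags (item_id, tag_id) VALUES (?, 1)", (item,))

# Tag joins now use single-column integer keys only.
tagged = conn.execute("""
    SELECT i.datasource_id, i.datasource_item_id
    FROM items i
    JOIN items_tags it ON it.item_id = i.id
    JOIN tags t ON t.id = it.tag_id
    WHERE t.name = 'music'
""").fetchall()
```

The UNIQUE constraint on the external pair is what keeps the translation safe: the external identity is still enforced, but only the import step ever touches it.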

Another solution would be to have the primary key column be a concatenation of the data from the other composite key columns. That is, add an additional column, as with the auto-incrementing ID, but one that stores a non-arbitrary value. This falls under the category of denormalizing and has all the caveats and warnings that implies.
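One way the concatenated key could look, again sketched in Python with in-memory SQLite standing in for MySQL. The make_key helper, the ':' separator, and the VARCHAR(42) width (up to 5 digits, 1 separator, 36 characters) are assumptions chosen for illustration.

```python
import sqlite3


def make_key(datasource_id: int, external_id: str) -> str:
    """Build a single-column key such as '3:a1b2' from the composite parts."""
    # The numeric prefix cannot contain ':', so the first ':' always ends it.
    return f"{datasource_id}:{external_id}"


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE items (
    id                 VARCHAR(42) PRIMARY KEY,  -- derived from the two columns below
    datasource_id      INTEGER NOT NULL,
    datasource_item_id VARCHAR(36) NOT NULL
)""")

conn.execute("INSERT INTO items VALUES (?, ?, ?)", (make_key(3, "a1b2"), 3, "a1b2"))
conn.execute("INSERT INTO items VALUES (?, ?, ?)", (make_key(4, "a1b2"), 4, "a1b2"))

row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Nothing in this schema stops the derived id from drifting out of sync with the two columns it was built from, which is the usual denormalization caveat.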

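If SQL were a second-order logic, you could easily give each datasource its own table. Since SQL is first-order, a query cannot range over tables, so every cross-source feed would have to name each table explicitly. That isn't a very scalable solution.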
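To illustrate why one table per datasource scales poorly, here is a small Python sketch (in-memory SQLite standing in for MySQL; the table names and sample rows are hypothetical). A "most viewed" feed has to be assembled as a UNION over every table, and the table list has to be spliced into the SQL text by the application.

```python
import sqlite3

SOURCE_TABLES = ["items_source_a", "items_source_b"]  # hypothetical names

conn = sqlite3.connect(":memory:")
for table in SOURCE_TABLES:
    conn.execute(
        f"CREATE TABLE {table} (item_id VARCHAR(36) PRIMARY KEY, title TEXT, views INTEGER)"
    )
conn.execute("INSERT INTO items_source_a VALUES ('101', 'first', 50)")
conn.execute("INSERT INTO items_source_b VALUES ('x-9', 'second', 70)")

# Table names cannot be bound parameters, so the query text is built per table.
union = " UNION ALL ".join(
    f"SELECT '{t}' AS source, item_id, title, views FROM {t}" for t in SOURCE_TABLES
)
most_viewed = conn.execute(
    f"SELECT * FROM ({union}) ORDER BY views DESC"
).fetchall()
```

Every new datasource means a new table and a longer query, whereas the single-table designs only need a new datasource_id value.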

The first three all share a downside. Each datasource has its own ID type; when storing IDs from different sources in the same column, you need to define additional constraints to enforce type integrity at the database level, probably in the form of triggers (MySQL has historically parsed but ignored the CHECK clause; versions from 8.0.16 onward enforce it).
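The trigger idea can be sketched as follows, using Python with in-memory SQLite as a stand-in for MySQL. The trigger syntax shown is SQLite's and differs from MySQL's, and the rule "datasource 1 issues numeric ids only" is an assumption invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE items (
    datasource_id      INTEGER NOT NULL,
    datasource_item_id VARCHAR(36) NOT NULL,
    PRIMARY KEY (datasource_id, datasource_item_id)
)""")

# Reject rows for datasource 1 whose id contains any non-digit character.
conn.execute("""
CREATE TRIGGER items_numeric_ids BEFORE INSERT ON items
WHEN NEW.datasource_id = 1 AND NEW.datasource_item_id GLOB '*[^0-9]*'
BEGIN
    SELECT RAISE(ABORT, 'datasource 1 issues numeric ids only');
END
""")

conn.execute("INSERT INTO items VALUES (1, '12345')")      # numeric id, accepted
conn.execute("INSERT INTO items VALUES (2, 'a1b2-c3d4')")  # other source, not checked

try:
    conn.execute("INSERT INTO items VALUES (1, 'abc')")    # violates the rule
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Each datasource with its own ID format would need its own condition, so the trigger grows along with the list of sources.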

Q: Are enums an efficient datatype for storing datasource_id (should have maybe 10 different datasources)?

An ENUM takes 1 or 2 bytes of storage, depending on how many distinct values it has (1 byte for up to 255 values). At ten datasources, only a single byte is used per row. That still leaves a little over 4 of its 8 bits unused per row. Whether that's efficient I'll leave up to you.
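The arithmetic behind that estimate, as a short Python check (the variable names are only illustrative):

```python
import math

datasources = 10
enum_bytes = 1 if datasources <= 255 else 2  # MySQL ENUM: 1 byte up to 255 values, else 2
bits_needed = math.log2(datasources)         # information content of one of ten values
bits_unused = enum_bytes * 8 - bits_needed

print(f"{bits_needed:.2f} bits needed, {bits_unused:.2f} bits unused per row")
# prints "3.32 bits needed, 4.68 bits unused per row"
```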
