Choosing a primary key,surrogate key, index in the mysql stock trading database
Database type: mysql
Columns: Date,time,price1,qty1,price2,qty2 time will be in milliseconds number of records approx 5.5 million for a month.
I cant choose date as primary key as it is not unique, but can choose date and time as combined but that is also not a good idea.
i will be running queries like select price and qty between 'this date and time' and 'that date and time' and result might be in millions range.
what could be the best choice in terms of primary key, index an开发者_开发技巧d surrogate key and what is the best way to implement this. how should i optimize the database.
Not sure why you say choosing both date and time would be a bad idea (are you against composite keys?)
A bigger problem for you is that time does not store milliseconds. See this bug for more data on that: http://bugs.mysql.com/bug.php?id=8523
Also, there seems to be something missing from the key that identifies the Stock such as Ticker. Since the ticker can change over time, it might be a good idea to introduce a surrogate for it such as StockID. You would do this in a table called Stock or similar.
Then for your Trade table, I would suggest using StockID, Date and Time (but store the time in something other than the TIME datatype so you can store milliseconds. Ask another question if you need help with that).
The order of the keys in the PK is important for both storage and retreival. For retrieval, you want to put the most selective keys for your query first. So if you tend to access all the data for a stock at once (or for a set of stocks), put StockID first so the index can be used to find them quickly. If you tend to access all data for a given interval, put Date then Time first.
For storage, its better to be appending so having Date and Time first is a good idea here too.
In case you want to access mostly in date ranges, but sometimes by Stock, put a secondary index on StockID.
As you don't have a natural key (so nothing unique within each row), you'd need to add a surrogate key (for the sake of argument "transactionid"). You can still have your index based on date time (that really, really should be a single column) for efficient period scanning.
精彩评论