SQL Database Design - Cache Tables?
What's a common/best practice for database design when it comes to improving performance on count(1) queries? (I'm currently using SQLite.)
I've normalized my data, so it lives in multiple tables. Simple queries against a single table with a good index are acceptably quick for my purposes, e.g.:
SELECT count(1) from actions where type='3' and area='5' and employee='2533';
But when I start getting into multi-table queries, things get too slow (> 1 second).
SELECT count(1)
from
(SELECT SID from actions
where type='3' and employee='2533'
INTERSECT
SELECT SID from transactions where currency='USD') x;
How should I cache my results? What is a good design? My natural reaction is to add a table solely for storing rows of cached results per employee?
Edit
Design patterns like Command Query Responsibility Segregation (CQRS) specifically aim to improve the read-side performance of data access, often in distributed systems and at enterprise scale.
- Commands are issued to indicate 'transactions' or 'change / updates' to data
- When a system processes these commands (e.g. by updating database tables), the new state of the affected objects is 'broadcast'
- Systems which are interested (such as a user interface or a queryable REST API) will then subscribe to these data changes, and then 'shape' the updated data to their specific needs
- This updated data is then cached (often called a 'Read Store')
Another pattern commonly associated with CQRS is "Event Sourcing", which stores the events resulting from Commands and then allows them to be 'replayed', for various use cases.
The above may be overkill for your scenario, but a very simple implementation of caching at an internal app level could be done with SQLite triggers.
Assuming that there are many more reads than writes to your actions or transactions tables:
- You could create cache tables specifically for "SID for actions by type by employee" and "SID for transactions by currency", or even combine the two (depending on what other scenarios you have for querying).
- You would then need to update these cache table(s) every time the underlying actions or transactions tables update. One cheap (and nasty) way would be to provide INSERT, UPDATE and DELETE triggers on the actions and transactions tables, which would then update the appropriate cache table(s).
- Your 'query' interface would now primarily interact with the cache tables, using the 'derived' data (such as the counts).
- You may still however need to handle cache miss scenarios, such as the initial 'seed' of these cache tables, or if the cache tables need to be regenerated.
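The trigger approach above can be sketched roughly as follows. This is only an illustration under assumptions, not a definitive design: the cache table `action_counts`, the trigger names, and the helpers `make_db`, `cached_count` and `rebuild_cache` are all hypothetical, the column types are guessed, and it caches a single-table count per (employee, type) rather than the cross-table result from the question.

```python
import sqlite3

# Hypothetical schema: column types are assumptions, not taken from the question.
SCHEMA = """
CREATE TABLE actions (SID INTEGER, type TEXT, area TEXT, employee TEXT);

-- Hypothetical cache table: one running count per (employee, type).
CREATE TABLE action_counts (
    employee TEXT NOT NULL,
    type     TEXT NOT NULL,
    n        INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (employee, type)
);

CREATE TRIGGER actions_ai AFTER INSERT ON actions
BEGIN
    INSERT OR IGNORE INTO action_counts (employee, type, n)
        VALUES (NEW.employee, NEW.type, 0);
    UPDATE action_counts SET n = n + 1
        WHERE employee = NEW.employee AND type = NEW.type;
END;

CREATE TRIGGER actions_ad AFTER DELETE ON actions
BEGIN
    UPDATE action_counts SET n = n - 1
        WHERE employee = OLD.employee AND type = OLD.type;
END;

-- On UPDATE, move one unit of count from the old key to the new key.
CREATE TRIGGER actions_au AFTER UPDATE OF employee, type ON actions
BEGIN
    UPDATE action_counts SET n = n - 1
        WHERE employee = OLD.employee AND type = OLD.type;
    INSERT OR IGNORE INTO action_counts (employee, type, n)
        VALUES (NEW.employee, NEW.type, 0);
    UPDATE action_counts SET n = n + 1
        WHERE employee = NEW.employee AND type = NEW.type;
END;
"""


def make_db():
    """Create an in-memory database with the tables and triggers above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def cached_count(conn, employee, type_):
    """Read the count from the cache table; a missing row means zero."""
    row = conn.execute(
        "SELECT n FROM action_counts WHERE employee = ? AND type = ?",
        (employee, type_),
    ).fetchone()
    return row[0] if row else 0


def rebuild_cache(conn):
    """Seed or regenerate the cache from the base table (the 'cache miss' path)."""
    conn.execute("DELETE FROM action_counts")
    conn.execute(
        "INSERT INTO action_counts (employee, type, n) "
        "SELECT employee, type, COUNT(*) FROM actions GROUP BY employee, type"
    )


conn = make_db()
conn.executemany(
    "INSERT INTO actions (SID, type, area, employee) VALUES (?, ?, ?, ?)",
    [(1, "3", "5", "2533"), (2, "3", "5", "2533"), (3, "1", "5", "2533")],
)
print(cached_count(conn, "2533", "3"))
```

Reads become a primary-key lookup on the cache table, at the cost of extra work on every write, which is why this only pays off when reads clearly outnumber writes.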
In addition to a local relational database like SQLite, NoSQL databases like MongoDB, Cassandra and Redis are frequently used as alternatives for read-side caching in read-heavy environments (depending on the type and format of data that you need to cache). You would, however, need an alternative way to synchronize data from your 'master' (e.g. SQLite) database to these cache read stores; triggers obviously won't cut it here.
Original Answer
If you are 100% sure that you are always repeating exactly the same query for the same customer, sure, persist the result.
However, in most other instances, an RDBMS usually handles caching just fine.
The INTERSECT with the query
SELECT SID from transactions where currency='USD'
could be problematic if there are a large number of transaction records with USD. Possibly you could replace this with a join?
SELECT count(1) from
(
  SELECT t.SID
  from transactions as t
  inner join
  (
    SELECT SID from actions where type='3' and employee='2533'
  ) as a
  on t.SID = a.SID
  where t.currency = 'USD'
) as x;
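One thing worth checking before swapping INTERSECT for a join: INTERSECT returns distinct SIDs, while a join returns one row per matching pair, so the two counts only agree when SID is not repeated within the filtered rows. The sketch below shows this on made-up data; the column types and sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actions (SID INTEGER, type TEXT, area TEXT, employee TEXT);
CREATE TABLE transactions (SID INTEGER, currency TEXT);

-- Made-up rows; note that SID 1 has two USD transactions.
INSERT INTO actions VALUES (1, '3', '5', '2533'), (2, '3', '5', '2533'), (3, '1', '5', '2533');
INSERT INTO transactions VALUES (1, 'USD'), (1, 'USD'), (2, 'EUR'), (3, 'USD');
""")

INTERSECT_SQL = """
SELECT count(1) FROM (
    SELECT SID FROM actions WHERE type = '3' AND employee = '2533'
    INTERSECT
    SELECT SID FROM transactions WHERE currency = 'USD'
) x
"""

JOIN_SQL = """
SELECT count(1)
FROM transactions AS t
INNER JOIN actions AS a ON t.SID = a.SID
WHERE t.currency = 'USD' AND a.type = '3' AND a.employee = '2533'
"""

# Counting distinct SIDs restores the INTERSECT semantics.
JOIN_DISTINCT_SQL = JOIN_SQL.replace("count(1)", "count(DISTINCT t.SID)")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


print(scalar(INTERSECT_SQL), scalar(JOIN_SQL), scalar(JOIN_DISTINCT_SQL))
```

If SID can repeat in either table, count(DISTINCT ...) keeps the join version equivalent to the original query.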
You might just check your indexes, however. For the queries
- SELECT count(1) from actions where type='3' and area='5' and employee='2533'
- SELECT SID from actions where type='3' and employee='2533'
an index on Actions(Employee, Type) or Actions(Employee, Type, Area) would make sense (assuming Employee has the highest selectivity, and depending on the selectivity of Type and Area).
You can also compare this to an index on Actions(Employee, Type, Area, SID) as a covering index for your second query.
And for the join above, you need an index on Transactions(SID, Currency).
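To see whether SQLite actually picks up such indexes, you can ask it for the query plan. The following is a minimal sketch: the index names and the `query_plan` helper are hypothetical, the column types are assumed, and the exact plan wording differs between SQLite versions, so treat the printed text as something to inspect rather than rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actions (SID INTEGER, type TEXT, area TEXT, employee TEXT);
CREATE TABLE transactions (SID INTEGER, currency TEXT);

-- Hypothetical index names; the column order follows the suggestions above.
CREATE INDEX idx_actions_emp_type_area_sid ON actions (employee, type, area, SID);
CREATE INDEX idx_transactions_sid_currency ON transactions (SID, currency);
""")


def query_plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


for step in query_plan(
    conn, "SELECT SID FROM actions WHERE type = '3' AND employee = '2533'"
):
    print(step)
```

Running the same helper on the join or INTERSECT query, before and after creating an index, is a quick way to compare the candidate indexes on your real data.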