Building "isolated" and "automatically updated" caches (java.util.List) in Java
FOR ANYONE INTERESTED: I have implemented the code for the behaviour I am looking for and open-sourced it on google-code. Get it here! pojo-mvcc
--
Hi Guys,
I am trying to write a framework which contains a lot of short-lived caches created from a long-living cache. These short-lived caches need to be able to return their entier contents, which is a clone from the original long-living cache.
Effectively what I am trying to buil开发者_如何学运维d is a level of transaction isolation for the short-lived caches. The user should be able to modify the contents of the short-lived cache, but changes to the long-living cache should not be propogated through (there is also a case where the changes should be pushed through, depending on the Cache type).
I will do my best to try and explain:
master-cache contains: [A,B,C,D,E,F] temporary-cache created with state [A,B,C,D,E,F]
1) temporary-cache adds item G: [A,B,C,D,E,F] 2) temporary-cache removes item B: [A,C,D,E,F]
master-cache contains: [A,B,C,D,E,F]
3) master-cache adds items [X,Y,Z]: [A,B,C,D,E,F,X,Y,Z]
temporary-cache contains: [A,C,D,E,F]
Things get even harder when the values in the items can change and shouldn't always be updated (so I can't even share the underlying object instances, I need to use clones).
I have implemented the simple approach of just creating a new instance of the List using the standard Collection constructor on ArrayList, however when you get out to about 200,000 items the system just runs out of memory. I know the value of 200,000 is excessive to iterate, but I am trying to stress my code a bit.
I had thought that it might be able to somehow "proxy" the list, so the temporary-cache uses the master-cache, and stores all of it's changes (effectively a Memento for the change), however that quickly becomes a nightmare when you want to iterate the temporary-cache, or retrieve an item at a specific index. Also given that I want some modifications to the contents of the list to come through (depending on the type of the temporary-cache, whether it is "auto-update" or not) and I get completly out of my depth.
Any pointers to techniques or data-structures or just general concepts to try and research will be greatly appreciated.
Cheers,
Aidos
Here's what you want to do. Something akin to what's known as MVCC, Multi Version Currency Control.
Simply put you need to associate a transaction ID to your cache elements.
So a cache entry will look something like this:
public class CacheEntry {
long transactionId;
boolean deleted;
Object value;
}
Your cache entries are stored in a list, in reverse transactionid order.
When you go to get a cache element, you look up the list (in your hash map). Then you search for the value that has the highest transaction ID that is LESS THAN or EQUAL to the transaction ID of YOUR transaction.
So, let's take the DELETE issue in to account.
You have Transaction 10, looking for "ABC". We'll assume ABC is already in the cache, and it was put in by Transaction 5.
So, T10 gets the list of entries for ABC, searches down the list and finds that at T5, there's the value "123". T5 is the highest transaction less than or equal to T10. T10 changes the value for ABC from 123 to 456.
Now T12 comes along and it looks for ABC. It will find the value to be 456 from T10. T12 decides to delete ABC, and so a "deleted" cache entry for T12 is placed in the cache entry list. If T10 tries to look up ABC again, it will find 456, because 12 > 10, and the highest transaction <= 10 is T10, so it does not see the deletion.
T14 comes along, searches for ABC, CAN'T find it (because it's "deleted"), and inserts a new value of 789. If T12 were to look it would still be deleted, if T10 were, it's still 456.
So, in the end, you cache list looks like this:
{tid: 14 deleted: false value: 789}
{tid: 12 deleted: true value: nul}
{tid: 10 deleted: false value: 456}
{tid: 5 deleted: false value: 123}
The next problem you have is dealing with visibility across open transactions. i.e. can another transaction see data from another open transaction that's not committed. But that's not too hard, as it just adjusts the criteria when scanning the list of versions for an appropriate candidate. And you can keep a list of transaction IDs, with their status (open, committed, rolledback).
Finally you have to come up with a mechanism to clean up the loose ends. After you commit two transaction, with no other open transactions, the older records can be removed.
For example, if you have the data from T5 and T10, if both of them are committed, no one will ever be able to "see" T5's data again as T10's is now the "current" state. So, the T5 row can be deleted.
This is probably best done by simply iterating over the cache and removing obsolete transaction entries.
That's the gist of it, obviously the devil is in the details.
精彩评论