Separating code logic from the actual data structures. Best practices? [closed]
Closed 4 years ago.
I have an application that loads a lot of data into memory, because it needs to run mathematical simulations on large data sets. This data comes from several database tables that all refer to each other.
The consistency rules on the data are rather complex, and looking up all the relevant data requires several hash tables and other auxiliary data structures.
Problem is that this data may also be changed interactively by the user in a dialog. When the user presses the OK button, I want to perform all the checks to see that he didn't introduce inconsistencies in the data. In practice all the data needs to be checked at once, so I cannot update my data set incrementally and perform the checks one by one.
However, all the checking code works on the actual data set loaded in memory, and uses the hashes and other data structures. This means I have to do the following:
- Take the user's changes from the dialog
- Apply them to the big data set
- Perform the checks on the big data set
- Undo all the changes if the checks fail
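That apply/check/undo cycle can be sketched roughly like this (everything here — the dict-shaped data set, the function names, the consistency rule — is a hypothetical stand-in, not my real code):

```python
# Hypothetical sketch of the apply/check/undo cycle described above.
# All names (data_set, changes, check_consistency, ...) are illustrative.

def commit_dialog_changes(data_set, changes):
    """Apply the user's edits, validate the whole set, undo on failure."""
    applied = []
    for change in changes:
        old_value = data_set.get(change["key"])
        data_set[change["key"]] = change["new_value"]
        applied.append((change["key"], old_value))

    if check_consistency(data_set):
        return True

    # Validation failed: roll every change back, in reverse order.
    for key, old_value in reversed(applied):
        if old_value is None:
            del data_set[key]
        else:
            data_set[key] = old_value
    return False

def check_consistency(data_set):
    # Stand-in for the real cross-record checks; here: no negative values.
    return all(v >= 0 for v in data_set.values())
```

The undo bookkeeping is exactly the part that gets painful when other threads are reading the data at the same time.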
I don't like this solution, since other threads are continuously using the data set and I don't want to halt them while performing the checks. Also, the undo means the old state has to be kept around, which is not feasible either.
An alternative is to separate the checking code from the data set (and let it work on explicitly given data, e.g. coming from the dialog), but this means the checking code cannot use the hashes and other auxiliary data structures, because those only exist for the big data set, making the checks much slower.
What is a good practice to check a user's changes on complex data before applying them to the application's data set?
This is probably not much help now, since your app is built, and you probably don't want to reimplement, but I'll mention it for reference.
Using an ORM framework would help you here. Not only does it handle getting the data from the database into an object-oriented representation, it also provides the tools to implement isolated temporary changes and views:
Using the ORM framework with transactions, you can allow the user to change the objects in the model without affecting other users, and without committing the data "for real" until it has been checked. The ACID guarantees of transactions ensure that your changes are not persisted to the database but held in your transaction, visible only to you. You can then run checks on the data and commit the transaction only if the data validates. If the data doesn't validate, you roll back the transaction and discard the changes; if it does, you commit the transaction and the changes are made permanent.
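The same write-validate-commit-or-rollback pattern can be sketched without any ORM, using a plain database transaction; here with Python's built-in sqlite3 module (the `accounts` table and the "no negative balance" rule are invented for illustration):

```python
import sqlite3

def save_if_valid(conn, account_id, new_balance):
    """Write inside a transaction, validate, then commit or roll back."""
    cur = conn.cursor()
    try:
        cur.execute("UPDATE accounts SET balance = ? WHERE id = ?",
                    (new_balance, account_id))
        # The validation query sees this connection's uncommitted state,
        # while other connections still see the old, committed data.
        cur.execute("SELECT COUNT(*) FROM accounts WHERE balance < 0")
        if cur.fetchone()[0] > 0:
            conn.rollback()   # discard the invalid change
            return False
        conn.commit()         # make the change permanent
        return True
    except Exception:
        conn.rollback()
        raise
```

The key point is that the check runs against the pending state of one transaction, so no other reader is ever exposed to the unvalidated data.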
Alternatively, you can create views which provide your data for validation. The views combine the base data and temporary tables (local to your current connection). This avoids locking tables, at the expense of having to write and maintain the views.
EDIT: If you already have a rich object model in memory, the hardest part of making it support incremental, local and isolated changes is the direct references between objects. When you want to replace object A with A', which contains a change, you don't want to do a deep copy with all references, since you mention that your object model is large. You also don't want to have to update all objects that were pointing to A so that they reference A'. As an example, consider a very large doubly linked list: it's not possible to create a new list that is the same as the old one with just one element changed, without duplicating the entire list.
You can achieve isolation by storing the identifier of related objects rather than the objects themselves. E.g. instead of referencing A explicitly, your collaborators store a reference to the unique key that identifies A, key(A). This key is used to fetch the actual object at the time it is needed (e.g. during verification). Your model then becomes a large map of keys to objects, which can be decorated for local changes. When looking up an object by key, first check the local map for a value, and if not found, check the universal map. To change A to A', you add an entry to the local map that maps key(A) to A'. (Note that A and A' have the same key, since logically they are the same item.) When you run your verification code, the local changes are incorporated, since objects referring to key(A) will get A', while other users looking up key(A) will get the original, A.
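A minimal sketch of that key-to-object model with a local overlay (the class and method names here are my own invention, not an existing API):

```python
class OverlayModel:
    """Global map of key -> object, with per-session local changes on top."""

    def __init__(self, universal):
        self.universal = universal  # shared base data, never modified directly
        self.local = {}             # this session's pending replacements

    def get(self, key):
        # Local changes win; everyone else still sees the universal object.
        if key in self.local:
            return self.local[key]
        return self.universal[key]

    def replace(self, key, new_obj):
        self.local[key] = new_obj   # A -> A' under the same key

    def commit(self):
        # Checks passed: fold the local changes into the shared map.
        self.universal.update(self.local)
        self.local.clear()

    def discard(self):
        # Checks failed: throw the local layer away; base data untouched.
        self.local.clear()
```

Verification code simply resolves every reference through `get()`, so it sees A' while concurrent readers of the universal map still see A.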
This may sound complex written down, but removing explicit references and resolving them on demand is the only way to support isolated updates without having to do a deep copy of the data.
An alternative, but equivalent, way is for your validator to use a map to look up objects' replacements before using them. E.g. your user modifies A, so you put A -> A' into the map. The validator iterates over the model and comes across A. Before using A, it checks the map, finds A', and uses that instead. The difficulty of this approach is that you have to make sure the map is checked every time before an object is used. If you miss one spot, your view of the model will be inconsistent.
I would try by any means to verify changes before applying them to the data set, as undoing the ripple effects of changes which later turn out to be invalid can easily become a nightmare.
If there is really a lot of data, I understand that creating a full copy of it may not be feasible - although in general "copy on write" would be the simplest and safest solution. If you really are only able to verify the changes by taking into account the whole set of data, you could try a "decorator"-like approach, i.e. somehow creating a "view" of the changes layered on top of the existing body of data, without actually modifying the latter. This could be used to validate the changes, and if the validation succeeds, you can actually apply the changes; otherwise you can simply throw away the "view" and the changes, without affecting the original data in any way.
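One way to sketch that layered "view" (all names here are hypothetical): validation runs against the base data plus the pending changes, and the original data is only touched on success.

```python
class ChangeView:
    """Read-only view: base data with a layer of pending changes on top."""

    def __init__(self, base, changes):
        self.base = base        # the real data set, never modified here
        self.changes = changes  # dict of key -> proposed new value

    def items(self):
        # Yield every record as the view sees it: changes shadow base values.
        for key, value in self.base.items():
            yield key, self.changes.get(key, value)
        for key, value in self.changes.items():
            if key not in self.base:
                yield key, value   # newly added records

def validate_and_apply(base, changes, is_valid):
    view = ChangeView(base, changes)
    if all(is_valid(v) for _, v in view.items()):
        base.update(changes)   # checks passed: apply for real
        return True
    return False               # throw the view away; base is untouched
```

If validation fails, nothing has to be undone, because nothing was ever written to the original data.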
Hmm, rather than working on the loaded data directly, I would suggest copying it in memory. This is expensive, but it allows you to work on all the data concurrently. When the changes to the copy are valid, apply them to the real data using some locking strategy. This way you do not need any undo, as long as you can apply the changes atomically. You could even try a transaction system if your needs are more complex. Also think about lazy-loading (copying) your data only as you really need it. Finally, if you need to work on large data sets from databases using transactions, consider Prolog; it might be reasonable to formulate your checks as predicates.
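The copy-validate-swap idea can be sketched like this (names are invented; a single coarse lock stands in for "some locking strategy"):

```python
import copy
import threading

class SharedDataSet:
    """Readers always see a consistent snapshot; writers validate a copy,
    then swap it in atomically under a lock."""

    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()

    def snapshot(self):
        with self._lock:
            # Readers get the current dict; it is never mutated in place.
            return self._data

    def update(self, changes, check):
        # Work on a private copy, so concurrent readers are never blocked
        # while the (possibly slow) checks run.
        candidate = copy.deepcopy(self.snapshot())
        candidate.update(changes)
        if not check(candidate):
            return False       # no undo needed: original was never touched
        with self._lock:
            self._data = candidate  # atomic swap
        return True
```

Because the swap replaces the whole reference, there is no window in which readers can observe a half-applied change.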
It sounds as if you should instead move the rules etc. to the database, where they belong; by keeping the checks in your app you will always have issues. By placing as much of the logic as possible in, for instance, stored procedures that run when the user inserts the values, you could catch and roll back invalid input. But I guess you have your reasons for keeping it all in memory.