How should I store a sparse decision tree(move list) in a database?
I have been thinking of making an AI for a board game for a long time, and recently I've started to gather resources and algorithms. The game is non-random, and most of the time, there < 3 moves for a player, sometimes, there are >20 moves. I would like to store critical moves, or ambiguous moves so that the AI learns from its mistakes and will not make a same mistake the next time. Moves that surely win or lose need not be stored. So I actually have a sparse decision tree for the beginning of games. I would like to know how I should store this decision tree in a database? The database does not开发者_如何学编程 need to be SQL, and I do not know which database is suitable for this particular problem.
EDIT: Please do not tell me to parse the decision tree into memory, just imagine the game as complicated as chess.
As you will be traversing the tree, neo4j seems like a good solution to me. SQL is no good choice because of the many joins you would need for queries. As i understand the question, you are asking for a way to store some graph in a database, and neo4j is a database explicitely for graphs. For the sparseness, you can attach arrays of primitives or Strings to the edges of your graph to encode sequences of moves, using PropertyContainers (am i right that by sparseness and skipping of nodes you mean your tree-edges are sequences of moves rather than single moves?).
Firstly what you are trying to do sounds like a case based reasoning(CBR) problem see: http://en.wikipedia.org/wiki/Case-based_reasoning#Prominent_CBR_systems . CBR will have a database of decisions, your system would in theory pick the best outcomes available.
Therefore I would suggest using neo4j which is a nosql graph database. http://neo4j.org/
So to represent your game each position is a node in the graph, and each node should contain potential moves from said position. You can track scoring metrics which are learnt as games progress so that the AI is more informed.
I would use a document database (NOSQL) like RavenDB because you can store any data structure in the database.
Documents aren't flat like in a normal SQL database and that allows you to store hierarchical data like trees directly:
{
decision: 'Go forward',
childs: [
{ decision: 'Go backwards' },
{
decision: 'Stay there',
childs: [
{ decision: 'Go backwards' }
]
}
]
}
Here you can see an example JSON tree which can be stored in RavenDB.
RavenDB also has a built-in feature to query hierarchical data: http://ravendb.net/faq/hierarchies
Please look at the documentation to get more information how RavenDB works.
Resources:
- What type of NoSQL database is best suited to store hierarchical data?
You can use memory mapped file as storage. First, create "compiler". This compiler will parse text file and convert it into compact binary representation. Main application will map this binary optimized file into memory. This will solve your problem with memory size limitation
Start with a simple database table design.
Decisions: CurrentState BINARY(57) | NewState BINARY(57) | Score INT
CurrentState and NewState are a serialized version of the game state. Score is a weight given to the NewState (positive scores are good moves, negative scores are bad moves) your AI can update these scores appropriately.
Renju, uses a 15x15 board, each location can be either black, white or empty so you need Ceiling( (2bits * 15*15) / 8 ) bytes to serialize the board. In SQL that would be a BINARY(57) in T-SQL
Your AI would select the current moves it has stored like...
SELECT NewState FROM Decisions WHERE CurrentState = @SerializedState ORDER BY Score DESC
You'll get a list of all the stored next moves from the current game state in order of best score to least score.
Your table structure would have a Composite Unique Index (primary key) on (CurrentState, NewState) to facilitate searching and avoid duplicates.
This isn't the best/most optimal solution, but because of your lack of DB knowledge I beleive it would be the easiest to implement and give you a good start.
If I compare with chess engines, those play from memory, maybe apart from opening libraries. Chess is too complicated to store a decinding decision tree. Chess engines play by assigning heuristic evaluations to potential and transient future positions (not moves). Future positions are found by some kind of limited depth search, may be cached for some time in memory, but often are plainly recalculated each turn as the search space is just too big to store in a way faster to look up than recalculating is possible.
Do you know Chinook — the AI that solves checkers? It does this by compiling a database of every possible endgame. While this is not exactly what you are doing, you might learn from it.
I can't clearly conceive neither the data structures you handle in your tree nor their complexity.
But here are some thoughts which may interest you :
- Map your decision tree into sparse matrix, a tree is a graph after all
- Devise a storage/retrieval strategy taking advantage of sparse matrix properties.
I would approach this with the traditional way an opening book is handled in chess engines:
- Generate all possible moves
- For each move:
- Make that move
- Look the resulting position up in your database
- Undo the move
- Make the move that had the highest score in your database
Looking up a move
Chess engines usually compute a hash function of the current game state via Zobrist hashing, which is a simple way to construct a good hash function for gamestates.
The big advantage of this approach is that it takes care of transpositions, that is, if the same state can be reached via alternate paths, you don't need to worry about those alternate paths, only about the game states themselves.
How chess engines do this
Most chess engines use static opening books that are compiled from recorded games and hence use a simple binary file that maps these hashes to a score; e.g.
struct book_entry {
uint64_t hash
uint32_t score
}
The entries are then sorted by hash, and thanks to operating system caching, a simple binary search through the file will find the needed entries very quickly.
Updating the scores
However, if you want the engine to learn continously, you will need a more complicated data structure; at this point it is usually not worth doing yourself, and you should use an available library. I would probably use LevelDB, but anything that lets you store key-value pairs is fine (Redis, SQLite, GDBM, etc.)
Learning the scores
How exactly you update the scores depends on your game. In games with a lot of data available, a simple approach such as just storing the percentage of games won after the move that resulted in the position works; if you have less data, you can store the result of a game tree search from the position in question as score. Machine learning techniques such as Q learning are also a possibility, although I do not know of a program that actually does this in practice.
I'm assuming your question is asking about how to convert a decision tree into a serial format that can be written to a location and later used to reconstruct the tree.
Try using a pre-order traversal of the tree, using a toString() function (or its equivalent) to convert the data stored at each node of the decision tree to a textual descriptor. By pre-order traversal, I mean implementing an algorithm that first performs the toString() operation on the node, and writes the output to a database or file, and then recursively performs the same operation on its child nodes, in a specified order. Because you are dealing with a sparse tree, your toString() operation should also include information about the existence or non-existence of subtrees.
Reconstructing the tree is simple - the first stored value is the root node, the second is the root member of the left subtree, and so on. The serial data stored for each node should provide information as to which subtree the next inputted node should belong to.
精彩评论