Algorithm to choose random letters for word search game that allows many words to be spelled
I'm making a boggle-like word game. The user is given a grid of letters like this:
O V Z W X
S T A C K
Y R F L Q
The user picks out a word using any adjacent chains of letters, like the word "STACK" across the middle line. The letters used are then replaced by the machine e.g. (new letters in lowercase):
O V Z W X
z e x o p
Y R F L Q
Notice you can now spell "OVeRFLoW" by using the new letters. My problem is: What algorithm can I use to pick new letters that maximizes the number of long words the user can spell? I want the game to be fun and involve spelling e.g. 6 letter words sometimes but, if you pick bad lett开发者_高级运维ers, games involve the user just spelling 3 letter words and not getting a chance to find larger words.
For example:
You could just randomly pick new letters from the alphabet. This does not work well.
Likewise, I found picking randomly but using the letter frequencies from Scrabble didn't work well. This works better in Scrabble I think as you are less constrained about the order you use the letters in.
I tried having a set of lists, each representing one of the dies from the Boggle game, and each letter would be picked from a random die side (I also wonder whether I can legally use this data in a product). I didn't notice this working well. I imagine the Boggle dice sides were chosen in some sensible manner, but I cannot find how this was done.
Some ideas I've considered:
Make a table of how often letter pairs occur together in the dictionary. For the sake of argument, say E is seen next to A 30% of the time. When picking a new letter, I would randomly pick a letter based on the frequency of this letter occurring next to a randomly chosen adjacent letter on the grid. For example, if the neighboring letter was E, the new letter would be "A" 30% of the time. The should mean there are lots of decent pairs to use scattered around the map. I could maybe improve this by making probability tables of a letter occurring between two other letters.
Somehow do a search for what words can be spelt on the current grid, taking the new letters to be wildcards. I would then replace the wildcards with letters that allowed the biggest words to be spelt. I'm not sure how you would do this efficiently however.
Any other ideas are appreciated. I wonder if there is a common way to solve this problem and what other word games use.
Edit: Thanks for the great answers so far! I forgot to mention, I'm really aiming for low memory/cpu requirements if possible, I'm probably going to use the SOWPODS dictionary (about 250,000) and my grid will be able 6 x 6.
Here's a simple method:
Write a fast solver for the game using the same word list that the player will use. Generate say 100 different possible boards at random (using letter frequencies is probably a good idea here, but not essential). For each board calculate all the words that can be generated and score the board based on the number of words found or the count weighted by word length (i.e. the total sum of word lengths of all words found). Then just pick the best scoring board from the 100 possibilities and give that to the player.
Also instead of always picking the highest scoring board (i.e. the easiest board) you could have different score thresholds to make the game more difficult for experts.
A minor variation on the letter-pair approach: use the frequency of letter pairs in long words - say 6 letters or longer - since that's your objective. You could also develop a weighting that included all adjacent letters, not just a random one.
This wordgame I slapped up a while back, which behaves very similarly to what you describe, uses English frequency tables to select letters, but decides first whether to generate a vowel or consonant, allowing me to ensure a given rate of vowels on the board. This seems to work reasonably well.
You should look up n-gramming, and Markovian Models.
Your first idea is very losely related to Markovian algorithms. Basically, if you have a large text corpus, say of 1000 words. What you can do is analyse each letter and create a table to know the probability of a certain letter following the current letter.
For example, I know that the letter Q from my 1000 words ( 4000 letters in total ) is used only 40 times. Then I calculate what probable letters follow using my markov hash table.
For example, QU happens 100% of the time so I know that should Q be randomly chosen by your application that I need to make sure that the letter U is also included. Then, the letter "I" is used 50% of the time, and "A" 25% of the times and "O" 25% of the time.
Its actually really complicated to explain and I bet there are other explainations out there which are much better then this.
But the idea is that given a legitmately large text corpus you can create a chain of X letters which are probably consistent with English language and thus should be easy for users to make words out of. You can choose to look forward on a value of n-gram, the highest the number the easier you could make your game. For example, an n-gram of two would probably make it very hard to create words over 6, but an n-gram of 4 would be very easy.
The Wikipedia explains it really badly, so I wouldn't follow that.
Take a look at this Markov generator:
http://www.haykranen.nl/projects/markov/demo/
I do not know about a precanned algorithm for this, but...
There is a dictionary file in UNIX, and I imagine there is something similar available on other platforms (maybe even in the java libraries? - google it). Anyways, use the files the spell checker uses.
After they spell a word an it drops out, you have existing letters and blank spaces.
1) From each existing letter, go right, left, up, down (you will need to understand recursive algorithms). As long as the string you have built so far is found at the start of words or backwards from the end of words in the dictionary file, continue. When you come across a blank space, count the frequency of the letters you need next. Use the most frequent letters.
It will not guarantee a word as you have not checked the corresponding ending or beginning, but I think it would be much easier to implement than an exhaustive search and get pretty good results.
I think this will get you a step closer to your destination: http://en.wikipedia.org/wiki/Levenshtein_distance
You might look at this Java implementation of the Jumble algorithm to find sets of letters that permute to multiple dictionary words:
$ java -jar dist/jumble.jar | sort -nr | head 11 Orang Ronga angor argon goran grano groan nagor orang organ rogan 10 Elaps Lepas Pales lapse salep saple sepal slape spale speal 9 ester estre reest reset steer stere stree terse tsere 9 caret carte cater crate creat creta react recta trace 9 Easter Eastre asteer easter reseat saeter seater staree teaser 9 Canari Carian Crania acinar arnica canari carina crania narica 8 leapt palet patel pelta petal plate pleat tepal 8 laster lastre rastle relast resalt salter slater stelar 8 Trias arist astir sitar stair stria tarsi tisar 8 Trema armet mater metra ramet tamer terma trame ...
精彩评论