Dice face value recognition
I’m trying to build a simple application that will recognize the values of two 6-sided dice. I’m looking for some general pointers, or maybe even an open source project.
The two dice will be black and white, with white and black pips respectively. Their distance to the camera will always be the same, but their position and orientation on the playing surface will be random.
Dice http://www.freeimagehosting.net/uploads/9160bdd073.jpg
(not the best example; the surface will be a more distinct color and the shadows will be gone)

I have no prior experience with developing this kind of recognition software, but I would assume the trick is to first isolate the faces by searching for a square profile with a dominating white or black color (the rest of the image, i.e. the table/playing surface, will be in distinctly different colors), and then isolate the pips for the count. Shadows will be eliminated by top-down lighting.
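For what it's worth, here is a minimal sketch of that plan in Python/OpenCV (threshold out the bright white faces, keep roughly square blobs, then count pip-sized contours inside each). The filename, thresholds, and size limits are placeholders to tune for your camera; a second pass with an inverted threshold would handle the black die.

    # Minimal sketch, assuming top-down lighting and a surface color
    # distinct from both dice; all numeric constants are guesses to tune.
    import cv2

    img = cv2.imread("dice.jpg")  # placeholder filename
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    def count_pips(face_roi, pips_are_dark):
        # invert the threshold when the pips are darker than the face
        mode = cv2.THRESH_BINARY_INV if pips_are_dark else cv2.THRESH_BINARY
        _, pip_mask = cv2.threshold(face_roi, 128, 255, mode)
        contours, _ = cv2.findContours(pip_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        # keep blobs in a plausible pip-size range to reject noise
        return sum(1 for c in contours if 30 < cv2.contourArea(c) < 400)

    # White die: a bright, roughly square blob containing dark pips.
    _, face_mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(face_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h > 2000 and 0.8 < w / h < 1.25:  # big enough, roughly square
            print(count_pips(gray[y:y + h, x:x + w], pips_are_dark=True))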
I’m hoping the described scenario is so simple (read: common) it may even be used as an “introductory exercise” for developers working on OCR technologies or similar computer vision challenges.
Update:
I did some further googling and came across this video, which strangely enough is exactly what I'm looking for. It also seems the OpenCV project is my best bet so far; I'll try to use it with one of these wrapper projects, OpenCVDotNet or Emgu CV.
Update:
Still struggling; I can't get Emgu CV to work. Ideas, pointers, thoughts, etc. are still very much welcome!
While image training is "non-trivial" as @Brian said, this will actually be a pretty easy program to write. What you need to do is develop Haar classifiers for the dice, six classifiers in total, one per face value. The classifiers are the key to good image recognition, and Haar classifiers are among the best available right now, though they take a long time to train. Here are some good links to get you familiarized with Haar cascades (a sketch of applying trained cascades follows the links):
http://www.computer-vision-software.com/blog/2009/11/faq-opencv-haartraining/
http://www.cognotics.com/opencv/docs/1.0/haartraining.htm
http://note.sonots.com/SciSoftware/haartraining.html
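Once you have trained the cascades, applying them is the easy part. Here is a hedged sketch in Python/OpenCV; the `die_face_N.xml` filenames are hypothetical stand-ins for cascades you would have to train yourself, and the detection parameters are typical defaults, not tuned values:

    # Sketch: run six trained Haar cascades (one per face value) over a frame.
    # The cascade XML paths are placeholders; you must train these yourself.
    import cv2

    cascades = {n: cv2.CascadeClassifier("die_face_%d.xml" % n)
                for n in range(1, 7)}

    def read_dice(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hits = []
        for value, cascade in cascades.items():
            # detectMultiScale returns (x, y, w, h) rectangles for each match
            rects = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                             minNeighbors=5)
            for (x, y, w, h) in rects:
                hits.append((value, (x, y, w, h)))
        return hits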
Check out this guy's YouTube video, then download his source from the link he provides in the video to see how he applied the cascade files in EmguCV. It will give you something to build on.
http://www.youtube.com/watch?v=07QAhRJmcKQ
This site links to the source for a nice little tool that adds some automation to cropping the images and creating the index files needed to build the Haar cascades. I used it a few months back and couldn't get it to work right at first, but I modified it and it then worked great for Haar training (not HMM). If you want the version I modified, post back and I will get it to you.
http://sandarenu.blogspot.com/2009/03/opencv-haar-training-resources.html
While I have little technical assistance to offer you, the maker of the Dice-O-Matic mark II may be able to help.
Alright,
Image recognition at a high level of abstraction (the kind needed to produce reliable handwriting recognition or face recognition software) remains one of the most difficult problems in computer science today. However, pattern recognition for well-constrained applications, like the one you describe, is a solvable and very fun algorithmic problem.
I would suggest two possible strategies for carrying out your task:
The first strategy involves using third-party software that can preprocess your image and return data about low-level image components. I have some experience with a piece of software called Pixcavator, which has an SDK here. Pixcavator will mine through your image and study the discrepancies between the color values of the pixels to return the borders of the various components in the image. Software like Pixcavator should be able to easily define the boundaries of the components in your picture, most importantly each of the pips. Your job will then be to mine through the data the software returns and look for components that fit the description of small circular partitions that are either white or black. You can then count how many of these components were partitioned off and use that count to return the number of pips in your image.
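I can't reproduce the Pixcavator SDK calls from memory, so here is the same component-mining idea sketched with OpenCV's connected-components analysis as a stand-in; the threshold, area bounds, and fill-ratio test are placeholders to tune:

    # Sketch of the component-filtering idea, assuming white pips on a black
    # face (use THRESH_BINARY_INV for dark pips on a white face).
    import cv2

    gray = cv2.imread("dice.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file
    _, mask = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)

    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    pips = 0
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        # small, roughly circular components are pip candidates:
        # near-square bounding box, mostly filled (a circle fills ~78%)
        if 30 < area < 400 and 0.7 < w / h < 1.4 and area > 0.6 * w * h:
            pips += 1
    print(pips)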
If you're ambitious enough to tackle this problem without third-party software, it is still solvable. Essentially, you'll want to define a circular scanner: a set of pixels in a circular formation that scans through your image looking for a pip (just as an eye might scan over a picture looking for something hidden in it). As your algorithmic "eye" scans over the image, it takes sets of pixels from the image (call them test sets) and compares them with predefined sets of pixels (what we'll call your training sets), checking whether the test set matches one of the training sets within a predefined error tolerance. The easiest way to run such a test is to compare the color data of each pixel in the test set with the corresponding pixel in the training set, producing a third set of pixels called your discrepancy set. If the values in your discrepancy set are sufficiently small (meaning the test set is sufficiently similar to the training set), you mark that area of the image as a pip and move on to scan other parts of the image.
It will take a little guess-and-check to find an error tolerance that catches every pip without testing positive for things that aren't pips.
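A bare-bones sketch of that scanner, using a square window for simplicity rather than a true circular pixel set; the tolerance and stride values are made up and would need the guess-and-check tuning described above:

    # Slide a small window over the image and compare it to a stored pip
    # template within an error tolerance. NumPy only; inputs are grayscale
    # arrays (the image and one cropped example pip as the training set).
    import numpy as np

    def find_pips(gray, template, tolerance=20.0, stride=2):
        th, tw = template.shape
        train = template.astype(np.float32)
        hits = []
        for y in range(0, gray.shape[0] - th, stride):
            for x in range(0, gray.shape[1] - tw, stride):
                test = gray[y:y + th, x:x + tw].astype(np.float32)
                # discrepancy set: per-pixel difference vs the training set
                discrepancy = np.abs(test - train)
                if discrepancy.mean() < tolerance:
                    hits.append((x, y))
        return hits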
Image recognition is non-trivial. You're going to have to constrain the input data in some way, and it looks like you've given this some thought.
Your question reminded me of a blog post by the author of SudokuGrab, which is an iPhone app that allows you to take photos of a Sudoku puzzle in a newspaper, and have it solve the puzzle for you. In the post, he discusses several of the issues that you will face in solving your problem, and how he overcame them.
This is a similar question to Object Recognition from Templates, to which I provided an answer that I believe might be of use.
While different kinds of classifiers will probably work well, I would probably try the method I outlined first. Classifiers are often tricky to implement and especially to train properly. Also, when things don't work it is very hard to know where the problem is: is it in your implementation of the classifier, did you choose the wrong method, are the parameters wrong, did you not train it properly, or were you just unlucky?
No, stay away from classifiers, template matching, and neural networks if the problem can (easily) be solved using simple image processing methods and some math.
Another possibility is to first use a more generic image manipulation/recognition algorithm to pin down the dice positions, then rotate and scale the image to some standard form (say, 512x512-pixel grayscale images of dice rotated to sit straight). Then train six different neural nets to recognize the six face values. AForge.Net is a good, solid artificial intelligence library (including neural nets) and should get you a fair bit of the way there.
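The normalisation step is the part worth sketching. Here it is in Python/OpenCV rather than AForge.Net (which is C#), assuming you already have a contour for each die; it ignores edge clipping and version differences in `minAreaRect`'s angle convention for brevity:

    # Rotate a detected die upright and rescale it to a fixed size before
    # handing it to whatever classifier you train. Contour is assumed given.
    import cv2

    def normalise_die(gray, contour, size=64):
        # minAreaRect gives centre, (width, height), and rotation angle
        (cx, cy), (w, h), angle = cv2.minAreaRect(contour)
        rot = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        upright = cv2.warpAffine(gray, rot, (gray.shape[1], gray.shape[0]))
        x, y = int(cx - w / 2), int(cy - h / 2)
        face = upright[y:y + int(h), x:x + int(w)]
        return cv2.resize(face, (size, size))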
In this video you can see pretty much the behaviour you want, I think. The author uses multiple white dice, but he provides the code (Python/OpenCV), and maybe you can build your project on that.