machine learning - svm feature fusion techique
for my final thesis i am trying to build up an 3d face recognition system by combining color and depth information. the first step i did, is to realign the data-head to an given model-head using the iterative closest point algorithm. for the detection step i was thinking about using the libsvm. but i dont understand how to combine the depth and the color information to one feature vector? they are dependent information (each point consist of color (RGB), depth information and also scan quality).. what do you suggest to do? something like weighting?
edit: last night i read a开发者_C百科n article about SURF/SIFT features i would like to use them! could it work? the concept would be the following: extracting this features out of the color image and the depth image (range image), using each feature as a single feature vector for the svm?
Concatenation is indeed a possibility. However, as you are working on 3d face recognition you should have some strategy as to how you go about it. Rotation and translation of faces will be hard to recognize using a "straightforward" approach.
You should decide whether you attempt to perform a detection of the face as a whole, or of sub-features. You could attempt to detect rotation by finding some core features (eyes, nose, etc).
Also, remember that SVMs are inherently binary (i.e. they separate between two classes). Depending on your exact application you will very likely have to employ some multi-class strategy (One-against-all or One-against-many).
I would recommend doing some literature research to see how others have attacked the problem (a google search will be a good start).
It sounds simple, but you can simply concatenate the two vectors into one. Many researchers do this.
What you arrived at is an important open problem. Yes, there are some ways to handle it, as mentioned here by Eamorr. For example you can concatenate and do PCA (or some non linear dimensionality reduction method). But it is kind of hard to defend the practicality of doing so, considering that PCA takes O(n^3) time in the number of features. This alone might be unreasonable for data in vision that may have thousands of features.
As mentioned by others, the easiest approach is to simply combine the two sets of features into one.
SVM is characterized by the normal to the maximum-margin hyperplane, where its components specify the weights/importance of the features, such that higher absolute values have a larger impact on the decision function. Thus SVM assigns weights to each feature all on its own.
In order for this to work, obviously you would have to normalize all the attributes to have the same scale (say transform all features to be in the range [-1,1] or [0,1])
精彩评论