Need some suggestions on refining my SVM features
I've trained an SVM classifier that, given a question, decides whether a webpage is a good one for answering that question.
The features I selected are: "term frequency in the webpage", "whether a term matches the webpage title", "number of images in the webpage", "length of the webpage", "is it a Wikipedia page?", and "the position of the webpage in the list returned by the search engine".
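For concreteness, here is a rough sketch of how one page could be turned into a vector of these six features. Everything in it is an assumption for illustration: the function name `extract_features`, the simple tokenizer, normalizing term frequency by page length, and detecting Wikipedia from the URL host are not taken from my actual system.

```python
import re
from urllib.parse import urlparse


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(question, title, body, url, num_images, rank):
    """Build a 6-element feature vector for one (question, webpage) pair.

    Hypothetical helper: all definitions below are illustrative assumptions.
    """
    q_terms = set(tokenize(question))
    body_tokens = tokenize(body)
    title_tokens = set(tokenize(title))

    # Fraction of body tokens that are question terms (assumed definition of TF).
    matches = sum(1 for t in body_tokens if t in q_terms)
    tf = matches / len(body_tokens) if body_tokens else 0.0

    host = urlparse(url).netloc.lower()

    return [
        tf,                                             # term frequency in webpage
        1.0 if q_terms & title_tokens else 0.0,         # any question term in title
        float(num_images),                              # number of images
        float(len(body_tokens)),                        # page length in tokens
        1.0 if host.endswith("wikipedia.org") else 0.0, # is it a Wikipedia page?
        float(rank),                                    # position in search results
    ]


features = extract_features(
    question="what is an svm",
    title="Support vector machine - Wikipedia",
    body="An SVM is a supervised learning model",
    url="https://en.wikipedia.org/wiki/Support_vector_machine",
    num_images=3,
    rank=1,
)
print(features)
```

Note that the features end up on very different scales (a 0/1 flag next to a page length in the thousands), which may matter for an SVM if they are not scaled before training.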
Currently, my system maintains a precision of around 0.4 and a recall of 1. It produces a large proportion of false positives (that is, many bad links are classified as good links by my classifier).
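To make those numbers concrete, here is a small worked example of how precision and recall follow from the confusion counts. The counts (40 true positives, 60 false positives, 0 false negatives) are made up so that they reproduce a precision of 0.4 and a recall of 1; they are not my real results.

```python
def precision_recall(tp, fp, fn):
    # precision = TP / (TP + FP): share of predicted-good pages that are truly good
    # recall    = TP / (TP + FN): share of truly good pages that were found
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts chosen to match precision 0.4 and recall 1.
p, r = precision_recall(tp=40, fp=60, fn=0)
print(p, r)  # 0.4 1.0
```

A recall of 1 means no good page is missed, while a precision of 0.4 means that for every 2 good pages accepted, about 3 bad ones are accepted as well. One thing worth checking is whether the classifier labels nearly every page as good: in that case precision simply equals the fraction of good pages in the test set.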
Since the accuracy could be improved, I would like to ask for help refining the features I selected for training/testing: which ones could be removed, and which new ones could be added?
Thanks in advance.
Hmm...
- How large is your training set? i.e., how many training documents are you using?
- What is your test set composed of?
- Since you're getting too many FPs, I would try training with more (and varied) "bad" webpages
- Can you give more details about your different features, like "tf in webpage," etc.?