Save PyML.classifiers.multi.OneAgainstRest(SVM()) object?
I'm using PyML to construct a multiclass linear support vector machine (SVM). After training the SVM, I would like to be able to save the classifier, so that on subsequent runs I can use the classifier right away without retraining. Unfortunately, the .save() function is not implemented for that classifier, and attempting to pickle it (with both the standard pickle and cPickle) yields the following error message:
pickle.PicklingError: Can't pickle : it's not found as __builtin__.PySwigObject
Does anyone know of a way around this or of an alternative library without this problem? Thanks.
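For context, PySwigObject is the type SWIG uses to wrap a C/C++ pointer, and pickle has no way to serialize such a handle. The sketch below reproduces the same kind of failure without PyML. NativeBackedModel and try_pickle are hypothetical names, and a threading.Lock stands in for the SWIG pointer, so this only illustrates the failure mode and is not PyML's actual internals.

```python
import pickle
import threading


class NativeBackedModel(object):
    """Hypothetical stand-in for a classifier that wraps a native handle."""

    def __init__(self, weights):
        self.weights = list(weights)     # plain Python data: picklable
        self._handle = threading.Lock()  # stand-in for a SWIG pointer: not picklable


def try_pickle(obj):
    """Return (True, payload) on success or (False, error message) on failure."""
    try:
        return True, pickle.dumps(obj)
    except (pickle.PicklingError, TypeError) as exc:
        return False, str(exc)


# A plain list pickles fine; the object holding the opaque handle does not.
ok_plain, _ = try_pickle([0.5, -1.25])
ok_model, reason = try_pickle(NativeBackedModel([0.5, -1.25]))
```

Here ok_plain is True while ok_model is False, and reason holds the interpreter's explanation. A library-provided save mechanism avoids the problem because it writes out the model's data rather than the wrapper object.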
Edit/Update
I am now training and attempting to save the classifier with the following code:
mc = multi.OneAgainstRest(SVM())
mc.train(dataset_pyml, saveSpace=False)
for i, classifier in enumerate(mc.classifiers):
    filename = os.path.join(prefix, labels[i] + ".svm")
    classifier.save(filename)
Notice that I am now saving with the PyML save mechanism rather than with pickling, and that I have passed "saveSpace=False" to the training function. However, I am still getting an error:
ValueError: in order to save a dataset you need to train as: s.train(data, saveSpace = False)
However, I am passing saveSpace=False... so, how do I save the classifier(s)?
P.S.
The project I am using this in is pyimgattr, in case you would like a complete testable example... the program is run with "./pyimgattr.py train"... that will get you this error. Also, a note on version information:
[michaelsafyan@codemage /Volumes/Storage/classes/cse559/pyimgattr]$ python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import PyML
>>> print PyML.__version__
0.7.0
In multi.py, line 96 calls "self.classifiers[i].train(datai)" without passing "**args", so if you call "mc.train(data, saveSpace=False)", the saveSpace argument is lost. This is why you get an error message when you try to save the classifiers in your multiclass classifier individually. If you change this line to pass all arguments, you can save each classifier individually:
#!/usr/bin/python
import numpy
from PyML.utils import misc
from PyML.evaluators import assess
from PyML.classifiers.svm import SVM, loadSVM
from PyML.containers.labels import oneAgainstRest
from PyML.classifiers.baseClassifiers import Classifier
from PyML.containers.vectorDatasets import SparseDataSet
from PyML.classifiers.composite import CompositeClassifier
class OneAgainstRestFixed(CompositeClassifier) :
    '''A one-against-the-rest multi-class classifier'''
    def train(self, data, **args) :
        '''train k classifiers'''
        Classifier.train(self, data, **args)
        numClasses = self.labels.numClasses
        if numClasses <= 2:
            raise ValueError, 'Not a multi class problem'
        self.classifiers = [self.classifier.__class__(self.classifier)
                            for i in range(numClasses)]
        for i in range(numClasses) :
            # make a copy of the data; this is done in case the classifier modifies the data
            datai = data.__class__(data, deepcopy = self.classifier.deepcopy)
            datai = oneAgainstRest(datai, data.labels.classLabels[i])
            self.classifiers[i].train(datai, **args)
        self.log.trainingTime = self.getTrainingTime()
    def classify(self, data, i):
        r = numpy.zeros(self.labels.numClasses, numpy.float_)
        for j in range(self.labels.numClasses) :
            r[j] = self.classifiers[j].decisionFunc(data, i)
        return numpy.argmax(r), numpy.max(r)
    def preproject(self, data) :
        for i in range(self.labels.numClasses) :
            self.classifiers[i].preproject(data)
    test = assess.test
train_data = """
0 1:1.0 2:0.0 3:0.0 4:0.0
0 1:0.9 2:0.0 3:0.0 4:0.0
1 1:0.0 2:1.0 3:0.0 4:0.0
1 1:0.0 2:0.8 3:0.0 4:0.0
2 1:0.0 2:0.0 3:1.0 4:0.0
2 1:0.0 2:0.0 3:0.9 4:0.0
3 1:0.0 2:0.0 3:0.0 4:1.0
3 1:0.0 2:0.0 3:0.0 4:0.9
"""
file("foo_train.data", "w").write(train_data.lstrip())
test_data = """
0 1:1.1 2:0.0 3:0.0 4:0.0
1 1:0.0 2:1.2 3:0.0 4:0.0
2 1:0.0 2:0.0 3:0.6 4:0.0
3 1:0.0 2:0.0 3:0.0 4:1.4
"""
file("foo_test.data", "w").write(test_data.lstrip())
train = SparseDataSet("foo_train.data")
mc = OneAgainstRestFixed(SVM())
mc.train(train, saveSpace=False)
test = SparseDataSet("foo_test.data")
print [mc.classify(test, i) for i in range(4)]
for i, classifier in enumerate(mc.classifiers):
    classifier.save("foo.model.%d" % i)
classifiers = []
for i in range(4):
    classifiers.append(loadSVM("foo.model.%d" % i))
mcnew = OneAgainstRestFixed(SVM())
mcnew.labels = misc.Container()
mcnew.labels.addAttributes(test.labels, ['numClasses', 'classLabels'])
mcnew.classifiers = classifiers
print [mcnew.classify(test, i) for i in range(4)]
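The root cause described above (keyword arguments not being forwarded to the inner classifiers) can be reproduced without PyML. This is a minimal sketch: ToyClassifier, DroppingWrapper and ForwardingWrapper are hypothetical names, and the assumption that space saving is on unless saveSpace=False is passed is inferred from the error message rather than taken from PyML's source.

```python
class ToyClassifier(object):
    """Hypothetical inner classifier that refuses to save in space-saving mode."""

    def __init__(self):
        self.save_space = None

    def train(self, data, **args):
        # Assumption: space saving is enabled unless the caller disables it.
        self.save_space = args.get('saveSpace', True)

    def save(self, filename):
        if self.save_space:
            raise ValueError('train with saveSpace = False in order to save')
        return filename  # a real classifier would write the model to disk here


class DroppingWrapper(object):
    """Mimics the reported bug: train() accepts **args but never forwards them."""

    def __init__(self, n):
        self.classifiers = [ToyClassifier() for _ in range(n)]

    def train(self, data, **args):
        for classifier in self.classifiers:
            classifier.train(data)  # keyword arguments are silently lost


class ForwardingWrapper(DroppingWrapper):
    """Mimics the fix: every keyword argument reaches the inner classifiers."""

    def train(self, data, **args):
        for classifier in self.classifiers:
            classifier.train(data, **args)


broken = DroppingWrapper(3)
broken.train([1, 2, 3], saveSpace=False)

fixed = ForwardingWrapper(3)
fixed.train([1, 2, 3], saveSpace=False)
saved = [c.save("toy.model.%d" % i) for i, c in enumerate(fixed.classifiers)]
```

Calling save() on any classifier in broken raises ValueError even though saveSpace=False was passed to the wrapper, which matches the symptom in the question; the classifiers in fixed save without complaint.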
Get a newer version of PyML. Since version 0.7.4, it is possible to save the OneAgainstRest classifier (with .save() and .load()); prior to that version, saving/loading the classifier is non-trivial and error-prone.
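To decide at runtime which path to take, the installed version can be compared against 0.7.4. The sketch below is only an illustration: parse_version and supports_composite_save are hypothetical helpers, the 0.7.4 threshold is taken from the statement above, and the parsing assumes purely numeric dotted version strings such as the "0.7.0" printed by PyML.__version__ in the question.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.7.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


def supports_composite_save(version_text, minimum=(0, 7, 4)):
    """True if the given version is at least the assumed minimum."""
    return parse_version(version_text) >= minimum


# The version reported in the question predates the threshold.
question_version_ok = supports_composite_save("0.7.0")
```

In practice this would be called as supports_composite_save(PyML.__version__), falling back to saving each inner classifier individually when it returns False. Comparing tuples of integers rather than raw strings avoids mistakes such as treating "0.10.0" as older than "0.7.4".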