RCNN Implementation
I am trying to implement the R-CNN paper from scratch. As proposed in the paper, I have successfully extracted region proposals using selective search. The next step is to train a feature extractor, which is essentially an (N+1)-class classifier, where N is the number of classes in the data and the extra class is the background.
As suggested in the paper, I am using AlexNet (with ImageNet-pretrained weights) as the feature extractor, but I am having trouble training it. The training loss and accuracy behave as expected, but validation does not: the validation accuracy goes down while the validation loss goes up. Below is the snippet I am using for the loss and accuracy calculations:
    for imgs, labels in tepoch:
        imgs = imgs.to(self.device)
        labels = labels.to(self.device)
        outputs = model(imgs)
        _, preds = torch.max(outputs, 1)
        loss = criterion(outputs, labels)
        if phase == "train":
            model.zero_grad()
            loss.backward()
            optimizer.step()
            lr_scheduler.step()
        step_loss = loss.item()
        step_acc = torch.sum(preds == labels.data) / imgs.shape[0]
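For comparison, here is a minimal, self-contained sketch of how a phase-aware loop is commonly structured in PyTorch: the model is switched between `train()` and `eval()` mode, gradients are tracked only while training, and the metrics are averaged over the whole epoch. The toy model, the random data and the function name `run_epoch` are hypothetical stand-ins, and this is not claimed to be the cause of (or the fix for) the behaviour described above.

```python
import torch
import torch.nn as nn


def run_epoch(model, batches, criterion, optimizer=None):
    """Run one pass over batches; trains only when an optimizer is given."""
    training = optimizer is not None
    # Dropout (present in AlexNet) behaves differently in train and eval mode.
    model.train(training)
    total_loss, total_correct, total_seen = 0.0, 0, 0
    for imgs, labels in batches:
        # Build the autograd graph only while training.
        with torch.set_grad_enabled(training):
            outputs = model(imgs)
            loss = criterion(outputs, labels)
        if training:
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        preds = outputs.argmax(dim=1)
        # Weight by batch size so a short final batch is not over-counted.
        total_loss += loss.item() * imgs.size(0)
        total_correct += (preds == labels).sum().item()
        total_seen += imgs.size(0)
    return total_loss / total_seen, total_correct / total_seen


# Toy stand-ins (hypothetical): a small classifier with dropout and random data.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 3))
toy_batches = [(torch.randn(4, 8), torch.randint(0, 3, (4,))) for _ in range(5)]
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.01)

train_loss, train_acc = run_epoch(toy_model, toy_batches, toy_criterion, toy_optimizer)
val_loss, val_acc = run_epoch(toy_model, toy_batches, toy_criterion)
print(f"train loss {train_loss:.3f} acc {train_acc:.2f} | val loss {val_loss:.3f} acc {val_acc:.2f}")
```

Passing no optimizer is what marks a validation pass in this sketch, so the same function covers both phases.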
One might think that the model is overfitting, but I beg to differ. The dataset is large and I am also using augmentation. To give some perspective:
Number of positive samples: 517565
Number of negative samples: 4436934
Augmentations:
    import albumentations as A
    from albumentations.pytorch import ToTensorV2

    transformation = A.Compose(
        [
            A.ChannelShuffle(p=0.15),
            A.RandomBrightnessContrast(p=0.2),
            A.HueSaturationValue(p=0.2),
            A.HorizontalFlip(p=0.5),
            A.CLAHE(p=0.3),
            A.Sharpen(p=0.3),
            A.Resize(height=224, width=224, always_apply=True, p=1),
            A.Normalize(always_apply=True, p=1),
            ToTensorV2()
        ]
    )
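To put the sample counts quoted above in perspective, the short calculation below works out how skewed the pool is. It assumes that "positive" means any non-background proposal and "negative" means background, and that the evaluated data has the same class mix as the full pool, which may not hold for a particular validation split.

```python
NUM_POSITIVE = 517_565
NUM_NEGATIVE = 4_436_934

total = NUM_POSITIVE + NUM_NEGATIVE
positive_fraction = NUM_POSITIVE / total
negatives_per_positive = NUM_NEGATIVE / NUM_POSITIVE

# Accuracy of a degenerate model that always predicts background,
# measured on data with the same class mix as the full pool.
background_only_accuracy = NUM_NEGATIVE / total

print(f"positive fraction:          {positive_fraction:.1%}")
print(f"negatives per positive:     {negatives_per_positive:.2f}")
print(f"always-background accuracy: {background_only_accuracy:.1%}")
```

With roughly one positive for every eight to nine negatives, plain accuracy on unbalanced data says little on its own, since predicting background everywhere already scores highly.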
The paper suggests using a batch size of 128 (96 negative samples and 32 positive samples). Using so few positive samples seemed counterintuitive to me at first, but I tried to justify it after observing the training process:
Since negative samples make up 75% of each batch, in the early epochs the model simply predicts everything as 0 (the class of the negative samples), which already gives an accuracy of 75%. Using a small number of positive samples then forces the model to learn the representation of the other classes.
This is just my hypothesis, and I am open to discussion.
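The batch composition described above can be sketched in plain Python. The id pools and the helper name `sample_batch` are hypothetical; a real pipeline would more likely do this inside a custom PyTorch sampler, but the arithmetic is the same.

```python
import random

POSITIVES_PER_BATCH = 32  # from the paper's 128-sample batch
NEGATIVES_PER_BATCH = 96


def sample_batch(positive_ids, negative_ids, rng):
    """Draw 32 positive and 96 negative sample ids, then shuffle them together."""
    batch = rng.sample(positive_ids, POSITIVES_PER_BATCH)
    batch += rng.sample(negative_ids, NEGATIVES_PER_BATCH)
    rng.shuffle(batch)
    return batch


# Hypothetical pools: ids 0..999 are positives, ids 1000..9999 are negatives.
positive_ids = list(range(1000))
negative_ids = list(range(1000, 10000))
rng = random.Random(0)
batch = sample_batch(positive_ids, negative_ids, rng)

# Accuracy of a model that labels every sample in the batch as background.
num_negative = sum(1 for i in batch if i >= 1000)
all_background_accuracy = num_negative / len(batch)
print(len(batch), all_background_accuracy)  # 128 0.75
```

This reproduces the 75% figure: any accuracy at or below that level on such a batch is no better than predicting background everywhere.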
Another point of discussion is the region proposal generation. The paper says that about 2000 region proposals are generated for each image, and the threshold used to separate positive samples from negative samples produces far more negative samples than positive ones, which will eventually affect the batch composition.
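A minimal sketch of this labelling step follows. It assumes the IoU rule the paper describes for fine-tuning (a proposal is positive for a ground-truth box's class when their IoU is at least 0.5, otherwise background), boxes in `(x1, y1, x2, y2)` format, and background encoded as class 0; the boxes and the function names are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, ground_truths, threshold=0.5):
    """Give each proposal the class of its best-overlapping ground-truth box,
    or 0 (background) when the best IoU is below the threshold."""
    labels = []
    for prop in proposals:
        best_iou, best_class = 0.0, 0
        for gt_box, gt_class in ground_truths:
            overlap = iou(prop, gt_box)
            if overlap > best_iou:
                best_iou, best_class = overlap, gt_class
        labels.append(best_class if best_iou >= threshold else 0)
    return labels


# Hypothetical image: one ground-truth box of class 3 and four proposals.
ground_truths = [((10, 10, 110, 110), 3)]
proposals = [
    (10, 10, 110, 110),    # identical to the ground truth, IoU 1.0
    (20, 20, 120, 120),    # shifted slightly, IoU about 0.68
    (60, 60, 160, 160),    # shifted further, IoU about 0.14
    (200, 200, 250, 250),  # no overlap
]
labels = label_proposals(proposals, ground_truths)
print(labels)  # [3, 3, 0, 0]
```

Since most of the roughly 2000 proposals per image overlap no object at all, nearly all of them fall into the background class under a rule like this.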
Any help with training the feature extractor would be appreciated. I have been trying to train the model for weeks now. HELP!!!