Backpropagation issues
I have a couple of questions about how to code the backpropagation algorithm of neural networks:
The topology of my networks is an input layer, hidden layer and output layer. Both the hidden layer and output layer have sigmoid functions.
- First of all, should I use a bias? Where should I connect the bias in my network? Should I put one bias unit per layer, in both the hidden layer and the output layer? What about the input layer?
- In this link, they define the delta of the last layer as the input minus the output, and they backpropagate the deltas as shown in the figure. They keep a table of all the deltas before actually propagating the errors in a feedforward fashion. Is this a departure from the standard backpropagation algorithm?
- Should I decrease the learning factor over time?
- In case anyone knows, is Resilient Propagation an online or batch learning technique?
Thanks
edit: One more thing. In the following picture, d f1(e)/de, assuming I'm using the sigmoid function, is f1(e) * [1 - f1(e)], right?
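For what it's worth, here is a quick numeric check of that identity, a minimal sketch with illustrative names only (nothing here is taken from the linked tutorial):

```python
import numpy as np

def sigmoid(e):
    return 1.0 / (1.0 + np.exp(-e))

# Analytic derivative claimed above: f1(e) * (1 - f1(e))
def sigmoid_deriv(e):
    s = sigmoid(e)
    return s * (1.0 - s)

# Compare against a central finite difference at a few sample points
e = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
h = 1e-6
numeric = (sigmoid(e + h) - sigmoid(e - h)) / (2 * h)
print(np.allclose(numeric, sigmoid_deriv(e), atol=1e-6))  # True
```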
It varies. Personally, I don't see much of a reason for a bias, but I haven't studied NNs enough to make a valid case for or against them. I'd try it out and test the results.
That's correct. Backpropagation involves calculating the deltas first and then propagating them back through the network.
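Here is a minimal sketch of that two-pass structure for a single hidden layer, assuming sigmoid activations and a squared-error loss; the variable names and shapes are illustrative, not taken from the question's link (the output delta below follows the usual target-minus-output convention):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, W1, b1, W2, b2, lr=0.1):
    # Forward pass: keep the activations, they are reused below.
    h = sigmoid(W1 @ x + b1)          # hidden activations
    y = sigmoid(W2 @ h + b2)          # network output

    # Pass 1: compute all deltas before touching any weight.
    delta_out = (y - target) * y * (1 - y)        # output-layer delta
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # hidden-layer delta

    # Pass 2: apply the weight updates using the stored deltas.
    W2 -= lr * np.outer(delta_out, h)
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x)
    b1 -= lr * delta_hid
    return W1, b1, W2, b2
```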
Yes, the learning factor should be decreased over time. However, with BP you can get stuck on local plateaus, so sometimes, around the 500th iteration, it makes sense to reset the learning factor to the initial rate.
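A sketch of that kind of schedule (exponential decay with an occasional reset); the decay constant and the 500-iteration reset period are placeholders, not recommended values:

```python
def learning_rate(iteration, initial_rate=0.5, decay=0.995, reset_every=500):
    # Restart the decay from the initial rate every `reset_every` iterations
    # to help escape plateaus, as suggested above.
    steps_since_reset = iteration % reset_every
    return initial_rate * (decay ** steps_since_reset)

# Example: the rate shrinks for 500 steps, then jumps back to 0.5
for i in (0, 100, 499, 500, 501):
    print(i, learning_rate(i))
```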
I can't answer that... I've never heard anything about RProp.
Your question needs to be specified a bit more thoroughly... What do you need: generalization or memorization? Are you anticipating a complex pattern-matching data set, or a continuous-domain input-output relationship? Here are my $0.02:
I would suggest you leave a bias neuron in, just in case you need it. If the NN deems it unnecessary, training should drive its weights to negligible values. It connects to every neuron in the layer ahead, but receives no connections from any neuron in the preceding layer.
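One common way to wire that up is to treat the bias as a constant-1 input appended to each layer, so it feeds every neuron in the next layer but receives nothing itself. A sketch with made-up layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def with_bias(a):
    # Append a constant 1; the last column of the weight matrix then acts as the bias.
    return np.concatenate([a, [1.0]])

n_in, n_hidden, n_out = 3, 4, 2
W1 = rng.normal(size=(n_hidden, n_in + 1))    # extra column = bias weights
W2 = rng.normal(size=(n_out, n_hidden + 1))

x = rng.normal(size=n_in)
h = sigmoid(W1 @ with_bias(x))   # bias feeds every hidden neuron
y = sigmoid(W2 @ with_bias(h))   # and every output neuron
```

If training drives those bias columns toward zero, the bias was indeed unnecessary, as noted above.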
The equation looks like standard backprop as far as I can tell.
It is hard to generalize whether your learning rate needs to be decreased over time. The behaviour is highly data-dependent. The smaller your learning rate, the more stable your training will be. However, it can be painfully slow, especially if you're running it in a scripting language like I did once upon a time.
Resilient backprop (RProp, trainrp in MATLAB) adapts each weight's step size from the sign of the gradient accumulated over an epoch, so it is normally used as a batch technique.
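For reference, the core of RProp as originally described by Riedmiller and Braun uses only the sign of the batch gradient. A rough per-weight sketch, omitting the backtracking details of the original algorithm; the parameter values are the commonly cited defaults:

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One RProp update; `grad` is the gradient summed over the whole batch."""
    sign_change = grad * prev_grad
    # Same sign as last epoch: grow the per-weight step size.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    # Sign flipped: we overshot, so shrink the step size.
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # Move each weight by its own step size, using only the gradient's sign.
    w = w - np.sign(grad) * step
    return w, step
```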
I'd just like to add that you might want to consider alternative activation functions if possible. The sigmoid function doesn't always give the best results...
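For example, tanh (and, more recently, ReLU) are common drop-in alternatives. A small sketch of their forward and derivative forms for use in the backward pass; this is just an illustration, not a claim about what works best for your data:

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def tanh_deriv(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def relu_deriv(x):
    return (x > 0).astype(float)
```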