How does gradient descent weight/bias update work here?
I've been learning neural networks from Michael Nielsen's http://neuralnetworksanddeeplearning.com/chap1.html.
In the section below, the weights and biases are updated with the following code:
    def update_mini_batch(self, mini_batch, eta):
        # nabla_b and nabla_w start out as zero vectors
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)
What is the need to introduce the nabla_b and nabla_w zero vectors when they are simply being added to dnb and dnw, which are NumPy arrays themselves? Isn't 0 + something = something? What is the need for a zero vector here for a single training example?
As a test I removed the zero vectors and used dnb and dnw by themselves, and I failed to see any significant difference in the training.
Thank you.
1 Answer
Yes, you are right that 0 + something = something, but that is only the first iteration. In the second iteration it becomes something + something_else = accumulated value.
This happens inside the following loop:

    for x, y in mini_batch:

Here, for the first training example in the mini-batch, nabla_w and nabla_b are zero, but for the second and later iterations of the loop they already hold accumulated values.
Let's consider these two lines again:

    nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
    nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]

In the first iteration both nabla_b and nabla_w are all zeros, but they get updated by nb+dnb (and nw+dnw), so they are no longer vectors of just zeros. By the second iteration, nabla_b already contains the gradient from the first training example, and the next example's gradient is added on top of it.
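Here is a minimal, standalone sketch of that accumulation. The array values and the delta_nabla_b_example* names are invented purely for illustration; only the zip-and-sum pattern mirrors update_mini_batch:

    import numpy as np

    # Invented per-example gradients standing in for what backprop would return.
    delta_nabla_b_example1 = [np.array([1.0, 2.0])]
    delta_nabla_b_example2 = [np.array([0.5, 0.5])]

    # Start from a zero vector, exactly like nabla_b in update_mini_batch.
    nabla_b = [np.zeros(2)]

    # First pass through the loop: 0 + something = something.
    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b_example1)]
    print(nabla_b)   # [array([1., 2.])]

    # Second pass: something + something_else = the accumulated sum.
    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b_example2)]
    print(nabla_b)   # [array([1.5, 2.5])]

The weight and bias update then divides this sum by len(mini_batch), i.e. it uses the average gradient over the whole mini-batch. If you drop the zero vectors and keep only the last dnb/dnw, each update uses a single example's gradient instead of that average; that is still a form of stochastic gradient descent, just noisier, which may be why your test didn't show a big difference.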
I understand. Here x, y = [vector, value], and update_mini_batch gets called like for mini_batch in mini_batches: self.update_mini_batch(mini_batch, eta). For every training example, it initializes the zero vector.
– Shashanoid
Aug 10 at 3:36
Sorry, your doubt was about the use of zero vectors, right?
– InAFlash
Aug 10 at 3:38
I understand now. So according to that, a mini-batch looks like, say, [ [V1, 3], [V2, 4], [V3, 1] ]. How does the for loop work on this with for x, y? Wouldn't that make more sense for a single pair like [V1, 3]?
– Shashanoid
Aug 10 at 3:46
The mini-batch will be exactly as you said, but it's inside a for loop, so in each iteration we get one element of it. In the first iteration that element is [V1, 3]. Since the loop is written as for x, y in mini_batch, the values simply unpack, making x = V1 and y = 3 in the first iteration.
– InAFlash
Aug 10 at 3:53
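A tiny sketch of the unpacking described in the comment above; V1, V2, V3 are invented placeholder vectors:

    import numpy as np

    # A made-up mini-batch of (input vector, label) pairs.
    V1, V2, V3 = np.array([0.1, 0.9]), np.array([0.4, 0.2]), np.array([0.7, 0.7])
    mini_batch = [(V1, 3), (V2, 4), (V3, 1)]

    # Each element unpacks into x (the input vector) and y (the label).
    for x, y in mini_batch:
        print(x, y)
    # First iteration:  x = V1, y = 3
    # Second iteration: x = V2, y = 4
    # Third iteration:  x = V3, y = 1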
Damn, I just realized. I feel extremely stupid. Thank you very much!
– Shashanoid
Aug 10 at 3:58