How does gradient descent weight/bias update work here?




I've been learning neural networks from Michael Nielsen's http://neuralnetworksanddeeplearning.com/chap1.html.



In the section below, the weights and biases are updated as follows:


def update_mini_batch(self, mini_batch, eta):
    # gradient accumulators, initialised as zero vectors
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # sum the per-example gradients over the mini-batch
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # gradient-descent step using the accumulated gradients
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
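
(For reference, this method implements the mini-batch update rule from the chapter: each weight moves as w → w − (η/m)·Σ_x ∂C_x/∂w and each bias as b → b − (η/m)·Σ_x ∂C_x/∂b, where m = len(mini_batch) and the sum runs over the training examples in the batch; nabla_w and nabla_b hold that running sum.)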



def SGD(self, training_data, epochs, mini_batch_size, eta,
        test_data=None):
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in xrange(epochs):
        random.shuffle(training_data)
        # slice the shuffled data into consecutive mini-batches
        mini_batches = [
            training_data[k:k+mini_batch_size]
            for k in xrange(0, n, mini_batch_size)]
        # one gradient-descent step per mini-batch
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print "Epoch {0}: {1} / {2}".format(
                j, self.evaluate(test_data), n_test)
        else:
            print "Epoch {0} complete".format(j)



What is the need to introduce the nabla_b and nabla_w zero vectors, when they're simply being added to dnb and dnw, which are numpy arrays themselves? Isn't 0 + something = something? What is the need for a zero vector here for a single training example?



As a test I removed the zero vectors and used dnb and dnw by themselves, and I failed to see any significant difference in the training.
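
Roughly, that modification would look like this (a hypothetical sketch of the test, not code from the book; assuming the weight/bias update stays after the loop, only the last example's gradients reach it):

def update_mini_batch(self, mini_batch, eta):
    # hypothetical version without the zero-vector accumulators
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
    # only the gradients of the last (x, y) survive to this point
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, delta_nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, delta_nabla_b)]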



Thank you.




1 Answer



Yes, you are right that 0 + something = something, but that only covers the first pass through the loop. On the second pass it becomes:


0 + something = something


something + something_else = accumulated value



This happens inside the following loop:


for x, y in mini_batch:



Here, for the first training example (x, y) in the mini-batch, nabla_w and nabla_b are still zero, but for the second and later examples they already hold the gradients accumulated so far.



Let's consider the following code:


nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]



In the first iteration both nabla_b and nabla_w are all zeros, but in that same iteration they are updated via nb+dnb and nw+dnw, so they are no longer vectors of just zeros. In the second iteration, nabla_b is therefore no longer a zero vector.
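
A small, self-contained numpy sketch of that accumulation (the numbers are made up):

import numpy as np

delta_1 = np.array([1.0, 2.0])   # delta_nabla_b from the first (x, y)
delta_2 = np.array([3.0, 4.0])   # delta_nabla_b from the second (x, y)

nabla_b = np.zeros(2)            # the accumulator starts as a zero vector
for dnb in (delta_1, delta_2):
    nabla_b = nabla_b + dnb      # first pass: 0 + delta_1; second pass: delta_1 + delta_2

print(nabla_b)                   # [4. 6.] -- the summed gradient over the mini-batch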







I understand, here x, y = [vector, value] and update_mini_batch gets called like for mini_batch in mini_batches: self.update_mini_batch(mini_batch, eta). For every mini-batch, it initializes the zero vectors.
– Shashanoid
Aug 10 at 3:36






Sorry? Your doubt was about the use of the zero vectors, right?
– InAFlash
Aug 10 at 3:38





I understand now. So according to that, a mini batch looks like, say, [ [V1, 3], [V2, 4], [V3, 1] ]. How does the for loop work on this, with for x, y? Wouldn't something like [V1, 3] make more sense?
– Shashanoid
Aug 10 at 3:46






The mini batch will be exactly as you said, but it's in a for loop, so in each iteration we get one element of it; in the first iteration that is [V1, 3]. Since it is written as for x, y in mini_batch, that makes x = V1 and y = 3 in the first iteration. The values simply unpack.
– InAFlash
Aug 10 at 3:53
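
A tiny illustration of that unpacking (V1, V2, V3 stand in for the input vectors; the labels are made up):

mini_batch = [("V1", 3), ("V2", 4), ("V3", 1)]
for x, y in mini_batch:                  # each element is an (input, label) pair
    print("x = %s, y = %s" % (x, y))     # x = V1, y = 3 on the first pass, then V2/4, then V3/1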





Damn, I just realized. I feel extremely stupid. Thank you very much!
– Shashanoid
Aug 10 at 3:58






