Train softmax layers added after intermediate layers of a (pretrained/untrained) neural network in Keras
I want to train 2 models in Keras on the cifar10 dataset: the first from scratch (model1), and the second by fine-tuning a pre-trained model (model2). I use the following code to do that:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
import numpy as np
import os
from keras.models import load_model
#model 1
input_shape = (32, 32, 3)
model1 = Sequential()
model1.add(Conv2D(32, kernel_size=(3, 3),activation='relu',input_shape=input_shape))
model1.add(Conv2D(64, (3, 3), activation='relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Dropout(0.25))
model1.add(Flatten())
model1.add(Dense(128, activation='relu'))
model1.add(Dropout(0.5))
model1.add(Dense(10, activation='softmax'))
#... training
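# A hedged sketch of the elided training step (illustrative, not the
# author's actual code): load cifar10, one-hot the labels, compile, fit.
from keras.datasets import cifar10
from keras.utils import to_categorical
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model1.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))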
#model 2
kmodel = load_model(r'cifar10\cifar10.h5')  # raw string keeps the Windows backslash literal
model2 = Sequential()
# copy the pretrained layers into the new model
for i in range(len(kmodel.layers)):
    model2.add(kmodel.layers[i])
I want to know:
In model 1:
How can I add a softmax layer (model1.add(Dense(10, activation='softmax'))) after some intermediate layers, such that each of these new softmax layers is connected only to the previous layer and has no connection to the next layer?
In model 2:
How can I add softmax layers to the intermediate layers (i.e. layers #2, #4, #7) with the same connection condition as above? (Of course, I should freeze all of the kmodel layers and train only the new softmax layers.)
Why would you want to use softmax after intermediary layers? And what is the input to the following layers then? And what exactly is your (mathematical) motivation for using softmax from different layers?
– dennlinger
Aug 7 at 6:01
@dennlinger Because I want to have a prediction from each intermediate layer for a specific input. Naturally, the output of layer #2 is the input of one softmax layer, the output of layer #4 is the input of another softmax layer, and so on. I want to see which layers predict correctly, or from which layer onward the prediction becomes correct.
– morteza ali ahmadi
Aug 7 at 6:06
So basically you want to split the output of layer #2 so that it serves as 1) the input of layer #3, and 2) the input of a softmax layer at that point?
– dennlinger
Aug 7 at 6:09
@dennlinger Yes, you are right.
– morteza ali ahmadi
Aug 7 at 6:10
@dennlinger But sharing would be better than splitting.
– morteza ali ahmadi
Aug 7 at 6:11
1 Answer
The limitation here is the Sequential() operator of Keras, which only allows you to stack layers linearly.
To circumvent this, we can specify the model in a more direct (but uglier) way, using the Keras functional API. It would look something like this for your code:
from keras.layers import Input

# the Input layer carries the shape, so the Conv2D layers no longer need input_shape
inputs = Input(shape=(32, 32, 3))
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
...
predictions = Dense(10, activation='softmax')(x)
You can then simply add prediction branches at the intermediary layers:
inputs = Input(shape=(32, 32, 3))
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
...
# branch off at some point of your choice
predictions_intermediary = Dense(10, activation='softmax')(x)
# regular next layer, fed by the same tensor x (the branch does not feed forward)
x = Dense(128, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
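From what I understand of the functional API, the two outputs can then be wired into a single trainable model. A minimal sketch, assuming the tensors defined above (the Flatten() call, output names, and loss weighting are my additions to make the shapes and training work):
from keras.layers import Input, Conv2D, Flatten, Dense
from keras.models import Model

inputs = Input(shape=(32, 32, 3))
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(inputs)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = Flatten()(x)  # Dense layers need a flat input

# intermediate prediction branch: connected only to the layers before it
predictions_intermediary = Dense(10, activation='softmax', name='aux_out')(x)

# the main path continues from the same tensor x; the branch does not feed forward
x = Dense(128, activation='relu')(x)
predictions = Dense(10, activation='softmax', name='main_out')(x)

# a two-output model; Keras combines the per-output losses into one total loss
model = Model(inputs=inputs, outputs=[predictions_intermediary, predictions])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              loss_weights=[0.5, 1.0])  # illustrative weighting
# model.fit(x_train, [y_train, y_train])  # the same labels feed both heads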
I am sadly not familiar enough with Keras to tell you how this would work for a pretrained model, but I assume you can define the pretrained model similarly and then specify which layers are trainable, as in the previous example.
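For completeness, a hedged sketch of how that might look for the loaded kmodel, assuming the standard functional-API attributes (layers[i].output, model.input); the layer indices #2, #4, #7 come from the question and the Flatten() guard is my addition:
from keras.models import Model, load_model
from keras.layers import Dense, Flatten

kmodel = load_model(r'cifar10\cifar10.h5')

# freeze every pretrained layer so that only the new heads get trained
for layer in kmodel.layers:
    layer.trainable = False

# attach a softmax head to the output of each selected intermediate layer
heads = []
for idx in [2, 4, 7]:
    t = kmodel.layers[idx].output
    if len(kmodel.layers[idx].output_shape) > 2:
        t = Flatten()(t)  # conv feature maps need flattening before Dense
    heads.append(Dense(10, activation='softmax', name='head_%d' % idx)(t))

probe_model = Model(inputs=kmodel.input, outputs=heads)
probe_model.compile(optimizer='adam', loss='categorical_crossentropy')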
Note that your "sharing vs splitting" question is moot here: creating a different layer/operation automatically creates a different weight matrix, so you do not have to worry about shared weights (which could not work anyway if the input dimensions of the next layer differed from the softmax branch's input shape).
Thank you very much. So is the training procedure the same as for a sequential model? Can you add a training-procedure example to your code?
– morteza ali ahmadi
Aug 7 at 6:29
The training procedure should be the same, from what I know about Keras. As I said, I am not really an expert with Keras, so that is as much as I can provide. Furthermore, you will have to draw up your own training procedure anyway, since you are essentially doing several backward passes, unless you average over your different softmax layers at the end and then simply backpropagate "normally".
– dennlinger
Aug 7 at 6:31
OK, if I want to average over my different softmax layers at the end so that I can train normally, how can I do that?
– morteza ali ahmadi
Aug 7 at 6:39
Now let me test your code, and after that I will accept your answer, thanks again.
– morteza ali ahmadi
Aug 7 at 6:44
See the Keras merge layers (e.g. keras.layers.average) for averaging.
– dennlinger
Aug 7 at 7:18
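For reference, a minimal sketch of that averaging approach with Keras's merge helper keras.layers.average, assuming the tensors from the answer above:
from keras.layers import average
from keras.models import Model

# average the branch softmaxes into one prediction, giving a single loss
# and one ordinary backward pass
averaged = average([predictions_intermediary, predictions])
avg_model = Model(inputs=inputs, outputs=averaged)
avg_model.compile(optimizer='adam', loss='categorical_crossentropy')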