Is Capsule Network really rotationally invariant in practice?







Capsule networks are said to perform well under rotation. Is that true in practice?



I trained a capsule network on the training set (train-dataset) and reached ~100% train accuracy.



I then tested the network on the original test set (test-dataset-original) and got ~99% test accuracy.



Next I rotated the original test set by 0.5 degrees (test-dataset-rotate0p5) and by 1 degree (test-dataset-rotate1), and the test accuracy dropped to only ~10%.



I used the network from this repo as a starting point: https://github.com/naturomics/CapsNet-Tensorflow
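For reference, rotated copies of a test set can be generated with a small NumPy routine like the sketch below (nearest-neighbour sampling; the variable names and the single-channel image layout are assumptions, not taken from the repo):

```python
import numpy as np

def rotate_image(img, angle_deg):
    """Rotate a single-channel image about its centre by angle_deg degrees,
    using nearest-neighbour sampling; pixels that map outside the source
    image are set to 0."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse-map each output pixel back into the source image.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    xi, yi = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(img)
    out[valid] = img[yi[valid], xi[valid]]
    return out

# e.g. a 1-degree rotated test set:
# test_rotate1 = np.stack([rotate_image(im, 1.0) for im in test_images])
```

A 0-degree rotation returns the image unchanged, which is a handy sanity check before trusting accuracy numbers on the rotated sets.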




3 Answers



The first layer of a capsule network is a normal convolution. The filters there are not rotation invariant; only the resulting feature maps have a pose matrix applied to them by the primary capsule layer.



I think this is why you also need to show the capsule network rotated images during training, though far fewer than a normal convnet would need.





This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review – Azat Ibrakov, Jul 6 at 13:23



Capsule networks encapsulate vectors or 4x4 matrices in a neural network. However, matrices can be used for many things, rotation being just one of them. There is no way for the network to know that you want the encapsulated representation used for rotations unless you specifically show it rotated examples, so that it can learn to use the representation that way.
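To illustrate the point (a toy NumPy sketch, not code from the repo): a pose matrix is just a linear map, and a shear is as valid a thing for the network to learn as a rotation unless the training data pushes it toward rotations.

```python
import numpy as np

theta = np.deg2rad(30)
# A pose matrix *can* encode a rotation...
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
# ...but a shear (or any other linear map) is an equally valid
# matrix to learn if nothing in the data constrains it.
shear = np.array([[1.0, 0.7],
                  [0.0, 1.0]])

v = np.array([1.0, 0.0])   # a capsule's pose vector
print(rotation @ v)        # ≈ [0.866, 0.5]  (rotated pose)
print(shear @ v)           # [1.0, 0.0]      (sheared pose)
```

Both matrices act on the pose vector the same way mechanically; only rotated training examples make "rotation" the transformation the network actually learns.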



Capsule networks came into existence to address the viewpoint-variance problem of convolutional neural networks (CNNs). CapsNet is claimed to be viewpoint invariant, which includes rotational and translational invariance.



CNNs achieve translational invariance through max-pooling, but that causes information loss within the receptive field. As the network goes deeper, the receptive field grows, so max-pooling in deeper layers discards even more information. The result is a loss of spatial information: the network learns only local features and fails to capture the bigger picture of the input.



The weights W_ij (between the primary and secondary capsule layers) are learned by backpropagation; they model the affine transformation of the entity represented by the i-th primary capsule and produce the prediction vector u_j|i. So W_ij is also the part responsible for learning rotational transformations of a given entity.
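The prediction-vector step can be sketched in NumPy as below. The capsule counts and dimensions here are toy values, not the sizes used in the CapsNet paper or the linked repo:

```python
import numpy as np

num_in, num_out = 6, 3   # primary capsules i, output capsules j (toy sizes)
d_in, d_out = 8, 16      # capsule vector dimensions (toy sizes)

rng = np.random.default_rng(0)
W = rng.normal(size=(num_in, num_out, d_out, d_in))  # learned W_ij
u = rng.normal(size=(num_in, d_in))                  # primary capsule outputs u_i

# Prediction vectors u_j|i = W_ij @ u_i : each primary capsule predicts
# the pose of every output capsule through its own learned transform.
u_hat = np.einsum("ijab,ib->ija", W, u)
print(u_hat.shape)  # (6, 3, 16)
```

If W_ij has only seen upright digits during training, nothing forces it to encode rotations, which is consistent with the accuracy drop seen in the question.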






