Capsules for Object Segmentation

Rodney LaLonde | Ulas Bagci

In the last few years, deep learning-based methods, in particular Convolutional Neural Networks (CNNs), have become the state of the art for various image analysis tasks. For the object segmentation problem in particular, U-Net and Fully Convolutional Networks (FCNs) have become the most widely used models in medical imaging.

This paper proposes a new architecture based on the capsule network presented by S. Sabour, N. Frosst, and G. E. Hinton in 2017.

Capsule Networks

In this type of network, each layer is divided into many small groups of neurons called “capsules”, and the network can be viewed as building a parse tree of the image in which each node corresponds to an active capsule. An iterative routing process lets each active capsule choose a capsule in the layer above to be its parent in the tree. At the higher levels of a visual system, this iterative process solves the problem of assigning parts to wholes regardless of the orientation or viewpoint from which the object is seen.

The output of each capsule is a vector rather than a single scalar. This is one of the main features that makes dynamic routing possible: a mechanism that ensures the output of a capsule is sent to an appropriate parent in the layer above.
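To make the routing idea concrete, here is a minimal NumPy sketch of routing by agreement as described by Sabour et al. (2017); it is not necessarily the exact variant used in this paper. It assumes the prediction vectors `u_hat` (each child capsule's output multiplied by a learned transformation matrix) have already been computed, and the names `squash`, `softmax`, and `dynamic_routing` are our own. The squash function keeps each vector's direction and maps its length into [0, 1), so the length can be read as the probability that the entity the capsule represents is present.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """Keep the direction of s and map its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)


def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def dynamic_routing(u_hat, num_iterations=3):
    """Routing by agreement (sketch).

    u_hat: array of shape (num_children, num_parents, dim); u_hat[i, j] is
           child capsule i's prediction for parent capsule j.
    Returns parent outputs v (num_parents, dim) and the coupling
    coefficients c (num_children, num_parents) used to compute them.
    """
    num_children, num_parents, _ = u_hat.shape
    b = np.zeros((num_children, num_parents))       # routing logits start at zero
    for _ in range(num_iterations):
        c = softmax(b, axis=1)                      # each child spreads its output over parents
        s = np.einsum("ij,ijd->jd", c, u_hat)       # weighted sum of predictions per parent
        v = squash(s)                               # parent capsule output vectors
        b = b + np.einsum("ijd,jd->ij", u_hat, v)   # agreement = dot(prediction, parent output)
    return v, c


# Usage with random prediction vectors: 6 children, 3 parents, 4-D poses.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))
v, c = dynamic_routing(u_hat, num_iterations=3)
print(v.shape, c.shape)  # (3, 4) (6, 3)
```

Each row of `c` sums to one, so every child distributes its output over the candidate parents and, over the iterations, shifts weight toward the parents whose outputs agree with its prediction. With `num_iterations=1` the coupling coefficients never leave their uniform initial value, which amounts to routing with equal-weight coupling coefficients.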

The original capsule network was designed for object classification on the MNIST dataset and has also been tested on others, such as CIFAR-10 and Fashion-MNIST. In this paper, the authors modified the architecture to perform object segmentation and evaluated it on the LUNA16 dataset, a collection of 888 lung cancer CT scans.

The authors performed 4-fold cross-validation, removed 10 images, and treated only the tumor class as positive.
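A 4-fold split can be sketched as follows. This is an illustrative assumption, not the authors' code: the split is done over whole scans, the scan IDs are hypothetical integers, and the example assumes the 10 removed images were whole scans, leaving 878 of the 888.

```python
import numpy as np


def k_fold_splits(scan_ids, k=4, seed=0):
    """Yield (train_ids, test_ids) for k-fold cross-validation over whole scans."""
    rng = np.random.default_rng(seed)
    ids = np.array(scan_ids)
    rng.shuffle(ids)                      # shuffle once, then cut into k folds
    folds = np.array_split(ids, k)        # fold sizes differ by at most one
    for i in range(k):
        test_ids = folds[i]
        train_ids = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_ids, test_ids


# Hypothetical IDs: 888 scans minus 10 excluded ones.
splits = list(k_fold_splits(range(878), k=4))
for train_ids, test_ids in splits:
    print(len(train_ids), len(test_ids))
```

Splitting by scan rather than by 2D slice keeps slices from the same scan out of both the training and test sets of a fold, which would otherwise inflate the scores.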

All networks were trained from scratch, using the same data augmentation methods (scale, flip, shift, rotate, elastic deformations, and random noise) and Adam optimization with an initial learning rate of 0.00001. A batch size of 1 was chosen for all experiments to match the original U-Net implementation.
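For segmentation, every geometric augmentation must be applied identically to the image and its label mask, while intensity noise should touch the image only. The sketch below illustrates that rule with a subset of the listed methods (flip, 90-degree rotation, shift, noise); scaling and elastic deformation are omitted, and the function name `augment_pair` and its parameters are hypothetical.

```python
import numpy as np


def augment_pair(image, mask, rng, max_shift=5, noise_std=0.01):
    """Apply the same random geometric transform to a 2D slice and its mask.

    Only flip, 90-degree rotation, and shift are covered here; scaling and
    elastic deformation are left out. Noise is added to the image only.
    """
    if rng.random() < 0.5:                                   # random left-right flip
        image, mask = np.fliplr(image), np.fliplr(mask)
    k = int(rng.integers(0, 4))                              # rotate by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    # np.roll wraps pixels around the border; a real pipeline would pad instead.
    image = np.roll(image, (dy, dx), axis=(0, 1))
    mask = np.roll(mask, (dy, dx), axis=(0, 1))
    image = image + rng.normal(0.0, noise_std, size=image.shape)  # image-only noise
    return image, mask


rng = np.random.default_rng(1)
mask = np.zeros((32, 32), dtype=np.uint8)
mask[8:16, 10:20] = 1
image = mask.astype(float)
aug_image, aug_mask = augment_pair(image, mask, rng, noise_std=0.0)
```

Setting `noise_std=0.0` and using the mask itself as the image, as in the usage lines, is a quick way to check that image and mask stay aligned after augmentation.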

Three capsule networks were trained: a three-layer baseline capsule segmentation network and two versions of the proposed architecture. In the proposed architecture, dynamic routing is performed only on layers that change spatial dimensions, while the other layers are routed with equal-weight coupling coefficients.


An advantage highlighted by the authors is that this type of architecture requires considerably fewer parameters than other state-of-the-art models. However, the dynamic routing algorithm makes it very memory-intensive. The proposed model achieves slightly better performance than the models it was compared against, but this result still needs to be verified on other datasets and in further studies.
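Why routing costs memory even when parameters are few can be seen with simple arithmetic: the prediction vectors, routing logits, and coupling coefficients scale with the product of child and parent capsules, independently of how many weights are shared. The sketch below assumes fully connected routing between two capsule layers (as in the original classification CapsNet), float32 values, and forward-pass storage only; the layer sizes are hypothetical and are not the figures for this paper's network.

```python
def routing_memory_bytes(num_children, num_parents, pose_dim, bytes_per_value=4):
    """Rough forward-pass memory for one fully connected routing step.

    Counts the prediction vectors u_hat (children x parents x pose_dim) plus
    the routing logits b and coupling coefficients c (children x parents each).
    Gradients and other buffers are ignored.
    """
    prediction_vectors = num_children * num_parents * pose_dim
    logits_and_coefficients = 2 * num_children * num_parents
    return (prediction_vectors + logits_and_coefficients) * bytes_per_value


# Hypothetical layers: 32x32 grid x 8 capsule types routed to 16x16 grid x 16 types.
children = 32 * 32 * 8
parents = 16 * 16 * 16
print(routing_memory_bytes(children, parents, pose_dim=16) / 2**30)  # 2.25 (GiB)
```

Even these modest feature-map sizes need about 2.25 GiB for a single routing step, and the cost grows with the product of both layers' sizes, which is consistent with the memory concern raised above.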

Nevertheless, the idea seems feasible and innovative, and given the lack of studies using this architecture, there is an open opportunity to keep investigating the advantages and limits of this model.