Parallel deep neural networks for endoscopic OCT image segmentation

DAWEI LI | JIMIN WU | YUFAN HE | XINWEN YAO | WU YUAN | DEFU CHEN | HYEON-CHEOL PARK | SHAOYONG YU | JERRY L. PRINCE | XINGDE LI
Biomedical Optics Express, Vol. 10, Issue 3, pp. 1126-1135 (2019)
DOI: https://doi.org/10.1364/BOE.10.001126

This article reports parallel-trained deep neural networks for automated endoscopic OCT image segmentation that are feasible even with a limited training data set. These U-Net-based deep neural networks were trained using a modified dice loss function and manual segmentation of ultrahigh-resolution cross-sectional images collected by an 800 nm OCT endoscopic system. The method was tested on in vivo guinea pig esophagus images. Results showed robust layer segmentation capability, with a boundary error of 1.4 μm and insensitivity to layer topology disorders. The method was also applied to differentiating in vivo OCT esophagus images of an eosinophilic esophagitis (EOE) model from those of its control group, and the results demonstrated quantitative changes in the thickness of the top esophageal layers in the EOE model.

Parallel Networks Training for OCT Segmentation

The parallel-trained deep neural networks contained three U-Nets. The images and the corresponding ground truth in the training data set were first divided along the lateral direction into eight non-overlapping sets of smaller images (termed slices) and then spatially augmented as the input for the U-Net. The net parameters were initialized randomly, following a normal distribution. The output of the U-Net was the prediction of the esophageal layers. The prediction was compared with the corresponding manual segmentation using a selected loss function. The output of the loss function was used to update the U-Net parameters. The training process was repeated until the loss function reached its minimum. The trained net was then used for automated image segmentation.
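The lateral slicing step described above can be sketched as follows. The image dimensions and the helper name `lateral_slices` are illustrative assumptions; the paper specifies only that each image is divided into eight non-overlapping slices along the lateral direction.

```python
import numpy as np

def lateral_slices(image, n_slices=8):
    """Split a cross-sectional image (depth x lateral) into
    n_slices non-overlapping slices along the lateral axis."""
    h, w = image.shape
    slice_w = w // n_slices  # assumes lateral width is divisible by n_slices
    return [image[:, i * slice_w:(i + 1) * slice_w] for i in range(n_slices)]

# Example: a 512 x 1024 image yields eight 512 x 128 slices
img = np.zeros((512, 1024))
slices = lateral_slices(img)
```

Each slice (and its matching ground-truth slice) then goes through spatial augmentation before being fed to the U-Net.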

During the training process, the randomly initialized net parameters were updated for layer prediction by minimizing a loss function. A weighted multi-class dice loss function was selected to evaluate the difference between the prediction and the manual segmentation.
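A weighted multi-class dice loss can be sketched as below. This is a minimal NumPy illustration: the per-class weights, the smoothing term `eps`, and the normalization are assumptions, not the paper's exact modified loss.

```python
import numpy as np

def weighted_dice_loss(pred, target, weights, eps=1e-6):
    """Weighted multi-class dice loss.

    pred, target: arrays of shape (n_classes, H, W) holding predicted
    probabilities and one-hot ground truth, respectively.
    weights: per-class weights of shape (n_classes,).
    """
    loss = 0.0
    for c, w in enumerate(weights):
        inter = np.sum(pred[c] * target[c])
        denom = np.sum(pred[c]) + np.sum(target[c])
        dice = (2.0 * inter + eps) / (denom + eps)  # 1.0 for a perfect match
        loss += w * (1.0 - dice)
    return loss / np.sum(weights)
```

A perfect prediction gives a loss of zero; classes with larger weights contribute more to the gradient, which is a common way to counter class imbalance between thin and thick layers.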

Spatial augmentation such as horizontal flipping, translation, and cropping can enlarge the training data set and has served as a standard step in the training stage. Two more networks were also trained separately on the original training data set augmented with different levels of zero-mean Gaussian noise.
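The Gaussian-noise augmentation used for the two additional networks can be sketched as follows; the noise level `sigma` is an assumption, as the paper only states that different levels of zero-mean Gaussian noise were used.

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=None):
    """Return a copy of the image with zero-mean Gaussian noise added."""
    rng = np.random.default_rng(seed)
    return image + rng.normal(loc=0.0, scale=sigma, size=image.shape)

clean = np.zeros((64, 64))
noisy = add_gaussian_noise(clean, sigma=0.05, seed=0)
```

Training the same architecture on differently noised copies of the data yields three networks whose errors are partly decorrelated, which is what makes their later fusion useful.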

Discussion

This paper proposes a solution for segmenting five predefined layers of the upper digestive tract. Two main strategies are used: small-window training, and the fusion of three identical networks trained on different data sets.
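The fusion of the three networks' outputs could be sketched as below. The paper does not spell out the fusion rule here; averaging the per-class probability maps and taking the argmax is one common choice and is used purely as an illustration.

```python
import numpy as np

def fuse_predictions(prob_maps):
    """Fuse per-class probability maps from several networks.

    prob_maps: list of arrays, each of shape (n_classes, H, W).
    Returns the fused label map of shape (H, W).
    """
    mean_probs = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return np.argmax(mean_probs, axis=0)

# Toy example: three networks voting on a single pixel with two classes
a = np.array([[[0.9]], [[0.1]]])
b = np.array([[[0.2]], [[0.8]]])
c = np.array([[[0.1]], [[0.9]]])
fused = fuse_predictions([a, b, c])
```

Because each network was trained on a differently augmented data set, averaging tends to suppress errors that only one of the three networks makes.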

For endoscopic OCT images collected from other disease models, or from human subjects with disrupted layer structures such as Barrett's esophagus, the networks would need to be re-trained and tested with relevant images. The computational cost of this new method was also investigated; although the processing speed still needs improvement, the method would be very attractive for future real-time layer segmentation and tracking in various clinical applications.