Detection of frame informativeness in endoscopic videos using image quality and recurrent neural networks

T. G. W. Boers, J. van der Putten, J. de Groof, M. Struyvenberg, K. Fockens, W. Curvers, E. Schoon, F. van der Sommen, J. Bergman, P. H. N. de With

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

3 Citations (Scopus)

Abstract

Gastroenterologists are estimated to misdiagnose up to 25% of esophageal adenocarcinomas in Barrett's Esophagus patients. This prompts the need for more sensitive and objective tools to aid clinicians with lesion detection. Artificial Intelligence (AI) can make examinations more objective and will therefore help to mitigate observer dependency. Since these models are trained on good-quality endoscopic video frames to attain high efficacy, high-quality images are also needed for inference. Therefore, we aim to develop a framework that identifies frames of good image quality through a priori informativeness classification, which leads to high inference robustness. We show that we can maintain informativeness over the temporal domain using recurrent neural networks, yielding higher performance on non-informativeness detection than classifying individual images. Furthermore, we find that Gradient-weighted Class Activation Mapping (Grad-CAM) allows us to better localize informativeness within a frame. We have developed a customized ResNet18 feature extractor with three classifiers: a Fully-Connected (FC), a Long Short-Term Memory (LSTM), and a Gated Recurrent Unit (GRU) classifier. Experimental results are based on 4,349 frames from 20 pullback videos of the esophagus. Our results demonstrate that the algorithm achieves performance comparable to the current state of the art. The FC and LSTM classifiers both reach an F1 score of 91%. We found that the Grad-CAMs based on the LSTM classifier best represent the origin of non-informativeness, as 85% of the images were found to highlight the correct area. The benefit of our novel implementation for endoscopic informativeness classification is that it is trained end-to-end, incorporates the spatio-temporal domain in the decision making for robustness, and makes the model's decisions insightful through the use of Grad-CAMs.
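For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an F1 score can be computed from per-frame labels. It is an illustration only and not the authors' evaluation code: the function name `f1_score`, the label encoding (1 = non-informative, 0 = informative), the choice of non-informative as the positive class, and the example labels are all assumptions, since the abstract does not specify them.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score of the `positive` class for two parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-frame labels (assumed encoding: 1 = non-informative).
truth = [1, 1, 0, 0, 1, 0, 1, 0]
preds = [1, 0, 0, 0, 1, 1, 1, 0]
score = f1_score(truth, preds)
```

Because F1 combines precision and recall, a classifier cannot score well by labelling every frame as non-informative or by flagging almost none.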
Original language: English
Title of host publication: Medical Imaging 2020
Subtitle of host publication: Image Processing
Editors: Ivana Isgum, Bennett A. Landman
Publisher: SPIE
Volume: 11313
ISBN (Electronic): 9781510633933
Publication status: Published - 2020
Event: Medical Imaging 2020: Image Processing - Houston, United States
Duration: 17 Feb 2020 - 20 Feb 2020

Publication series

Name: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
Volume: 11313

Conference

Conference: Medical Imaging 2020: Image Processing
Country/Territory: United States
City: Houston
Period: 17/02/2020 - 20/02/2020

Keywords

  • Computer-Aided Diagnosis
  • Endoscopy
  • Esophageal Cancer
  • Grad-CAM
  • Recurrent Neural Networks
  • Video Frame Classification
