CNNs vs. Transformers: Performance and Robustness in Endoscopic Image Analysis

Carolus H. J. Kusters; Tim G. W. Boers; Tim J. M. Jaspers; Jelmer B. Jukema; Martijn R. Jong; Kiki N. Fockens; Albert J. de Groof; Jacques J. Bergman; Fons van der Sommen; Peter H. N. de With

doi:https://doi.org/10.1007/978-3-031-47076-9_3

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In endoscopy, imaging conditions are often challenging due to organ movement, user dependence, fluctuations in video quality and real-time processing, which pose requirements on the performance, robustness and complexity of computer-based analysis techniques. This paper asks whether Transformer-based architectures, which can directly capture global contextual information, can handle these endoscopic conditions and even outperform the established Convolutional Neural Networks (CNNs) for this task. To this end, we evaluate and compare the clinically relevant performance and robustness of CNNs and Transformers for neoplasia detection in Barrett’s esophagus. We have selected several top-performing CNNs and Transformers on endoscopic benchmarks, which we have trained and validated on a total of 10,208 images (2,079 patients) and tested on a total of 4,661 images (743 patients), divided over a high-quality test set and three different robustness test sets. Our results show that Transformers generally perform better on classification and segmentation for the high-quality, challenging test set, and show on-par or increased robustness to various clinically relevant input data variations, while requiring comparable model complexity. This robustness against challenging video-related conditions and equipment variations across hospitals is an essential trait for adoption in clinical practice. The code is publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.

Original language	English
Title of host publication	Applications of Medical Artificial Intelligence - 2nd International Workshop, AMAI 2023, Held in Conjunction with MICCAI 2023, Proceedings
Editors	Shandong Wu, Behrouz Shabestari, Lei Xing
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	21-31
Number of pages	11
Volume	14313 LNCS
ISBN (Print)	9783031470752
DOIs	https://doi.org/10.1007/978-3-031-47076-9_3
Publication status	Published - 2024
Event	2nd International Workshop on Applications of Medical Artificial Intelligence, AMAI 2023 - Vancouver, Canada
Duration	8 Oct 2023 → 8 Oct 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14313 LNCS

Conference

Conference	2nd International Workshop on Applications of Medical Artificial Intelligence, AMAI 2023
Country/Territory	Canada
City	Vancouver
Period	8 Oct 2023 → 8 Oct 2023

Keywords

Barrett’s Esophagus
CNN
Robustness
Transformers

Cite this

Kusters, C. H. J., Boers, T. G. W., Jaspers, T. J. M., Jukema, J. B., Jong, M. R., Fockens, K. N., de Groof, A. J., Bergman, J. J., van der Sommen, F., & de With, P. H. N. (2024). CNNs vs. Transformers: Performance and Robustness in Endoscopic Image Analysis. In S. Wu, B. Shabestari, & L. Xing (Eds.), Applications of Medical Artificial Intelligence - 2nd International Workshop, AMAI 2023, Held in Conjunction with MICCAI 2023, Proceedings (Vol. 14313 LNCS, pp. 21-31). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14313 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-47076-9_3

@inproceedings{4ced7e0f2fd84a368af13902596e6ff7,
  title = "CNNs vs. Transformers: Performance and Robustness in Endoscopic Image Analysis",
  abstract = "In endoscopy, imaging conditions are often challenging due to organ movement, user dependence, fluctuations in video quality and real-time processing, which pose requirements on the performance, robustness and complexity of computer-based analysis techniques. This paper asks whether Transformer-based architectures, which can directly capture global contextual information, can handle these endoscopic conditions and even outperform the established Convolutional Neural Networks (CNNs) for this task. To this end, we evaluate and compare the clinically relevant performance and robustness of CNNs and Transformers for neoplasia detection in Barrett{\textquoteright}s esophagus. We have selected several top-performing CNNs and Transformers on endoscopic benchmarks, which we have trained and validated on a total of 10,208 images (2,079 patients) and tested on a total of 4,661 images (743 patients), divided over a high-quality test set and three different robustness test sets. Our results show that Transformers generally perform better on classification and segmentation for the high-quality, challenging test set, and show on-par or increased robustness to various clinically relevant input data variations, while requiring comparable model complexity. This robustness against challenging video-related conditions and equipment variations across hospitals is an essential trait for adoption in clinical practice. The code is publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.",
  keywords = "Barrett{\textquoteright}s Esophagus, CNN, Robustness, Transformers",
  author = "Kusters, {Carolus H. J.} and Boers, {Tim G. W.} and Jaspers, {Tim J. M.} and Jukema, {Jelmer B.} and Jong, {Martijn R.} and Fockens, {Kiki N.} and {de Groof}, {Albert J.} and Bergman, {Jacques J.} and {van der Sommen}, Fons and {de With}, {Peter H. N.}",
  note = "Publisher Copyright: {\textcopyright} 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 2nd International Workshop on Applications of Medical Artificial Intelligence, AMAI 2023 ; Conference date: 08-10-2023 Through 08-10-2023",
  year = "2024",
  doi = "10.1007/978-3-031-47076-9_3",
  language = "English",
  isbn = "9783031470752",
  volume = "14313 LNCS",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  publisher = "Springer Science and Business Media Deutschland GmbH",
  pages = "21--31",
  editor = "Shandong Wu and Behrouz Shabestari and Lei Xing",
  booktitle = "Applications of Medical Artificial Intelligence - 2nd International Workshop, AMAI 2023, Held in Conjunction with MICCAI 2023, Proceedings",
  address = "Germany",
}
