Deep learning-based classification of kidney transplant pathology: a retrospective, multicentre, proof-of-concept study

Jesper Kers, Roman D Bülow, Barbara M Klinkhammer, Gerben E Breimer, Francesco Fontana, Adeyemi Adefidipe Abiola, Rianne Hofstraat, Garry L Corthals, Hessel Peters-Sengers, Sonja Djudjaj, Saskia von Stillfried, David L Hölscher, Tobias T Pieters, Arjan D van Zuilen, Frederike J Bemelman, Azam S Nurmohamed, Maarten Naesens, Joris J T H Roelofs, Sandrine Florquin, Jürgen Floege, Tri Q Nguyen, Jakob N Kather, Peter Boor

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

BACKGROUND: Histopathological assessment of transplant biopsies is currently the standard method to diagnose allograft rejection and can help guide patient management, but it is one of the most challenging areas of pathology, requiring considerable expertise, time, and effort. We aimed to analyse the utility of deep learning to preclassify the histology of kidney allograft biopsies into three broad categories (ie, normal, rejection, and other diseases) as a potential biopsy triage system focusing on transplant rejection.

METHODS: We performed a retrospective, multicentre, proof-of-concept study using 5844 digital whole slide images of kidney allograft biopsies from 1948 patients. Kidney allograft biopsy samples were identified by a database search in the Departments of Pathology of the Amsterdam UMC, Amsterdam, Netherlands (1130 patients) and the University Medical Center Utrecht, Utrecht, Netherlands (717 patients). 101 consecutive kidney transplant biopsies were identified in the archive of the Institute of Pathology, RWTH Aachen University Hospital, Aachen, Germany. Convolutional neural networks (CNNs) were trained to classify allograft biopsies as normal, rejection, or other diseases. Three times cross-validation (1847 patients) and deployment on an external real-world cohort (101 patients) were used for validation. Area under the receiver operating characteristic curve (AUROC) was used as the main performance metric (the primary endpoint to assess CNN performance).
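The primary endpoint, AUROC with a bootstrapped confidence interval, can be sketched in plain Python. The rank-comparison AUROC below and the ten-resample band are illustrative assumptions about the evaluation, not the authors' code:

```python
import random

def auroc(labels, scores):
    """Probability that a randomly chosen positive scores above a randomly
    chosen negative (ties count half) -- equal to the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_band(labels, scores, n_boot=10, seed=0):
    """Crude ten-resample AUROC band, mirroring the abstract's
    'ten times bootstrapped CI' (the exact resampling scheme is assumed)."""
    rng = random.Random(seed)
    n, stats = len(labels), []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < len(ys):  # a resample needs both classes present
            stats.append(auroc(ys, [scores[i] for i in idx]))
    return min(stats), max(stats)
```

With only ten resamples the band is coarse; in practice many more bootstrap draws and percentile endpoints would be used.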

FINDINGS: Serial CNNs, first classifying kidney allograft biopsies as normal (AUROC 0·87 [ten times bootstrapped CI 0·85-0·88]) and disease (0·87 [0·86-0·88]), followed by a second CNN classifying biopsies classified as disease into rejection (0·75 [0·73-0·76]) and other diseases (0·75 [0·72-0·77]), showed similar AUROC in cross-validation and deployment on independent real-world data (first CNN normal AUROC 0·83 [0·80-0·85], disease 0·83 [0·73-0·91]; second CNN rejection 0·61 [0·51-0·70], other diseases 0·61 [0·50-0·74]). A single CNN classifying biopsies as normal, rejection, or other diseases showed similar performance in cross-validation (normal AUROC 0·80 [0·73-0·84], rejection 0·76 [0·66-0·80], other diseases 0·50 [0·36-0·57]) and generalised well for normal and rejection classes in the real-world data. Visualisation techniques highlighted rejection-relevant areas of biopsies in the tubulointerstitium.
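The serial-CNN design described above reduces to a two-stage decision rule. A minimal sketch, assuming per-biopsy probabilities from each CNN and hypothetical thresholds not taken from the study:

```python
def triage(p_disease, p_rejection, t_disease=0.5, t_rejection=0.5):
    """Two-stage triage: the first CNN separates normal from disease; for
    biopsies flagged as disease, the second CNN separates rejection from
    other diseases. Thresholds here are illustrative assumptions."""
    if p_disease < t_disease:  # stage 1: normal vs disease
        return "normal"
    # stage 2: rejection vs other diseases
    return "rejection" if p_rejection >= t_rejection else "other diseases"
```

In a triage setting the thresholds would be tuned on validation data to trade sensitivity for rejection against the pathologist workload saved on normal biopsies.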

INTERPRETATION: This study showed that deep learning-based classification of transplant biopsies could support pathological diagnostics of kidney allograft rejection.

FUNDING: European Research Council; German Research Foundation; German Federal Ministries of Education and Research, Health, and Economic Affairs and Energy; Dutch Kidney Foundation; Human(e) AI Research Priority Area of the University of Amsterdam; and Max-Eder Programme of German Cancer Aid.

Original language: English
Pages (from-to): e18-e26
Journal: The Lancet Digital Health
Volume: 4
Issue number: 1
Early online date: 2021
DOIs
Publication status: Published - Jan 2022
