DeepSMILE: Contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer

Yoni Schirris; Efstratios Gavves; Iris Nederlof; Hugo Mark Horlings; Jonas Teuwen

doi:https://doi.org/10.1016/j.media.2022.102464

Research output: Contribution to journal › Article › Academic › peer-review

37 Citations (Scopus)

Abstract

We propose a Deep learning-based weak label learning method for analyzing whole slide images (WSIs) of Hematoxylin and Eosin (H&E) stained tumor tissue that does not require pixel-level or tile-level annotations, using Self-supervised pre-training and heterogeneity-aware deep Multiple Instance LEarning (DeepSMILE). We apply DeepSMILE to the tasks of homologous recombination deficiency (HRD) and microsatellite instability (MSI) prediction. We use contrastive self-supervised learning to pre-train a feature extractor on histopathology tiles of cancer tissue. Additionally, we use variability-aware deep multiple instance learning to learn the tile feature aggregation function while modeling tumor heterogeneity. For MSI prediction in a tumor-annotated and color-normalized subset of TCGA-CRC (n=360 patients), contrastive self-supervised learning improves the tile supervision baseline from 0.77 to 0.87 AUROC, on par with our proposed DeepSMILE method. On TCGA-BC (n=1041 patients) without any manual annotations, DeepSMILE improves HRD classification performance from 0.77 to 0.81 AUROC compared to tile supervision with either a self-supervised or ImageNet pre-trained feature extractor. Our proposed methods reach the baseline performance using only 40% of the labeled data on both datasets. These improvements suggest that standard self-supervised learning techniques, combined with multiple instance learning, can improve genomic label classification performance in the histopathology domain with less labeled data.
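
The tile feature aggregation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "variability-aware" pooling means concatenating an attention-weighted mean of the tile features with their attention-weighted variance, and it uses a hypothetical single-vector attention scorer `w` in place of a learned network. The function names and the toy inputs are illustrative assumptions only.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tiles: Sequence[Sequence[float]],
                      w: Sequence[float]) -> List[float]:
    """Hypothetical attention scorer: score_i = tanh(w . h_i), then softmax.

    A trained model would learn this scoring function; a fixed vector
    stands in for it here.
    """
    scores = [math.tanh(sum(wj * hj for wj, hj in zip(w, h))) for h in tiles]
    return softmax(scores)


def variability_aware_pool(tiles: Sequence[Sequence[float]],
                           w: Sequence[float]) -> List[float]:
    """Pool one slide's tile features into a single slide-level vector.

    Returns the attention-weighted mean followed by the attention-weighted
    variance per feature dimension (assumed reading of "variability-aware").
    """
    a = attention_weights(tiles, w)
    dim = len(tiles[0])
    mean = [sum(ai * h[d] for ai, h in zip(a, tiles)) for d in range(dim)]
    var = [sum(ai * (h[d] - mean[d]) ** 2 for ai, h in zip(a, tiles))
           for d in range(dim)]
    return mean + var


# Toy slide with two tiles and two feature dimensions; a zero scorer
# gives uniform attention, so the result is the plain mean and variance.
slide_vector = variability_aware_pool([[0.0, 2.0], [2.0, 2.0]], [0.0, 0.0])
print(slide_vector)
```

In this reading, the variance half of the vector is what lets a downstream classifier see how much the attended tiles differ from one another, which is one way to expose tumor heterogeneity to a slide-level label predictor.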

Original language	English
Article number	102464
Pages (from-to)	102464
Number of pages	1
Journal	Medical Image Analysis
Volume	79
Early online date	29 Apr 2022
DOIs	https://doi.org/10.1016/j.media.2022.102464
Publication status	Published - 1 Jul 2022

Keywords

Computational pathology
Histogenomics
Multiple instance learning
Self-supervised learning

Access to Document

https://doi.org/10.1016/j.media.2022.102464

https://pure.uva.nl/ws/files/107966541/1_s2.0_S1361841522001116_main.pdf

Cite this

@article{af7eb100deef414c83a42c343acdec59,

title = "DeepSMILE: Contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer",

abstract = "We propose a Deep learning-based weak label learning method for analyzing whole slide images (WSIs) of Hematoxylin and Eosin (H&E) stained tumor tissue not requiring pixel-level or tile-level annotations using Self-supervised pre-training and heterogeneity-aware deep Multiple Instance LEarning (DeepSMILE). We apply DeepSMILE to the task of Homologous recombination deficiency (HRD) and microsatellite instability (MSI) prediction. We utilize contrastive self-supervised learning to pre-train a feature extractor on histopathology tiles of cancer tissue. Additionally, we use variability-aware deep multiple instance learning to learn the tile feature aggregation function while modeling tumor heterogeneity. For MSI prediction in a tumor-annotated and color normalized subset of TCGA-CRC (n=360 patients), contrastive self-supervised learning improves the tile supervision baseline from 0.77 to 0.87 AUROC, on par with our proposed DeepSMILE method. On TCGA-BC (n=1041 patients) without any manual annotations, DeepSMILE improves HRD classification performance from 0.77 to 0.81 AUROC compared to tile supervision with either a self-supervised or ImageNet pre-trained feature extractor. Our proposed methods reach the baseline performance using only 40% of the labeled data on both datasets. These improvements suggest we can use standard self-supervised learning techniques combined with multiple instance learning in the histopathology domain to improve genomic label classification performance with fewer labeled data.",

keywords = "Computational pathology, Histogenomics, Multiple instance learning, Self-supervised learning",

author = "Yoni Schirris and Efstratios Gavves and Iris Nederlof and Horlings, {Hugo Mark} and Jonas Teuwen",

note = "Funding Information: The collaboration project is co-funded by the PPP Allowance made available by Health Holland, Top Sector Life Sciences & Health, to stimulate public-private partnerships. Publisher Copyright: {\textcopyright} 2022 Elsevier B.V.",

year = "2022",

month = jul,

day = "1",

doi = "https://doi.org/10.1016/j.media.2022.102464",

language = "English",

volume = "79",

pages = "102464",

journal = "Medical Image Analysis",

issn = "1361-8415",

publisher = "Elsevier",

}

TY - JOUR

T1 - DeepSMILE

T2 - Contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer

AU - Schirris, Yoni

AU - Gavves, Efstratios

AU - Nederlof, Iris

AU - Horlings, Hugo Mark

AU - Teuwen, Jonas

N1 - Funding Information: The collaboration project is co-funded by the PPP Allowance made available by Health Holland, Top Sector Life Sciences & Health, to stimulate public-private partnerships. Publisher Copyright: © 2022 Elsevier B.V.

PY - 2022/7/1

Y1 - 2022/7/1

AB - We propose a Deep learning-based weak label learning method for analyzing whole slide images (WSIs) of Hematoxylin and Eosin (H&E) stained tumor tissue not requiring pixel-level or tile-level annotations using Self-supervised pre-training and heterogeneity-aware deep Multiple Instance LEarning (DeepSMILE). We apply DeepSMILE to the task of Homologous recombination deficiency (HRD) and microsatellite instability (MSI) prediction. We utilize contrastive self-supervised learning to pre-train a feature extractor on histopathology tiles of cancer tissue. Additionally, we use variability-aware deep multiple instance learning to learn the tile feature aggregation function while modeling tumor heterogeneity. For MSI prediction in a tumor-annotated and color normalized subset of TCGA-CRC (n=360 patients), contrastive self-supervised learning improves the tile supervision baseline from 0.77 to 0.87 AUROC, on par with our proposed DeepSMILE method. On TCGA-BC (n=1041 patients) without any manual annotations, DeepSMILE improves HRD classification performance from 0.77 to 0.81 AUROC compared to tile supervision with either a self-supervised or ImageNet pre-trained feature extractor. Our proposed methods reach the baseline performance using only 40% of the labeled data on both datasets. These improvements suggest we can use standard self-supervised learning techniques combined with multiple instance learning in the histopathology domain to improve genomic label classification performance with fewer labeled data.

KW - Computational pathology

KW - Histogenomics

KW - Multiple instance learning

KW - Self-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85131223827&partnerID=8YFLogxK

UR - https://pure.uva.nl/ws/files/107966534/1_s2.0_S1361841522001116_mmc1.csv

UR - https://pure.uva.nl/ws/files/107966536/1_s2.0_S1361841522001116_mmc2.csv

UR - https://pure.uva.nl/ws/files/107966538/1_s2.0_S1361841522001116_mmc3.pdf

U2 - https://doi.org/10.1016/j.media.2022.102464

DO - https://doi.org/10.1016/j.media.2022.102464

M3 - Article

C2 - 35596966

SN - 1361-8415

VL - 79

SP - 102464

JO - Medical Image Analysis

JF - Medical Image Analysis

M1 - 102464

ER -
