Metrics reloaded: recommendations for image analysis validation

Lena Maier-Hein; Annika Reinke; Patrick Godau; Minu D. Tizabi; Florian Buettner; Evangelia Christodoulou; Ben Glocker; Fabian Isensee; Jens Kleesiek; Michal Kozubek; Mauricio Reyes; Michael A. Riegler; Manuel Wiesenfarth; A. Emre Kavur; Carole H. Sudre; Michael Baumgartner; Matthias Eisenmann; Doreen Heckmann-Nötzel; Tim Rädsch; Laura Acion; Michela Antonelli; Tal Arbel; Spyridon Bakas; Arriel Benis; Matthew B. Blaschko; M. Jorge Cardoso; Veronika Cheplygina; Beth A. Cimini; Gary S. Collins; Keyvan Farahani; Luciana Ferrer; Adrian Galdran; Bram van Ginneken; Robert Haase; Daniel A. Hashimoto; Michael M. Hoffman; Merel Huisman; Pierre Jannin; Charles E. Kahn; Dagmar Kainmueller; Bernhard Kainz; Alexandros Karargyris; Alan Karthikesalingam; Florian Kofler; Annette Kopp-Schneider; Anna Kreshuk; Tahsin Kurc; Bennett A. Landman; Geert Litjens; Amin Madani; Klaus Maier-Hein; Anne L. Martel; Peter Mattson; Erik Meijering; Bjoern Menze; Karel G. M. Moons; Henning Müller; Brennan Nichyporuk; Felix Nickel; Jens Petersen; Nasir Rajpoot; Nicola Rieke; Julio Saez-Rodriguez; Clara I. Sánchez; Shravya Shetty; Maarten van Smeden; Ronald M. Summers; Abdel A. Taha; Aleksei Tiulpin; Sotirios A. Tsaftaris; Ben van Calster; Gaël Varoquaux; Paul F. Jäger

doi:10.1038/s41592-023-02151-z

Radiology and Nuclear Medicine (VUmc)

Research output: Contribution to journal › Article › Academic › peer-review

6 Citations (Scopus)

Abstract

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint—a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.

Original language	English
Pages (from-to)	195-212
Number of pages	18
Journal	Nature methods
Volume	21
Issue number	2
DOIs	https://doi.org/10.1038/s41592-023-02151-z
Publication status	Published - 1 Feb 2024

Cite this

Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M. D., Buettner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., Reyes, M., Riegler, M. A., Wiesenfarth, M., Kavur, A. E., Sudre, C. H., Baumgartner, M., Eisenmann, M., Heckmann-Nötzel, D., Rädsch, T., ... Jäger, P. F. (2024). Metrics reloaded: recommendations for image analysis validation. Nature methods, 21(2), 195-212. https://doi.org/10.1038/s41592-023-02151-z

@article{086a52baebd1480fb457d783b3d2d761,
title = "Metrics reloaded: recommendations for image analysis validation",
abstract = "Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint—a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.",
author = "Lena Maier-Hein and Annika Reinke and Patrick Godau and Tizabi, {Minu D.} and Florian Buettner and Evangelia Christodoulou and Ben Glocker and Fabian Isensee and Jens Kleesiek and Michal Kozubek and Mauricio Reyes and Riegler, {Michael A.} and Manuel Wiesenfarth and Kavur, {A. Emre} and Sudre, {Carole H.} and Michael Baumgartner and Matthias Eisenmann and Doreen Heckmann-N{\"o}tzel and Tim R{\"a}dsch and Laura Acion and Michela Antonelli and Tal Arbel and Spyridon Bakas and Arriel Benis and Blaschko, {Matthew B.} and Cardoso, {M. Jorge} and Veronika Cheplygina and Cimini, {Beth A.} and Collins, {Gary S.} and Keyvan Farahani and Luciana Ferrer and Adrian Galdran and {van Ginneken}, Bram and Robert Haase and Hashimoto, {Daniel A.} and Hoffman, {Michael M.} and Merel Huisman and Pierre Jannin and Kahn, {Charles E.} and Dagmar Kainmueller and Bernhard Kainz and Alexandros Karargyris and Alan Karthikesalingam and Florian Kofler and Annette Kopp-Schneider and Anna Kreshuk and Tahsin Kurc and Landman, {Bennett A.} and Geert Litjens and Amin Madani and Klaus Maier-Hein and Martel, {Anne L.} and Peter Mattson and Erik Meijering and Bjoern Menze and Moons, {Karel G. M.} and Henning M{\"u}ller and Brennan Nichyporuk and Felix Nickel and Jens Petersen and Nasir Rajpoot and Nicola Rieke and Julio Saez-Rodriguez and S{\'a}nchez, {Clara I.} and Shravya Shetty and {van Smeden}, Maarten and Summers, {Ronald M.} and Taha, {Abdel A.} and Aleksei Tiulpin and Tsaftaris, {Sotirios A.} and {van Calster}, Ben and Ga{\"e}l Varoquaux and J{\"a}ger, {Paul F.}",
note = "Publisher Copyright: {\textcopyright} Springer Nature America, Inc. 2024.",
year = "2024",
month = feb,
day = "1",
doi = "10.1038/s41592-023-02151-z",
language = "English",
volume = "21",
pages = "195--212",
journal = "Nature methods",
issn = "1548-7091",
publisher = "Nature Publishing Group",
number = "2",
}

Maier-Hein, L, Reinke, A, Godau, P, Tizabi, MD, Buettner, F, Christodoulou, E, Glocker, B, Isensee, F, Kleesiek, J, Kozubek, M, Reyes, M, Riegler, MA, Wiesenfarth, M, Kavur, AE, Sudre, CH, Baumgartner, M, Eisenmann, M, Heckmann-Nötzel, D, Rädsch, T, Acion, L, Antonelli, M, Arbel, T, Bakas, S, Benis, A, Blaschko, MB, Cardoso, MJ, Cheplygina, V, Cimini, BA, Collins, GS, Farahani, K, Ferrer, L, Galdran, A, van Ginneken, B, Haase, R, Hashimoto, DA, Hoffman, MM, Huisman, M, Jannin, P, Kahn, CE, Kainmueller, D, Kainz, B, Karargyris, A, Karthikesalingam, A, Kofler, F, Kopp-Schneider, A, Kreshuk, A, Kurc, T, Landman, BA, Litjens, G, Madani, A, Maier-Hein, K, Martel, AL, Mattson, P, Meijering, E, Menze, B, Moons, KGM, Müller, H, Nichyporuk, B, Nickel, F, Petersen, J, Rajpoot, N, Rieke, N, Saez-Rodriguez, J, Sánchez, CI, Shetty, S, van Smeden, M, Summers, RM, Taha, AA, Tiulpin, A, Tsaftaris, SA, van Calster, B, Varoquaux, G & Jäger, PF 2024, 'Metrics reloaded: recommendations for image analysis validation', Nature methods, vol. 21, no. 2, pp. 195-212. https://doi.org/10.1038/s41592-023-02151-z

TY  - JOUR
T1  - Metrics reloaded
T2  - recommendations for image analysis validation
AU  - Maier-Hein, Lena
AU  - Reinke, Annika
AU  - Godau, Patrick
AU  - Tizabi, Minu D.
AU  - Buettner, Florian
AU  - Christodoulou, Evangelia
AU  - Glocker, Ben
AU  - Isensee, Fabian
AU  - Kleesiek, Jens
AU  - Kozubek, Michal
AU  - Reyes, Mauricio
AU  - Riegler, Michael A.
AU  - Wiesenfarth, Manuel
AU  - Kavur, A. Emre
AU  - Sudre, Carole H.
AU  - Baumgartner, Michael
AU  - Eisenmann, Matthias
AU  - Heckmann-Nötzel, Doreen
AU  - Rädsch, Tim
AU  - Acion, Laura
AU  - Antonelli, Michela
AU  - Arbel, Tal
AU  - Bakas, Spyridon
AU  - Benis, Arriel
AU  - Blaschko, Matthew B.
AU  - Cardoso, M. Jorge
AU  - Cheplygina, Veronika
AU  - Cimini, Beth A.
AU  - Collins, Gary S.
AU  - Farahani, Keyvan
AU  - Ferrer, Luciana
AU  - Galdran, Adrian
AU  - van Ginneken, Bram
AU  - Haase, Robert
AU  - Hashimoto, Daniel A.
AU  - Hoffman, Michael M.
AU  - Huisman, Merel
AU  - Jannin, Pierre
AU  - Kahn, Charles E.
AU  - Kainmueller, Dagmar
AU  - Kainz, Bernhard
AU  - Karargyris, Alexandros
AU  - Karthikesalingam, Alan
AU  - Kofler, Florian
AU  - Kopp-Schneider, Annette
AU  - Kreshuk, Anna
AU  - Kurc, Tahsin
AU  - Landman, Bennett A.
AU  - Litjens, Geert
AU  - Madani, Amin
AU  - Maier-Hein, Klaus
AU  - Martel, Anne L.
AU  - Mattson, Peter
AU  - Meijering, Erik
AU  - Menze, Bjoern
AU  - Moons, Karel G. M.
AU  - Müller, Henning
AU  - Nichyporuk, Brennan
AU  - Nickel, Felix
AU  - Petersen, Jens
AU  - Rajpoot, Nasir
AU  - Rieke, Nicola
AU  - Saez-Rodriguez, Julio
AU  - Sánchez, Clara I.
AU  - Shetty, Shravya
AU  - van Smeden, Maarten
AU  - Summers, Ronald M.
AU  - Taha, Abdel A.
AU  - Tiulpin, Aleksei
AU  - Tsaftaris, Sotirios A.
AU  - van Calster, Ben
AU  - Varoquaux, Gaël
AU  - Jäger, Paul F.
PY  - 2024/2/1
Y1  - 2024/2/1
N2  - Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint—a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
AB  - Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint—a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
UR  - http://www.scopus.com/inward/record.url?scp=85184862654&partnerID=8YFLogxK
U2  - 10.1038/s41592-023-02151-z
DO  - 10.1038/s41592-023-02151-z
M3  - Article
C2  - 38347141
SN  - 1548-7091
VL  - 21
SP  - 195
EP  - 212
JO  - Nature methods
JF  - Nature methods
IS  - 2
ER  -
