Project Details
Description
Who are we?
The Biomarker and Test Evaluation Program is an ongoing research program in the Amsterdam University Medical Centers, the University Hospital of the University of Amsterdam, the Netherlands.
What is our mission?
The BiTE program aims to appraise and develop methods for evaluating medical tests and biomarkers, and to apply these methods in relevant clinical studies. In doing so, we wish to strengthen the evidence base for rational decision-making about the use of tests and test strategies in health care.
What do we want to achieve?
The BiTE program wants to become one of the leading scientific groups in this field. We hope that this position will become evident from the number of frequently cited papers in high-impact journals, from citations in guidance papers for researchers and decision-makers, from invitations to contribute to scientific conferences and to the work of organizations in this area, and from other contributions to society.
What is Medical Test Evaluation?
Modern-day medicine and health care cannot operate without the use of medical tests and markers. These are procedures and techniques for acquiring additional information about a patient’s present condition or the likely course of that condition in the future. These tests and markers are used to make a diagnosis, to identify the likely cause of the patient’s complaints, to stage disease, to establish a prognosis, to select therapy, to evaluate the effects of therapy, to monitor for side-effects and adjust dosing if necessary, for surveillance after therapy, and for many other reasons.
Scientific progress in biomedical sciences has improved our understanding of the origins of disease and factors responsible for its development. In translational medicine, findings from basic research lead to the identification of putative new biomarkers that could be of help in the management of patients and healthy individuals. A growing awareness of the scarcity of health care resources has led to an increased scrutiny of existing tests and markers and of the indications for their use.
The development of new laboratory tests, imaging modalities, genetic markers or other medical tests goes through several stages, in which new discoveries are carefully evaluated. In current evaluations, the focus lies predominantly on technical features and on analytical issues. The justification for the use of medical tests lies in their relevance for practice: biomarkers and other tests should improve the health of patients, relative to no testing or using other forms of testing, or they should lead to improvements in health care efficiency without compromising health outcome. In the end, the main dimension in clinical and health policy decisions is the clinical utility of medical tests.
What have we achieved so far?
Existing methods from (clinical) epidemiology for the evaluation of tests and markers were not very well developed in the 1980s.1 The dominant study design was and still is the diagnostic accuracy study, a type of research that evaluates how well the results of a test correspond to those of the clinical reference standard, the best available method for establishing the presence of a particular disease – or condition – in patients.2 Relevant but incomplete studies had been done about the sources of bias in these designs, and how the results of such studies could be used for medical decision making. There was limited empirical research, little work on how to synthesize the results of studies, and almost no methodology for evaluating tests used for purposes other than making a diagnosis. Our group has evaluated and extended the existing methods for test evaluations, starting from diagnostic tests. Below we summarize the main results of our findings.
Sources of bias and variability in diagnostic accuracy studies
In the 1980s and 1990s, it was quite common to read that diagnostic accuracy was a fixed property of a test, and that a single and simple diagnostic accuracy study, in which the results of the test were compared with the outcome of the reference standard, was all that was needed to obtain accuracy estimates. Even now, this idea is still prevalent in epidemiology textbooks.
In our research, we soon discovered that this thesis was not at all tenable. Diagnostic accuracy is far from fixed. Genuine sources of its variability can be gender, age and other patient characteristics, as well as the setting and the selection of patients based on previous testing. We systematically explored sources of variability as well as sources of bias. Our most influential paper is one in which we reported on meta-regression across a number of systematic reviews of test accuracy studies.3 Each review in this analysis had studies with and without shortcomings, and we estimated the magnitude of the average bias of these methodological limitations across the series of systematic reviews. The largest sources of bias were the use of healthy controls and the application of multiple reference standards to verify the results of the index tests. The findings of this study were later replicated by our own group, using more refined methodology and a larger database.4 In later studies, we have examined alternative, more efficient ways of staging accuracy studies.5
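To make the point about variability concrete, the following minimal sketch computes sensitivity and specificity from the 2x2 cross-classification of index test results against the reference standard. The class name, the two settings, and all counts are hypothetical and chosen only for illustration; they are not results from our studies.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""
    tp: int  # test positive, condition present
    fp: int  # test positive, condition absent
    fn: int  # test negative, condition present
    tn: int  # test negative, condition absent

    def sensitivity(self) -> float:
        # Proportion of patients with the condition who test positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of patients without the condition who test negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for the same test evaluated in two different settings.
referral_clinic = TwoByTwo(tp=90, fp=20, fn=10, tn=80)
primary_care = TwoByTwo(tp=30, fp=10, fn=20, tn=190)

for name, table in [("referral clinic", referral_clinic),
                    ("primary care", primary_care)]:
    print(f"{name}: sensitivity={table.sensitivity():.2f}, "
          f"specificity={table.specificity():.2f}")
```

In this invented example the same test yields different accuracy estimates in the two settings, which is the kind of variability a single accuracy study cannot reveal.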
Reporting of diagnostic accuracy studies
In our work, we also discovered that methodological features were often not reported, or reported incompletely, in diagnostic accuracy studies, making it difficult for reviewers and readers to appraise the validity of the study. Building on the very successful CONSORT initiative to improve the reporting of randomized clinical trials, we started an international project to develop recommendations for complete and transparent reporting of diagnostic accuracy studies.6 We spearheaded this STARD initiative to develop standards for the reporting of such studies. The STARD statement that came out of this initiative was initially published in the January 2003 issues of more than a dozen journals. It was updated in 2015 and another update is in preparation.
Quality Appraisal of Test Accuracy Studies
Our work, complemented by additional reviews of the methodological literature, led to the QUADAS initiative, authored by one of our PhD students. QUADAS is a generic tool, specifically developed to appraise the methodological quality of test accuracy studies. This instrument is now undergoing revision by an international group, and members of our team act in the steering group for QUADAS 2.0.
Systematic Reviews of Test Accuracy Studies
Systematic reviews and meta-analysis of clinical studies can be used to obtain more precise estimates when several small studies addressing the same test in the same setting are available. Such reviews can also be useful to establish whether and how accuracy might vary across particular subgroups, and may provide summary estimates with a stronger generalizability than estimates from a single study.
Methods for systematic review and meta-analysis of randomized clinical trials had been available for some time, but similar methods for test accuracy studies were not immediately available. With our colleagues in Birmingham, Providence, Sydney and other places, we developed methods for each of the stages of the systematic review, including comprehensive searches, quality appraisal, evaluating bias in reviews, and meta-analysis.8 The bivariate normal model for meta-analysis, developed by Reitsma, Zwinderman and other colleagues, is now – with the hierarchical summary ROC method – one of the de facto standards for meta-analysis of test accuracy studies.9
The results of our research have been and are being used in the development of a manual for systematic reviews of test accuracy studies to be used in reviews for the Cochrane Collaboration, the largest and most influential international organization preparing, maintaining, and promoting systematic reviews to help people make well-informed decisions about health care.
Other forms of test evaluation
Although diagnostic accuracy studies have been the dominant theme in our research so far, we have also studied – and published on – other forms of test evaluation. These include the validity of randomized clinical trials, methods for developing monitoring schemes, participation in population screening, and patient outcomes in test evaluation.
Who do we collaborate with?
Between 2008 and 2023, we published and shared grants with the following international colleagues:
• Screening and Test Evaluation Program, Department of Public Health, University of Sydney, Australia
• Unit of Public Health, Epidemiology and Biostatistics, University of Birmingham, United Kingdom
• Fred Hutchinson Cancer Research Center, Seattle, WA, USA
• Health Sciences Center, McMaster University, Hamilton, Ontario, Canada
• Harvard Medical School, Boston, MA, USA
• Durham Veterans Affairs Medical Center and Duke University, Durham, USA
• Center for Statistical Sciences, Brown University, Providence, RI, USA
We share ongoing competitive grants with researchers in the following international partner institutions:
• Screening and Test Evaluation Program, Department of Public Health, University of Sydney, Australia
• World Health Organization, Geneva, Switzerland
• University of Dundee, United Kingdom
• Nasjonalt kunnskapssenter for helsetjenesten, Norway
• Fundació Privada Institut de Recerca de l'Hospital de la Santa Creu i Sant Pau, Spain
• Associazione per la Ricerca sulla Efficacia della Assistenza Sanitaria, Centro Cochrane Italiano, Italy
• Universitaetsklinikum Freiburg, Germany
• National Institute for Health and Clinical Excellence, United Kingdom
• NHS Quality Improvement Scotland SIGN, United Kingdom
• Kustannus Oy Duodecim, Finland
Status | Active |
---|---|
Effective start/end date | 1/01/2006 → … |