
Study: Subpar reporting of biomarker characteristics


A contributor to the reproducibility crisis?

Anne Paxton

January 2020—Since the Journal of Irreproducible Results arrived on the scene in 1955, founded by a physicist and a virologist, its parodies of scientific research have amused and skewered many in the scientific community.

But in the world of clinical research, irreproducibility is a less than whimsical idea. Over the past 20 years, the increasing number of clinical studies with findings that are not confirmed upon retesting has created a “massive replication crisis” in clinical medicine and preclinical research, writes Patrick M. Bossuyt, PhD, professor of clinical epidemiology at the University of Amsterdam, in a recent Clinical Chemistry editorial titled “Laboratory measurement’s contribution to the replication and application crisis in clinical research” (2019;65[12]:1479–1480).

‘If the journals in their instructions to authors insist that this information be provided, then authors will provide it.’ — David Sacks, MB, ChB

Researchers at drug companies report anecdotally that a relatively small fraction of published findings are reproducible when investigated, says David B. Sacks, MB, ChB, senior investigator and chief of the clinical chemistry service in the Department of Laboratory Medicine at the National Institutes of Health. A few years ago, Dr. Sacks wondered how laboratory issues might factor into the reproducibility problem and decided with colleagues to take a look. That led to a startling discovery.

“Measurement of biomarkers is fundamental to many, many aspects of patient care, ranging from diagnosis of patients to management and evaluation of treatment,” Dr. Sacks says. But “I noticed that many clinical papers didn’t describe how they measured biomarkers really well. And I realized that nobody had ever addressed the lab aspects.” He and colleagues at the NIH, University of Virginia, and Weill Cornell Medicine in New York undertook a study of the laboratory testing aspect of clinical studies involving biomarkers, which often form the basis for clinical guidelines.

When the authors examined 544 studies (and 1,299 biomarker uses) published in the top five clinical medical journals, the level of inadequate reporting took them aback. Their study found that “reporting of the analytical performance of biomarker measurements is variable and often absent from published clinical studies” (Sun Q, et al. Clin Chem. 2019;65[12]:1554–1562).

“I was shocked by the magnitude of the omissions we observed,” Dr. Sacks says. For two-thirds of the biomarkers, no information on analytical characteristics was provided, while for a majority of biomarkers the manufacturers could not be identified and there was no information about the trueness or precision of the methods used to measure the biomarker.

Inadequate reporting is a threat to interpretation and replication of study findings, Dr. Sacks and his coauthors write. As Dr. Bossuyt confirms in his editorial, insufficient, uninformative reporting of research methods “renders all attempts at reproducing the original findings hazardous” and is one issue that returns repeatedly in all explanations of the massive replication crisis.

The Sun, et al., study examined articles published between 2006 and 2016 in Annals of Internal Medicine, JAMA, Lancet, New England Journal of Medicine, and PLOS Medicine. A study was selected if a biomarker was used for selection or classification of participants, as a study outcome, or in any combination of those uses. Articles were excluded if they were not clinical studies in humans, had fewer than 10 participants, used only immunohistochemical or imaging markers, were meta-analyses of primary studies or computer modeling studies with no actual biomarker measurements, or had been retracted.

The most frequent molecular biomarker types were proteins (55 percent), nucleic acids (12 percent), and lipids/steroids (12 percent). The most common biomarkers were C-reactive protein (six percent), cardiac troponin (five percent), and glucose (four percent).

Eleven key analytical characteristics were evaluated: accuracy, day-to-day imprecision, within-run imprecision, otherwise unspecified imprecision, analytical sensitivity, interferences, reportable range of results, reference interval, cutoffs for test positivity or decision limits, quality control, and calibration/calibration verification. The authors assigned each biomarker a score of zero to 11 based on the number of analytical performance characteristics reported. For 865 of the 1,299 biomarker measurements, the score was zero. The median score was also zero, both for each of the five journals individually and for the full data set. Of the analytical characteristics evaluated, total imprecision had the highest reporting rate, at 13 percent; interference studies had the lowest, at two percent.
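To make the scoring concrete, here is a minimal sketch in Python of how such a zero-to-11 tally might be computed. The characteristic names mirror the 11 items listed above, but the function, variable names, and example data are invented for illustration; the authors’ actual scoring procedure is described in their paper, not reproduced here.

    # Minimal sketch of the 0-to-11 scoring described above.
    # The 11 characteristics mirror those named in the article;
    # the example inputs are hypothetical.

    CHARACTERISTICS = [
        "accuracy",
        "day-to-day imprecision",
        "within-run imprecision",
        "unspecified imprecision",
        "analytical sensitivity",
        "interferences",
        "reportable range",
        "reference interval",
        "cutoffs/decision limits",
        "quality control",
        "calibration/verification",
    ]

    def score(reported: set[str]) -> int:
        """Count how many of the 11 characteristics a study reported."""
        return sum(1 for c in CHARACTERISTICS if c in reported)

    # A hypothetical study reporting only imprecision and QC scores 2;
    # in Sun, et al., 865 of 1,299 biomarker measurements scored zero.
    print(score({"day-to-day imprecision", "quality control"}))  # -> 2
    print(score(set()))                                          # -> 0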

This sparse reporting occurred even though it is well known that, for many biomarkers, results for a given patient sample differ depending on the manufacturer of the test. For example, the authors point out, cardiac troponin I results have varied as much as 33-fold among assays. Other manufacturer-dependent differences are seen with thyroid-stimulating hormone, prostate-specific antigen, and human chorionic gonadotropin, as well as with long-established biomarkers such as albumin and creatinine. Without identification of the manufacturer, the study says, clinicians “cannot know if the decision cutpoints used in the study are appropriate for the test in their hospitals or at other laboratories where their patients’ samples are assayed.”

The kinds of problems a lack of reproducibility can create go far beyond the theoretical, as the Sun, et al., study explains in relation to tight glycemic control. The 2001 van den Berghe trial, a randomized controlled study that concluded that tight glycemic control decreased mortality in critically ill patients, led to a massive shift in clinical guidelines for treatment (N Engl J Med. 2001;345[19]:1359–1367). But a second trial, reported in 2009 (the multicenter NICE-SUGAR trial), appeared to contradict the results of the van den Berghe trial, finding an increase in mortality with tight glucose control (N Engl J Med. 2009;360[13]:1283–1297).

As the Sun, et al., study notes, the van den Berghe trial measured glucose by a highly accurate and precise method, described in the publication, while the NICE-SUGAR study included glucose measured by a variety of methods, including point-of-care glucose meters. No manufacturers’ names were provided in the second study, nor was there information about the analytical performance of the various methods. In addition, some of the then relatively imprecise glucose methods read falsely high in critically ill patients, while other meters read consistently low. Yet, crucially, the same algorithm was used at all centers to determine insulin infusion rates based on the glucose concentrations. The magnitude of the ensuing risk to patients with falsely high results was not apparent because no data were reported on the quality of the glucose measurements in the study.

In the aftermath of the diverging conclusions of the 2001 and 2009 studies, tight glycemic control has fallen out of favor, but that shift may not be justified by the trials themselves, Dr. Sacks says. “As soon as the NICE-SUGAR study came out, practice changed in many, many academic hospitals, and tight glycemic control was loosened substantially. The whole notion lost a lot of momentum. It’s not clear whether the different centers that were involved in the multicenter study used the same meters, and it’s very well known that the values from different meters vary even in the same institution.” (With HbA1c, Dr. Sacks notes, because so much time and money has been spent on standardization, variations among assays are less of a problem.)
