
Newsbytes


Machine-learning algorithms: how to detect and deter bias

November 2022—Bias—a type of prejudice that may go back to the beginning of humankind—has, in recent years, been the focus of attention with regard to developing machine-learning algorithms for clinical laboratory testing.

Clinical laboratory tests can hide data biases that could potentially generate misleading or even inaccurate results, says Brian Jackson, MD, medical director of business development, information technology, and support services, ARUP Laboratories, and associate professor of pathology, University of Utah School of Medicine, Salt Lake City. “When an algorithm gives you an answer to a question, but the algorithm doesn’t display the logic by which it arrives at that answer, the user is left at a big disadvantage in being able to see whether there’s bias going on.”

To compensate for such potential issues, laboratories need to focus heavily on quality control before implementing tests based on machine-learning algorithms, Dr. Jackson says. “I think the validation needs to be even more aggressive than what we do for [other] laboratory tests,” he adds.

Common types of biases that affect machine-learning algorithms include sample bias, exclusion bias, and measurement bias, says Christopher McCudden, PhD, clinical biochemist, Ottawa Hospital, and vice chair, Department of Pathology and Laboratory Medicine, University of Ottawa, Ontario. Dr. McCudden is also deputy chief medical/scientific officer for the Eastern Ontario Regional Laboratory Association.


In cases of sample bias, he says, the data used to create an algorithm are not fundamentally applicable to the population for which they are being used—for example, using data from male patients to create an algorithm for testing women. In exclusion bias, a portion of a population is omitted from the data, effectively creating a data set that doesn’t represent the population as a whole. Measurement bias occurs when an algorithm is applied to a population using equipment or a test method that differs from what was used to create the algorithm.
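To make the first of these concrete, the short Python sketch below simulates sample bias with synthetic data: a classifier is fit only on one patient group and then applied to another group whose analyte distribution differs. The groups, cutoffs, and numbers are hypothetical illustrations, not values from any study cited here.

```python
# Minimal sketch of sample bias (hypothetical synthetic data).
# A model trained only on "male" specimens, whose analyte distribution differs
# from the "female" specimens it is later applied to, loses accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Synthetic analyte values: the disease cutoff differs between the two groups.
male_x = rng.normal(10, 2, size=(n, 1))
male_y = (male_x[:, 0] > 11).astype(int)          # disease cutoff near 11 in males
female_x = rng.normal(7, 2, size=(n, 1))
female_y = (female_x[:, 0] > 8).astype(int)       # disease cutoff near 8 in females

model = LogisticRegression().fit(male_x, male_y)  # trained on males only (sample bias)

print("accuracy on males:  ", model.score(male_x, male_y))
print("accuracy on females:", model.score(female_x, female_y))  # noticeably worse
```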

All of these types of biases have the potential to generate flawed laboratory results that can negatively affect patient care, Dr. McCudden says. Complicating matters further, sample and exclusion biases can be based on many factors, including race, gender, diet, and age.

Biases, Dr. Jackson adds, can be found in all algorithms that are based on heterogeneous data sets because smaller subgroups within a population are, by definition, underrepresented in results. “The machine-learning algorithm is going to be mathematically reflective of the 95 percent, not the five percent,” he explains. “So, basically, the biases that exist in real-world data sets end up being represented in machine-learning algorithms in ways that are completely invisible to the user.”
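A brief sketch of that 95/5 point, again with made-up numbers: an overall accuracy figure that blends both subgroups can look acceptable even when the smaller subgroup is served poorly, which is why subgroup-stratified reporting is worth asking for.

```python
# Minimal sketch (hypothetical data): a single overall accuracy figure can hide
# poor performance on a small subgroup.
import numpy as np

rng = np.random.default_rng(1)
n_major, n_minor = 9500, 500                     # 95 percent vs. 5 percent of specimens

# Assume the algorithm is right 95% of the time on the majority subgroup
# but only 60% of the time on the minority subgroup.
correct_major = rng.random(n_major) < 0.95
correct_minor = rng.random(n_minor) < 0.60

overall = np.concatenate([correct_major, correct_minor]).mean()
print(f"overall accuracy:   {overall:.3f}")                 # looks fine (~0.93)
print(f"majority subgroup:  {correct_major.mean():.3f}")
print(f"minority subgroup:  {correct_minor.mean():.3f}")    # the hidden problem
```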

For laboratories, the first critical step in mitigating machine-learning bias is being able to access the data sets used to develop the tests, Dr. Jackson says. Ideally, researchers should have access to the original data on which algorithms were tested so they can determine whether they can replicate the results.

A University of Pittsburgh School of Medicine study comparing various algorithms for predicting the mortality risk for pneumonia patients illustrated the importance of transparency, Dr. Jackson says. One of the algorithmic models that the researchers studied “learned” a surprising rule suggesting that pneumonia patients with asthma had a lower risk of developing more severe pneumonia. (An independent analysis of the study was published in KDD ’15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [Caruana R, et al. doi.org/10.1145/2783258.2788613].)

A closer look at the data revealed why that was the case: “It reflected a true pattern in the training data—patients with a history of asthma who presented with pneumonia were admitted not only to the hospital but directly to the ICU,” according to the article. In other words, pneumonia patients with asthma were being treated more aggressively, which could have contributed to their lower mortality risk.

A challenge for the study researchers was that while neural network algorithms, or neural nets, were the most accurate algorithms for predicting high-risk pneumonia patients, they were not transparent. The flawed asthma rule was discovered using a less accurate but more transparent logistic regression algorithm. Consequently, “although the neural nets were the most accurate models, after careful consideration they were considered too risky for use on real patients and logistic regression was used instead,” the article authors explained.
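The trade-off the authors describe can be illustrated with a small, hypothetical sketch: when the model is a logistic regression, its coefficients can be read directly, so a counterintuitive “protective” effect like the asthma rule is visible rather than buried inside a neural net. The variables and the simulated confound below are stand-ins, not the study’s actual data.

```python
# Hedged sketch of why the transparent model mattered: a logistic regression's
# coefficients can be inspected, so a counterintuitive "protective" feature
# (like the asthma flag) is visible. Data and features are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5000
age = rng.normal(65, 10, n)
asthma = rng.random(n) < 0.10

# Simulate the confound: asthma patients were treated more aggressively (ICU),
# so their observed mortality in the training data is lower.
risk = 0.05 * (age - 65) - 1.5 * asthma
died = rng.random(n) < 1 / (1 + np.exp(-risk))

X = np.column_stack([age, asthma.astype(float)])
model = LogisticRegression().fit(X, died)

for name, coef in zip(["age", "asthma"], model.coef_[0]):
    print(f"{name:>6}: {coef:+.2f}")   # a negative asthma coefficient is the red flag
```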

In peer-reviewed studies of algorithms, it is essential to examine the methods sections for details about test specificity, sensitivity, and predictive value, as well as how the algorithm was evaluated, Dr. McCudden says. “How big a test population was used? Who were the people? How accurate was it? Is it generalizable? You want to beat up the algorithm to see whether it would work in your particular population.”
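Those questions translate into a handful of simple calculations once a laboratory has run the algorithm on its own specimens. A minimal sketch with illustrative counts, not figures from any real validation:

```python
# Hypothetical local validation counts (illustrative only).
tp, fn = 85, 15      # diseased specimens: detected vs. missed
tn, fp = 880, 20     # non-diseased specimens: correctly negative vs. false alarms

sensitivity = tp / (tp + fn)     # how many true cases the algorithm catches
specificity = tn / (tn + fp)     # how many non-cases it correctly clears
ppv = tp / (tp + fp)             # when it says "positive," how often it is right
npv = tn / (tn + fn)             # when it says "negative," how often it is right

print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}, "
      f"PPV {ppv:.2f}, NPV {npv:.2f}")
```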


But Dr. McCudden acknowledges that peer-reviewed studies may not contain all the information about an algorithm’s methodology that is necessary to achieve transparency. Algorithm developers sometimes claim that privacy concerns prevent them from disclosing all data about study participants. In addition, the methods sections of peer-reviewed articles often have limited space, so technical information about how data were generated, such as which instruments were used with which specimens, may not be included.

Commercially developed machine-learning algorithms are even more opaque, Dr. McCudden says. Often the best option, he adds, is to treat the commercial algorithm like an unknown or unproven lab test and conduct proficiency testing with specimens that have already been diagnosed.

“If you know that people have disease X and this algorithm is supposed to detect it, does it? If you know that people do not have the disease, does the algorithm indicate that they don’t have it? Calculate the performance metrics and see whether it meets expectations,” he advises.
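In code, that proficiency-testing approach amounts to a small harness that feeds previously diagnosed specimens to the algorithm and tallies the results. The vendor_algorithm function below is a hypothetical stand-in for whatever interface a vendor actually exposes:

```python
# Sketch of local proficiency testing against a black-box commercial algorithm.
# vendor_algorithm is a hypothetical placeholder, not a real vendor interface.
def vendor_algorithm(specimen: dict) -> bool:
    # Placeholder logic: pretend the vendor flags disease when a marker exceeds a cutoff.
    return specimen["marker"] > 5.0

known_positive = [{"marker": 7.2}, {"marker": 6.1}, {"marker": 4.9}]   # diagnosed with disease X
known_negative = [{"marker": 2.3}, {"marker": 5.4}, {"marker": 1.8}]   # confirmed disease-free

detected = sum(vendor_algorithm(s) for s in known_positive)
cleared = sum(not vendor_algorithm(s) for s in known_negative)

print(f"detected {detected}/{len(known_positive)} known positives")
print(f"cleared  {cleared}/{len(known_negative)} known negatives")
# Compare these rates against the lab's own acceptance criteria before going live.
```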

Laboratories looking for help in mitigating bias in machine-learning algorithms can find expert resources online, Dr. McCudden says. Google’s responsible AI practices website, for example, outlines the company’s recommended best practices for avoiding artificial intelligence bias. IBM’s AI Fairness 360 is an open-source toolkit of metrics that can be used to check for bias in data sets and machine-learning models, and it provides algorithms for mitigating such biases. In addition, the Alan Turing Institute’s fairness, transparency, and privacy interest group publishes research and other information online that can help identify and mitigate machine-learning bias.
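As a flavor of what such toolkits check, the sketch below hand-computes one common fairness metric, disparate impact, which AI Fairness 360 provides among many others. The group labels and rates are invented for illustration.

```python
# Hand-rolled version of one check that fairness toolkits formalize:
# compare the algorithm's positive-result rate across two patient groups.
import numpy as np

rng = np.random.default_rng(3)
group = rng.choice(["A", "B"], size=1000, p=[0.8, 0.2])
# Suppose the algorithm flags group B less often despite the same underlying prevalence.
flagged = np.where(group == "A", rng.random(1000) < 0.30, rng.random(1000) < 0.18)

rate_a = flagged[group == "A"].mean()
rate_b = flagged[group == "B"].mean()

print(f"positive rate, group A: {rate_a:.2f}")
print(f"positive rate, group B: {rate_b:.2f}")
print(f"disparate impact (B/A): {rate_b / rate_a:.2f}")   # values far from 1.0 warrant review
```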

Yet, regulators, including the FDA, have been slow to adopt rules and protocols for addressing machine-learning bias in laboratory tests, Dr. Jackson says. The FDA has held conferences and workshops on the topic, but “it appears that the FDA had been underregulating this area, and I hope they step up,” he adds.

Dr. Jackson chairs the ethics subgroup of the CAP Artificial Intelligence Committee, which is developing a position statement about ethical responsibility related to the use of AI in pathology and lab medicine. The AI committee hopes to submit the statement to CAP leadership for consideration by the end of the year.

“Any health care organization looking into implementing a machine-learning algorithm has a responsibility to ensure that appropriate validation has been done,” Dr. Jackson concludes.
