Newsbytes

image_pdfCreate PDF

Editors: Raymond D. Aller, MD, & Dennis Winsten

How observer studies can help labs assess technology solutions

October 2023—Health care technology companies, by and large, are eager to share product metrics—that is, standalone product performance—with potential pathology lab clients but less eager to share how those technologies may impact laboratory workflow and decision-making.

“In the long run, I don’t really care that much about standalone performance,” asserted Elizabeth Krupinski, PhD, professor and vice chair for research, Department of Radiology and Imaging Sciences, Emory University. Dr. Krupinski shared her insights on using observer studies to evaluate how technologies influence users’ perceptions and practices in a presentation at the Association for Pathology Informatics’ 2023 Pathology Informatics Summit and in an interview with CAP TODAY.

Any change in the technology a laboratorian is using fundamentally changes that person’s perception of the task they are performing, which, in turn, can affect workflow, says Dr. Krupinski, who is an experimental psychologist. That means, for example, the way you look at a glass slide through a microscope versus a slide on a digital display “changes everything,” she says, because the technologies themselves affect your eye-tracking or search patterns when examining the image.

To assess how a particular technology may impact pathologists and the workflow of the laboratory, Dr. Krupinski, who regularly collaborates with the pathology department at Emory, recommends conducting an observer study before making a switch.

An observer study administered in a controlled environment can yield unique insights into the effects of the technology because it is conducted with a uniform set of data and conditions, she explains.

Dr. Krupinski

An observer study should typically be performed before beta testing a new technology, Dr. Krupinski says. Beta testing demonstrates how the technology will be used in the normal flow of operations, but because the cases pathologists encounter during beta testing have not been pre-vetted, it sheds less light on how the technology affects decision-making and pathologists’ perceptions, she says.

By contrast, Dr. Krupinski carefully selects cases of varying degrees of difficulty for participants to evaluate during an observer study. She recommends having a panel of three pathologists review selected cases before beginning the study. If the panelists agree on the diagnoses, it helps establish the study’s test set of cases as a standard of truth, she says.

The type and variety of pathology cases selected for the study can impact the results, she notes. If the test set for an observer study of a new artificial intelligence-based decision support tool, for example, includes too many easy cases, the tool being evaluated will not seem impactful because the participants will be able to make the diagnoses just as easily on their own, she says. Selecting extremely difficult cases that are not representative of what pathologists typically encounter can also bias the results. Therefore, Dr. Krupinski aims for a mix of easy cases (approximately 10 percent), cases of medium difficulty (approximately 50 percent), and more difficult cases (approximately 40 percent).
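That target mix lends itself to a simple stratified draw. Below is a minimal sketch in Python, assuming a hypothetical case_pool in which each case has already been rated easy, medium, or hard; none of the names reflect Dr. Krupinski’s actual tooling.

```python
import random

# Target difficulty mix described above: ~10% easy, ~50% medium, ~40% hard.
TARGET_MIX = {"easy": 0.10, "medium": 0.50, "hard": 0.40}

def select_test_set(case_pool, n_cases=50, seed=42):
    """Draw a stratified test set from `case_pool`, a hypothetical list of
    dicts such as {"id": "C-101", "difficulty": "medium"}."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    test_set = []
    for difficulty, fraction in TARGET_MIX.items():
        stratum = [c for c in case_pool if c["difficulty"] == difficulty]
        test_set.extend(rng.sample(stratum, round(n_cases * fraction)))
    rng.shuffle(test_set)  # randomize presentation order across strata
    return test_set
```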

Dr. Krupinski usually selects about 50 cases for an observer study, and she typically requires that six observers participate to ensure the study benefits from a broad enough range of perspectives. Several academic studies have proposed methodologies for determining the appropriate sample size for observer studies, she adds. For example, an article in the American Journal of Roentgenology includes tables that show how the ratio of cases to observers impacts the accuracy of receiver operating characteristic study results (Obuchowski NA. 2000. doi.org/10.2214/ajr.175.3.1750603). Another article, in Biochemia Medica, not only provides formulas that can be used to calculate an appropriate observer study sample size but also lists websites that offer calculators for estimating sample size (Serdar CC, et al. 2021. doi.org/10.11613/BM.2021.010502).
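As one illustration of the kind of formula such references catalog, the sketch below applies a textbook normal-approximation calculation for comparing two means. The formula is standard statistics rather than something taken from either cited paper, and the example inputs are hypothetical.

```python
import math
from scipy.stats import norm

def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Cases per arm needed to detect a difference in means of `delta`,
    given a common standard deviation `sigma`, with a two-sided test."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(power)          # critical value for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

# Hypothetical example: detect a 0.05 difference in mean accuracy
# (SD 0.10) at alpha = 0.05 with 80 percent power.
print(sample_size_two_means(delta=0.05, sigma=0.10))  # -> 63 per arm
```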

Dr. Krupinski favors conducting counterbalanced observer studies, which she breaks into two sessions. When testing an AI tool, for example, half the participants in the first session evaluate pathology images using the tool and the other half evaluate them unaided. In the second session, held approximately three weeks later, the participant groups are brought back to use the evaluation method they did not use in the first session.
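A counterbalanced design of this kind can be expressed as a simple crossover schedule. The sketch below is a hypothetical assignment routine; the observer names and the ai_assisted/unaided condition labels are illustrative, not part of her protocol.

```python
import random

def assign_counterbalanced(observers, seed=7):
    """Split observers into two arms that use opposite reading conditions
    in session 1, then swap conditions for session 2 (~3 weeks later)."""
    rng = random.Random(seed)
    shuffled = rng.sample(observers, len(observers))  # random, reproducible split
    half = len(shuffled) // 2
    schedule = {}
    for obs in shuffled[:half]:
        schedule[obs] = {"session_1": "ai_assisted", "session_2": "unaided"}
    for obs in shuffled[half:]:
        schedule[obs] = {"session_1": "unaided", "session_2": "ai_assisted"}
    return schedule

# Hypothetical six-observer panel.
print(assign_counterbalanced(["obs1", "obs2", "obs3", "obs4", "obs5", "obs6"]))
```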

It takes only about an hour to complete each session, Dr. Krupinski says, because the responses required for the study are much less extensive than the information that a pathologist would need to provide when signing out a case.

Maintaining uniform conditions for an observer study requires carefully controlling numerous factors that can impact results, Dr. Krupinski says.

For example, studies have shown that the accuracy and speed of decision-making decrease late in the day, when people tend to be fatigued. Therefore, both observer study sessions should be conducted at the same time of day, preferably earlier in the day.

It is also important to carefully control the physical environment where the study is conducted, including the ambient lighting; quality and type of computer monitor used, particularly in digital pathology; and noise levels in the room. “The key is to keep everything as consistent as possible across observers throughout your study,” Dr. Krupinski says.

Organizers of these studies must also consider how pathology cases will be presented to the observers, Dr. Krupinski says. Will observers see only pathology images, or will they also have access to the associated clinical histories? If they have access to clinical histories, will they see that information before or after they look at the images? “It changes whether they go in with a preset impression,” she adds.
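Those presentation decisions amount to a small set of study parameters that can be fixed up front. A hypothetical configuration might look like the following; all field names are illustrative.

```python
# Hypothetical presentation settings for an observer study session.
PRESENTATION_CONFIG = {
    "show_clinical_history": True,     # give observers the clinical history?
    "history_timing": "after_images",  # or "before_images", which can set a
                                       # preconceived impression, as noted above
    "randomize_case_order": True,      # same case set, per-observer order
}
```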

The results of some of the observer studies conducted by Dr. Krupinski have also helped other medical departments at Emory decide whether a certain type of technology would be a good fit. For example, observers in a recent study found the AI technology they were evaluating for their department to be cumbersome because it required too many clicks to obtain useful information.

Results of observer studies of AI tools don’t always match vendors’ claims about the effectiveness of their products, Dr. Krupinski says. AI vendors often suggest that their products can help all physicians make more accurate medical decisions. “What we have found over the years, in a lot of studies, is that is not always the case.”

Instead, AI tools often improve accuracy among residents and less experienced physicians, while the gains in accuracy for more experienced physicians are small. For that reason, Dr. Krupinski typically selects a mix of highly experienced pathologists and novices to serve as observers, allowing her studies to measure the technology’s effects across experience levels.

While AI tools for decision support may not greatly impact an experienced pathologist’s level of accuracy, she says, they often have a more significant effect on the amount of time it takes to make decisions.

“Efficiency sometimes outweighs any gains in efficacy and accuracy because you’ll get less fatigued,” Dr. Krupinski explains. “You’ll be able to read more images in a given period of time, and you won’t have to be doing some of these mundane tasks, like counting nuclei, that can be done by something else, such as AI, far more efficiently.”

This speaks to the importance of identifying multiple goals when evaluating new technology, she continues. If improvements in accuracy are minimal, perhaps there are other metrics that make the technology investment worthwhile.

“I always, always measure how long it takes to interpret the images,” Dr. Krupinski says. “Maybe efficiency is going to be where the return on investment is.”
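Per-case interpretation time can be captured with a lightweight wrapper around each reading. The sketch below is a hypothetical illustration of that efficiency measurement, not Dr. Krupinski’s actual instrumentation.

```python
import time
from statistics import mean, median

def timed_read(read_case, case):
    """Run one case interpretation and return (result, seconds elapsed).
    `read_case` is a hypothetical callable that records the observer's response."""
    start = time.perf_counter()
    result = read_case(case)
    return result, time.perf_counter() - start

def summarize_times(seconds):
    """Summarize per-case reading times for one observer and condition."""
    return {"n": len(seconds), "mean_s": mean(seconds), "median_s": median(seconds)}
```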

—Renee Caruthers
