Closing the workflow loop: HistoQC for digital slides


Valerie Neff Newitt

July 2020—Unveiled in 2018, HistoQC, an open-source quality control tool for digital pathology slides, was an “awakening to a problem” and the kickoff of a conversation, says Andrew Janowczyk, PhD, its main investigator. And while it’s hard to measure the tool’s use because it’s freely available, he says, reception has been strong. “It’s been a pretty good ride,” he says of its first two years.

Dr. Janowczyk, assistant research professor, Department of Biomedical Engineering, Case Western Reserve University, and colleagues designed HistoQC to make it easier and faster to identify and delineate artifacts and batch effects during routine slide preparation and digitization. “Manual review of glass slides and digital slides is laborious, qualitative, and subject to intra- and inter-reader variability,” they wrote, stressing the need for an automated way to spot slides that need to be remade and regions that should be avoided during computational analysis (Janowczyk A, et al. JCO Clin Cancer Inform. 2019;3. doi:10.1200/CCI.18.00157).

Their solution, HistoQC, fills “an intuition gap” in artificial intelligence, Dr. Janowczyk tells CAP TODAY.

Pathologists train themselves over years, he says, to read through slides of suboptimal quality, if necessary, “because they’re highly skilled, very well educated, and have learned to overcome these types of hurdles.”

Dr. Janowczyk

“Digital technologies like artificial intelligence and machine learning are unfortunately not currently that skilled. They’re not that robust in the presence of artifacts on slides,” he says. “So something needs to be in place that guarantees that whatever uses a slide next, be it human or machine, knows what it is getting and can feel confident that it is exactly what it thinks it is. In other words, we want to make sure that before a sophisticated, yet relatively ignorant, algorithm is used, there will be a step to make sure that the sample it is to be applied to is appropriate.”

“HistoQC quickly and efficiently does exactly that,” he says.

Dr. Janowczyk, who is also a member of the cancer imaging program at Case Comprehensive Cancer Center, says the user interface was built so that HistoQC can be used on a regular internet browser.

A tab-separated value file in which image metrics are saved can be loaded into the browser-based front end or any statistical tool, such as Excel, allowing any type of analysis to be performed, Dr. Janowczyk explains. “You simply double click the file on the hard drive of your computer, it will open up a web browser, and you can see results there immediately. From there, the user can go in and manipulate the user interface, which shows all the metrics and thumbnail images in real time.” Now the user can look at, say, 1,000 slides in 15 to 20 minutes by scrolling through them. “This one’s good. This one’s bad. It becomes a very efficient process.”
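Because the results file is plain tab-separated text, it can also be processed programmatically rather than through the browser front end. The sketch below shows the general idea in Python; the column names and threshold are hypothetical placeholders, since the metrics HistoQC actually emits depend on the pipeline configuration used for a given run.

```python
import csv
import io

# Hypothetical excerpt of a HistoQC-style results file. The real column
# names vary with the configured pipeline; these are illustrative only.
tsv_data = """filename\tpen_markings\tblurry_regions\tusable_tissue_pct
slide_001.svs\t0.00\t0.02\t0.91
slide_002.svs\t0.15\t0.40\t0.35
slide_003.svs\t0.01\t0.05\t0.88
"""

def flag_low_quality(tsv_text, min_usable=0.5):
    """Return filenames whose usable-tissue fraction falls below a threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["filename"] for row in reader
            if float(row["usable_tissue_pct"]) < min_usable]

# Slides flagged here would be candidates for re-scanning or manual review.
print(flag_low_quality(tsv_data))  # → ['slide_002.svs']
```

The same filtering could of course be done in Excel or any statistics package; the point is that the output is an ordinary tabular file rather than a proprietary format.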

For clinical use, he says, as soon as a slide is scanned, it can be run through HistoQC and the quality can be evaluated before the pathologist sees it. “We can determine much sooner in the workflow if a slide is of bad quality. It doesn’t have to go to the pathologist to get rejected,” which means that “the large feedback loop becomes very, very small.” Work has begun at a number of hospitals in this regard, he says. “The limitation is that you need to have a digital clinical workflow in place to take advantage of these types of digital tools.”

Further, Dr. Janowczyk says security is built into the design. “There’s no connection to the internet whatsoever, so you can use it with nonanonymized, confidential patient data. Everything is self-contained.”

A comparison of HistoQC against manual QC by two pathologists on 450 images revealed an average agreement of more than 95 percent.

HistoQC got its start when Dr. Janowczyk had an idea for an unrelated study and turned to the repository of slides available through The Cancer Genome Atlas of the National Cancer Institute. “TCGA has about 30,000 slides, all free and publicly available and accompanied by a lot of data. It’s a very rich resource,” he notes. He found about 600 slides he could use for the study he wanted to do. “I was thrilled because I didn’t have to find new patients. These patients already exist, and they’ve already provided ethics approval.”

When he downloaded the 600 slides and started to look at them, Dr. Janowczyk could see that many were of lower quality and unsuitable for the study. “These slides were never really intended for analysis with computer algorithms. The cohort wasn’t designed for you to compute directly on the slide itself. They built it to the standard for human pathologists, who are robust to quality control problems.”

About 10 percent of the slides were not suitable for computer analysis, “which isn’t bad,” he says. “That’s about the norm.”

“But if it takes just one minute to look at each of 600 slides, you’ve already lost 10 hours. Right then I realized we need a way to look at slides more efficiently and identify parts of slides that are good and bad quality. Sometimes half of the slide may be bad but the other half acceptable,” he says.

In years past, when data sets were smaller, quality control of the images could be done manually, he says. With digital scanners increasingly prevalent, larger repositories are being constructed, producing digital slides at a faster rate than ever before, so “it’s just not feasible anymore.”

“These same large cohorts are where the statistical power for our studies comes from. That’s where we’ll be able to identify small, nuanced biological signal. So we must be well positioned to take advantage of this increasingly massive amount of information.”
