It’s here: whole slide imaging validation


As for another potentially controversial question—Should digital and glass slides be evaluated in random or nonrandom order during the validation process?—either option is fine, the expert panel says. The guideline reads: “Our meta-analysis of selected articles showed no marked difference in concordance when comparing glass with digital slides viewed in random versus nonrandom allocation. Therefore, our panel felt that laboratories can decide to evaluate their cases in either random or nonrandom order (as to which is examined first and second) for a validation study.”

The CAP guideline aside, some holdouts remain skeptical of whole slide imaging in general. “There is some controversy about the ability of pathologists to interpret patient cases using digital images instead of microscopes,” says Thomas W. Bauer, MD, PhD, of the Department of Anatomic Pathology, Cleveland Clinic.

Dr. Bauer, who was not a member of the panel that created the CAP guideline, is the lead author of “Validation of whole slide imaging for primary diagnosis in surgical pathology,” a study published last month in the Archives of Pathology & Laboratory Medicine (137[4]:518–524).

Dr. Bauer

In his view, intraobserver variability is the key issue here. “Testing whether this technology [whole slide imaging] works does not have anything to do with competence,” he says. “It has to do with: Can I make a diagnosis just as well with this technology as with a microscope? It should not test if I get the answer right or wrong. What matters is that I get the same answer using both methods.”

To his frustration, “If you look at the literature, there are not many really good studies that document intraobserver variability using microscope slides alone,” he points out. “So in our study, we decided to directly compare intraobserver variability interpreting whole slide images with intraobserver variability interpreting microscope slides.”

The first question Dr. Bauer and his coauthors had to answer was: How many samples would they need to be reasonably confident that the two diagnostic methods are equivalent? With the help of an independent statistician who reviewed the available literature, they determined the answer to be about 450.
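The article does not spell out the statistical assumptions behind that figure. For readers curious about the arithmetic, the sketch below shows one common way such a target is derived, a non-inferiority sample-size formula for two proportions; the discrepancy rates, margin, and power it uses are illustrative assumptions, not values from the study.

```python
# Illustrative sketch only: the study's statistician derived the target of
# roughly 450 cases from the literature; the exact assumptions are not given
# here. This shows one common approach, a non-inferiority sample-size
# formula for comparing two proportions, with hypothetical inputs.
from scipy.stats import norm

def noninferiority_sample_size(p_ref, p_new, margin, alpha=0.05, power=0.80):
    """Cases needed per arm to show p_new is within `margin` of p_ref."""
    z_alpha = norm.ppf(1 - alpha)      # one-sided test
    z_beta = norm.ppf(power)
    variance = p_ref * (1 - p_ref) + p_new * (1 - p_new)
    effect = margin - (p_new - p_ref)  # distance from the non-inferiority margin
    return (z_alpha + z_beta) ** 2 * variance / effect ** 2

# Hypothetical values: ~2% major discrepancy rate for both methods,
# 4% non-inferiority margin.
print(round(noninferiority_sample_size(0.02, 0.02, 0.04)))
```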

The study used two primary pathologists—one who specialized mainly in orthopedic and gastrointestinal cases and a second general surgical pathologist in a community setting who reviewed a broader spectrum of cases—and a one-year washout period. “After obtaining IRB approval, microscope slides of consecutive cases interpreted by each pathologist were retrieved from the file by an independent case coordinator,” Dr. Bauer explains. “That coordinator generated working copies that would have been identical to what the pathologists saw when they first saw the cases.

“The coordinator then distributed every other case back to the pathologist with the microscope slides, while alternate cases were scanned and distributed as whole slide images.” That way, each pathologist interpreted the exact same cases he or she had interpreted more than a year before, half using a microscope and half using digital imaging. “The idea was the pathologists would have exactly the same amount of information available as they did the first time.”
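As a rough illustration of that alternating scheme, the following sketch assumes each pathologist's consecutive cases arrive as an ordered list; the case identifiers and the helper function are hypothetical, not taken from the study.

```python
# Minimal sketch of the alternating allocation described above, assuming each
# pathologist's consecutive cases are an ordered list. Case IDs are invented.
def allocate_cases(case_ids):
    """Send every other case back as glass slides; scan the alternates as WSI."""
    glass, wsi = [], []
    for i, case_id in enumerate(case_ids):
        (glass if i % 2 == 0 else wsi).append(case_id)
    return glass, wsi

glass_arm, wsi_arm = allocate_cases(["S13-0001", "S13-0002", "S13-0003", "S13-0004"])
print(glass_arm)  # ['S13-0001', 'S13-0003'] -> reread with the microscope
print(wsi_arm)    # ['S13-0002', 'S13-0004'] -> scanned and reread as WSI
```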

After the pathologists had recorded their diagnoses, other pathologists reviewed those diagnoses and marked them “concordant” or “possibly discordant.” “These were independent pathologists who are subspecialty experts in each individual area,” Dr. Bauer says. “So if there was a possible discrepancy in, say, a liver biopsy, then the pathologist in charge of the liver section would review not only the diagnoses but also the microscope slides. If there was a discrepancy, that referee pathologist would decide which diagnosis was actually better. This was necessary, because it was possible that the diagnosis made by reading the digital image could be better than the diagnosis made by reading the microscope slides. If so, that discrepancy should not count against digital imaging.”

In the end, the major discrepancy rate was determined to be about 1.6 percent for whole slide imaging and about 1 percent for microscope slides. “Those rates are not statistically different,” Dr. Bauer says. Or, as the study puts it: “. . . diagnostic review by WSI was not inferior to microscope slide review.”
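The article does not name the test the authors used or give the exact denominators. As a rough illustration of why rates of that size are hard to tell apart, the sketch below runs Fisher’s exact test on hypothetical counts chosen only to approximate the reported rates.

```python
# Illustrative only: ~1.6% (WSI) vs ~1% (glass) major discrepancy rates are
# reported as not statistically different, but the test and denominators are
# not given here. The counts below are hypothetical.
from scipy.stats import fisher_exact

wsi_discrepant, wsi_total = 7, 450       # ~1.6%
glass_discrepant, glass_total = 5, 450   # ~1.1%

table = [
    [wsi_discrepant, wsi_total - wsi_discrepant],
    [glass_discrepant, glass_total - glass_discrepant],
]
odds_ratio, p_value = fisher_exact(table)
print(f"p = {p_value:.2f}")  # well above 0.05 for counts of this size
```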

“The major discrepancy rate for both diagnostic methods was favorable when compared to the literature,” Dr. Bauer says.

In addition to the study’s main conclusion, Dr. Bauer and his coauthors uncovered several other interesting findings related to whole slide imaging. First, “We learned that the color enhancement on the screen is not necessarily exactly what you see through the microscope,” he says. “It takes a little bit of practice to adjust to it, but with a little experience it is not a problem.”

Second, they found very few discrepancies with respect to benign versus malignant tumors. Instead, “The cases we had the most difficulty with on digital imaging were some of the subtle inflammatory lesions,” he says. “We learned that in certain cases where the pathologist knows he or she needs to look for individual inflammatory cells at high magnification, it’s a good idea to get a high magnification scan to begin with. The default scan magnification for the study was 20×. We learned that for some types of diagnoses, a 40× scan is better than a 20×.”

Because many of the cases in the study were not especially thorny ones, Dr. Bauer and his coauthors are in the midst of conducting a follow-up study that will apply the same methods to more difficult examples. “The results of that study, based on only cases we receive for consultation, are very good so far,” he reports.

Dr. Carter

He feels especially confident in the first study’s conclusion given that he and his coauthors used an additional method he has not encountered in other studies. “Many pathology cases are complicated—they consist of multiple parts,” he explains. “So, for example, we might get a prostate biopsy with six different needle specimens that were all taken at the same time. For a study like this, you might count each of those biopsies as completely independent, yielding n = 6. Or you might count it as n = 1, since the surgeon just makes one decision based on the outcome of the entire case. So we evaluated our outcome measures from both perspectives. We calculated our discrepancy rates as if you considered each part independently, and we also calculated them based on cases.”

By way of illustration, he suggests imagining a prostate biopsy with only one needle specimen. If that specimen were to be interpreted as Gleason score six by one method and Gleason score seven by another method, that’s a major discrepancy. But if a prostate biopsy were to have six needle specimens, three of which showed similar Gleason score discrepancies, “that would only count as one major discrepancy for the whole case, not three,” Dr. Bauer says. “We are not aware of previous studies addressing that kind of complexity, but we were trying to be as conservative as possible. Fortunately, the number of discrepancies was low, no matter how we calculated it.”
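As a rough illustration of the two tallies, the sketch below counts hypothetical discrepancy flags both per specimen part and per case; the identifiers and data are invented for the example, not drawn from the study.

```python
# Minimal sketch of the per-part vs. per-case tallies described above.
# Hypothetical data: each tuple is (case_id, part_id, major_discrepancy).
parts = [
    ("P1", "A", True), ("P1", "B", True), ("P1", "C", True),     # one prostate case,
    ("P1", "D", False), ("P1", "E", False), ("P1", "F", False),  # three discrepant needles
    ("P2", "A", False),
    ("P3", "A", False),
]

# Per-part: each specimen part counts independently.
per_part = sum(flag for _, _, flag in parts) / len(parts)

# Per-case: a case is discrepant if any of its parts is discrepant.
cases = {}
for case_id, _, flag in parts:
    cases[case_id] = cases.get(case_id, False) or flag
per_case = sum(cases.values()) / len(cases)

print(f"per-part rate: {per_part:.0%}")  # 38% (3 of 8 parts)
print(f"per-case rate: {per_case:.0%}")  # 33% (1 of 3 cases)
```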

All well and good. But, returning to the CAP guideline, what if a laboratory remains unconvinced that validation is necessary for whole slide imaging? Dr. Carter offers an analogy.

“Just like freezing tissue can introduce artifacts that have to be accounted for when making a diagnosis, creating a digital image from stained tissue sections on a slide can also introduce subtle yet important artifacts. Sometimes these artifacts can make a diagnosis easier, but they can also make it harder,” she says. “Unlike with frozen sections, our training, knowledge, and competency in recognizing and accounting for these artifacts are in their infancy.” This gap in knowledge will have to be addressed, she says, if the technology is to be used successfully in patient care.

Anne Ford is a writer in Evanston, Ill.
