
It’s here: whole slide imaging validation


Anne Ford

May 2013—For the past four years, a group of pathologists has been diligently considering one question—Exactly how should whole slide imaging be validated?—all the while knowing that some laboratories consider WSI validation an unnecessary undertaking.

“The biggest argument I’ve heard is: ‘Why should we validate these instruments? We don’t validate our microscopes. It seems to be overkill,’” says Alexis B. Carter, MD, a member of the expert panel that created the CAP’s guideline titled “Validating Whole Slide Imaging for Diagnostic Purposes in Pathology,” published online May 1 in the Archives of Pathology & Laboratory Medicine.

Dr. Carter, assistant professor of pathology and laboratory medicine at Emory University School of Medicine in Atlanta, obviously disagrees. So does Liron Pantanowitz, MD, the leader of the panel that wrote the 50-page, 55-footnote document, which represents the first standard guideline regarding validation of WSI for diagnostic use.

“It’s just another instrument that we need to make sure is safe, like any other device,” says Dr. Pantanowitz, associate director of the pathology informatics division in the Department of Pathology at the University of Pittsburgh Medical Center.

Still, he adds, “Not everyone supports the fact that there should be validation.”

So is the working group expecting resistance to the guideline? “I’m anticipating some flak,” Dr. Carter says. “But what I’m hoping this guideline will do is show people the medical evidence behind these recommendations. People who are thinking that whole slide imaging is no different from a microscope aren’t aware of the literature, and hopefully this guideline will help educate them about that.”

The literature to which she refers: 767 international publications, of which the panel considered 27 strong enough to be subjected to data extraction and review by an independent methodologist. Twenty-three of those publications, along with comments from the public and consensus from the expert panel, formed the basis of the guideline. Depending on the strength of the evidence behind it, each item in the guideline has been categorized from strongest to weakest as a “recommendation,” a “suggestion,” or an “expert consensus opinion.”


A summary of the guideline’s findings makes them sound relatively straightforward. “Validation of the entire WSI system, involving pathologists trained to use the system, should be performed in a manner that emulates the laboratory’s actual clinical environment,” the summary reads. “It is recommended that such a validation study include at least 60 routine cases per application, comparing intraobserver diagnostic concordance between digitized and glass slides viewed at least 2 weeks apart. It is important that the validation process confirm that all material present on a glass slide to be scanned is included in the digital image.”
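In practice, the comparison the guideline describes boils down to a simple tally: the same pathologist reads each validation case on glass and again on the digital image, and the laboratory counts how often the two diagnoses agree. The short Python sketch below illustrates that bookkeeping; the case records, diagnosis labels, and exact-match definition of concordance are illustrative assumptions, not part of the CAP guideline.

```python
# Minimal sketch (not from the guideline): tally intraobserver concordance
# between glass-slide and whole-slide-image reads of the same cases.
# The data structure and exact-match criterion are illustrative assumptions;
# a real validation defines concordance per the laboratory's own protocol.

from dataclasses import dataclass

@dataclass
class ValidationCase:
    case_id: str
    glass_diagnosis: str   # diagnosis rendered on the glass slide
    wsi_diagnosis: str     # diagnosis rendered on the digital image, at least 2 weeks later

def intraobserver_concordance(cases: list[ValidationCase]) -> float:
    """Fraction of cases in which the same pathologist reached the same
    diagnosis on glass and on the whole slide image."""
    if not cases:
        raise ValueError("validation set is empty")
    matches = sum(1 for c in cases if c.glass_diagnosis == c.wsi_diagnosis)
    return matches / len(cases)

# Hypothetical example: the start of a 60-case validation run
cases = [
    ValidationCase("S13-0001", "invasive ductal carcinoma", "invasive ductal carcinoma"),
    ValidationCase("S13-0002", "benign nevus", "benign nevus"),
    ValidationCase("S13-0003", "follicular lymphoma", "reactive hyperplasia"),
    # ... remaining cases ...
]
print(f"Concordance: {intraobserver_concordance(cases):.0%}")
```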

Simple, no? No. Several of those points—the 60 cases, the intraobserver issue, even that two-week period—required a hefty amount of discussion and deliberation during the panel’s 19 meetings. And when draft recommendations were posted on the CAP Web site in July 2011, they drew more than 500 comments from 132 respondents, requiring further modifications.

Take the question of how many routine cases a validation study should include. “There were pathologists not on the panel who called me and said, ‘One case would be good enough,’” Dr. Pantanowitz recalls. “Well, just from a practical point of view, one case wouldn’t be sufficient to make sure this works in a laboratory.” Then, too, Dr. Carter says, “There were a number of people in the group who felt very strongly that the more cases that were used in the validation process, the safer the implementation was going to be.”

“Some people thought we should be doing hundreds of thousands of cases,” Dr. Pantanowitz confirms. “But thousands of cases? We’re trying to make it practical for people to use this. We’re not trying to get this FDA-cleared as a vendor.” (Speaking of the FDA, a quick aside: As of fall 2011, the agency considers WSI systems to be class III medical devices. “There hasn’t really been a final word on that,” Dr. Pantanowitz says.)

As a starting point, the panel suggested 100 cases, as “a number that is practical and easy enough for people to do and still provides some assurance of proper validation,” he says. However, comments on the draft recommendations did not strongly support that number. So the panel examined studies that had used the following average numbers of cases: 20, 60, and 200.

“When we looked at the literature that used an average of 20 cases, the concordance between this digital modality and glass was only 75 percent,” Dr. Pantanowitz says. “That’s not enough. Well, what about 200 cases? That turned out to yield 91 percent concordance. I don’t know why, but when we looked at 60 cases, that yielded the best concordance [95 percent]—I guess because you’re not overburdening people, but you’re still giving them sufficient cases.”

That said, the 60-case figure applies only to limited applications, such as frozen sections for brain lesions. “If you’re going to use it for more than that, such as cytology or hematology, meaning smears or hematoxylin and eosin, including frozens and permanent sections, you’re going to have to do another 20 cases for each additional application,” Dr. Carter points out.
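The case-count arithmetic Dr. Carter describes is straightforward: 60 routine cases for the first application, plus 20 more for each additional application the laboratory intends to validate. A minimal sketch of that calculation follows; the application names and the helper function are hypothetical illustrations, not part of the guideline.

```python
# Minimal sketch of the case-count arithmetic described above:
# 60 routine cases for the initial application, plus 20 additional cases
# for each further application the lab intends to validate.
# The application names below are hypothetical examples.

BASE_CASES = 60
ADDITIONAL_PER_APPLICATION = 20

def planned_validation_cases(applications: list[str]) -> int:
    """Total cases to review: 60 for the first application,
    20 more for each additional application."""
    if not applications:
        return 0
    return BASE_CASES + ADDITIONAL_PER_APPLICATION * (len(applications) - 1)

print(planned_validation_cases(["frozen sections", "cytology smears", "hematopathology"]))
# -> 100
```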


As for intraobserver concordance, the aim of that suggestion is to “take out the variable of individual pathologist expertise,” explains panel member Walter H. Henricks, MD, medical director of the Center for Pathology Informatics, Cleveland Clinic. “It’s most important for an individual pathologist to make the same diagnosis on the same case whether he or she is using glass versus whole slide imaging. I might make a different call than someone else, but what’s important is that I’m using the same judgment regardless of how I’m looking at it.”

Unfortunately, “when we looked at the literature, no one had studied this,” Dr. Pantanowitz says, “so there was no real data to base this on,” though he adds that 86 percent of the commenters on the draft guideline agreed with the importance of establishing intraobserver concordance. Hence this element of the guideline was categorized as a suggestion rather than a recommendation.

Determining the recommended length of the washout period—that is, the length of time allowed to pass after a pathologist views a case or slide and before he or she reviews it using a different modality—proved tricky as well. Short washout periods can lead to bias, of course, as pathologists tend to remember especially interesting or difficult cases for at least some period of time. But there are problems with long washout periods, too. First, they can prove cumbersome for a laboratory. And second, diagnostic criteria can change over time, either because a particular pathologist becomes more skilled or because new criteria for certain diagnoses emerge.

Then, too, “Many studies don’t even report their washout periods,” Dr. Henricks points out, making it difficult for the panel to establish a recommended length of time. The studies that do report their washout periods tend to use periods of between one and three weeks.

“When we looked at studies with washout periods of less than one week,” Dr. Pantanowitz explains, “their accuracy wasn’t very good—about 70 to 75 percent. When we looked at those studies that waited more than six months, again, a lot of them didn’t include that data, but one of them showed concordance of 95 percent. But if we took just a two- or three-week period, we found that the accuracy was also around 95 percent. So why wait six months when you could achieve the same level of concordance in a two- or three-week period?” The panel originally recommended three weeks, changing it to two in response to comments it received on the draft guideline.
