Anne Paxton
April 2019—With the release in January of a new guideline for quantitative image analysis of HER2 immunohistochemistry for breast cancer, the CAP believes it is filling a gap and blazing a trail for the profession. In setting evidence-based standards, the guideline provides background and details about the quantitative image analysis (QIA) process and the data and metadata it generates. The guideline will help facilitate the profession’s increasing use of not only digital pathology but also artificial intelligence, says Marilyn Bui, MD, PhD, chair of the CAP expert panel for QIA of HER2 IHC. “This is not just another guideline. It is a milestone for pathologists.”
The CAP and the American Society of Clinical Oncology have issued clinical practice guidelines for HER2 testing, interpretation, and reporting, but those guidelines do not address QIA standards. The new QIA guideline picks up where those guidelines left off. When used for HER2 IHC, QIA’s purpose is to detect and quantify HER2 membranous IHC staining of invasive breast cancer cells and to provide an accurate, precise, and reproducible quantitative HER2 result. After conducting a systematic review of the literature and choosing eight published studies for its evidentiary base, the QIA expert panel produced seven recommendations and four expert consensus opinions for improving the accuracy, precision, and reproducibility of HER2 IHC results for breast cancer (Bui MM, et al. Arch Pathol Lab Med. Published online ahead of print Jan. 15, 2019).
The panel’s strongest recommendations: Laboratories should require validation of their QIA systems and procedures before implementation and must ensure regular maintenance and evaluation of quality control and quality assurance. Pathologists with expertise in HER2 QIA should supervise performance, interpretation, and reporting, the guideline recommends.
The expert panel chose a breast cancer biomarker for this initial QIA guideline because breast cancer is a common cancer with a broad impact and every breast cancer patient needs to be tested for HER2 and ER/PR, says Dr. Bui, senior member of pathology and president of the medical staff, Moffitt Cancer Center, Tampa, Fla., and president of the Digital Pathology Association. “We decided on HER2 because it is a more difficult stain to read than nuclear stains. We narrowed our focus to IHC only for now, not FISH or ISH yet.”
The need to validate the QIA system, algorithm, and procedure was a key concern of the expert panel. “The Food and Drug Administration only regulates the manufacturer to ensure that the product is safe and does what it claims to do,” Dr. Bui says. “We need to evaluate it in our laboratory with real patient samples used in similar practice settings and continue to watch the system to make sure it is doing the right thing.”
In writing the guideline, the panel had two objectives, says panel member John E. Tomaszewski, MD, chair of pathology and anatomical sciences at the University at Buffalo Jacobs School of Medicine and Biomedical Sciences. “We reviewed the literature of image analytics applied to pathology data, especially in reference to HER2, and we then grappled with the different aspects of those image analytics vis-à-vis what is considered ‘ground truth’”—that is, empirical evidence or information provided by direct observation as opposed to information provided by inference.
Developing the recommendations was difficult, he says. But it won’t be the last word by any means. Continuing work on QIA standards will be needed. “It wasn’t one protocol that fits all. We did a very small slice of what will have to happen in the future. And that will not happen easily because there are way too many pieces of technology in this pipeline.”
Studies have shown, Dr. Bui says, that digital image analysis of breast cancer biomarkers, including HER2, correlates better with the correct molecular subtype of the tumor than manual biomarker assessment. Nevertheless, “Quantitative image analysis has not yet gained widespread acceptance,” she says. A 2016 CAP survey revealed that 22.15 percent of 826 laboratories enrolled in the CAP Histology Quality Improvement Survey were using QIA tools. “The majority were still doing manual reading.”

All members of the QIA expert panel strongly believed in the merit of HER2 QIA, Dr. Bui reports. “But the general sense from pathologists in the field varies. Some may feel that if you use QIA, that means a new set of guidelines to follow as well as a new set of checklists.” With the guideline, “We’re saying if you are practicing QIA, and believe these are the best ways, we hope you will find the guidelines are evidence-based and practical.”
The most important of the guideline’s 11 statements, in her view, is No. 10: The pathologist who oversees the entire HER2 QIA process used for clinical practice should have the appropriate expertise in this area. For laboratories already conducting the types of QIA addressed in the guideline, she says, this recommendation should pose no additional burden.
“Pathologists are in the front and center of precision medicine. We are practicing pathology in a historical moment. Some call it the ‘third revolution’ in pathology. The first was when IHC was introduced; the second was when molecular diagnostics was incorporated into pathology. The third is whole slide imaging, which gives rise to computational pathology and artificial intelligence.”
Recent publications have shown that artificial intelligence has great potential to assist pathologists in providing better care, Dr. Bui notes. “The federal government reported in 2016 that a top artificial intelligence system has an error rate of 7.5 percent while a top pathologist has an error rate of 3.5 percent in identifying metastatic breast cancer in lymph nodes. But when you combine AI with the pathologist’s skills in practice, you get an error rate of only 0.5 percent. That’s one reason why the government is interested in ‘deep learning’ in medicine,” she adds (referring to a method of machine learning based on learning data representations, as opposed to task-specific algorithms).
Some pathologists have greeted this revolution with wariness or resistance. “Some refuse to accept it. They say, ‘It cannot do better than I can; I don’t trust it.’ But when the evidence is proving that the machine can do better in certain tasks, then many get scared they will be replaced.” Similar reactions were seen in 1848, she says, when microscopes were a disruptive technology and there were warnings that they could never be employed in ordinary practice. “For people who are hesitating to adopt digital pathology, that puts things in perspective. Digital pathology and artificial intelligence are here to stay and will continuously transform the delivery of precision medicine.” But pathologists should not worry that AI is going to replace them anytime soon, she says, urging pathologists “to prepare and find a way to control the direction in which this is going so it will come out better for patients and the profession.”
“With conventional pathology, we have a microscope and we read glass slides. We are limited by location and time, and if you want to share, the person can only view it through another head connected to the scope,” Dr. Bui notes. “The data is analog, which is difficult to share, analyze, compare, search, retrieve, and integrate. With whole slide imaging, the analog data is transformed to digital data, which can be shared without limitation of space or time. It is permanent, searchable, manageable, and integrable.” Most important now, she says, is that this digital information can be analyzed by algorithms or more sophisticated artificial intelligence such as deep learning.
Dr. Bui says artificial intelligence can be a formidable ally to pathology. “It can detect and find things, including rare events that are tedious for pathologists to look for like acid-fast bacilli to diagnose tuberculosis. It can quantify. If you want to count stains, infiltrating lymphocytes, or percentage of tumor necrosis, we believe the AI will do better. And classification is the third thing. It can decide, for example, is this a tumor or not? Is it a low-grade or high-grade tumor?” There are also algorithms to improve workflow and efficiency, and published data showing the prognostic and predictive value of AI. “It lets us improve quality and efficiency with new ways of looking at pathology data to make clinically actionable knowledge and present it to clinicians to help with decision-making,” Dr. Bui says.
Given all the pathology information available, “we feel right now we have only scratched the surface of the information and data from the samples we study. With immunohistochemistry we can now get into the cellular level, with molecular information we get to the genome level, and at the digital pathology level we will be able to combine all this together with clinical, radiological, and longitudinal information on prognosis to predict what will happen—that’s the power of digital pathology in the era of precision medicine,” Dr. Bui says.
Although it took about three years to develop the QIA guideline, it is only the tip of the iceberg, Dr. Tomaszewski agrees. He believes the field must prepare for a complete change in how anatomic pathologists conduct diagnostics. “It will be based on digital pathology and computational pathology skills. The digital pathology part is how you get an image, how you do a scan, the quality of the scan, basically what is the information in the pixels.”
“The ability to get a high-resolution image and know where a pixel is and its resolution as an image has drastically changed over the past 15 years,” he says. “The cameras and sensors are way better than they were.” Similarly, the speed of computer processing has soared. “Maybe eight years ago, if I took an algorithm we had and ran it on a case it might take weeks; it took a whole Linux cluster to run the thing. Now we can do it in seconds on a desktop with a GPU chip in it.”
The third and biggest advance is the algorithmic approach to computing on an image, Dr. Tomaszewski says. “The computational pathology world is how you use an algorithm on those pixels. With the whole artificial intelligence explosion, computational pathology is getting real big, real quick.”
Because AI involves an algorithm that learns, it is an algorithm that constantly changes, he explains. “In a laboratory view of the world, we will have to grapple with that probabilistic aspect of data, as opposed to a deterministic approach to data. As laboratories, we don’t know how to deal with that probabilistic approach to data in application, and our clinical colleagues certainly don’t understand it.”
In this decade, there has been a surge in the use of neural networks (information processing modeled after the human brain and the way it learns) and deep learning, which can delve deeper into data to develop predictions. These approaches allow image analysis devices to handle massive amounts of data and learn on their own. To address the implications of deep learning, “We have to come together with a quality management system or systems that meet the needs of safety, and we don’t know how to do it yet,” Dr. Tomaszewski says. Down the line, “We’ll need to have a very robust discussion. This guideline is a very targeted preamble of a much bigger issue.”
“Stay tuned” is his message, he says—not “Be afraid.” People should pay more attention to this new technology, but they should not fear it because it will allow pathologists to capitalize on their strengths, he says. “Pathologists think in very systematic ways because we have to. We need to be early adopters, because we’re better positioned than anybody else in medicine to do this leap into the new machine learning environment with all our data.”
As the only non-pathologist on the QIA expert panel, Anant Madabhushi, PhD, director of the Center for Computational Imaging and Personalized Diagnostics at Case Western Reserve University, saw part of his role as clarifying the algorithms behind computational imaging. “For a lot of pathologists, it hasn’t been immediately clear how these algorithms work. Many people think of AI as one box or one algorithm, but it is not. So in the discussions we needed to understand the spectrum of algorithms.” That included understanding the two main types of machine learning: “supervised” algorithms—where the output values are known and the algorithm discovers how the input data lead to those values—and “unsupervised” algorithms, which are left to their own devices to discover useful patterns within unlabeled data.
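That distinction can be illustrated with a minimal, hypothetical sketch that is not drawn from the guideline or the panel’s work: a supervised model is fit to feature vectors whose HER2 labels are already known, while an unsupervised method receives only the unlabeled features and must find structure on its own. The synthetic data and scikit-learn calls below are illustrative assumptions only.

```python
# Minimal, hypothetical illustration of supervised vs. unsupervised learning
# (synthetic data only; not the algorithms evaluated by the expert panel).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Pretend each row is a cell-level feature vector (e.g., mean membrane stain
# intensity and membrane completeness) and the label is HER2-positive (1) or not (0).
X = np.vstack([
    rng.normal(loc=[0.3, 0.2], scale=0.05, size=(100, 2)),   # "negative"-like cells
    rng.normal(loc=[0.8, 0.7], scale=0.05, size=(100, 2)),   # "positive"-like cells
])
y = np.array([0] * 100 + [1] * 100)

# Supervised: the output values (labels) are known; the model learns how the inputs map to them.
clf = LogisticRegression().fit(X, y)
print("Supervised prediction for a new cell:", clf.predict([[0.75, 0.65]]))

# Unsupervised: no labels are given; the algorithm looks for structure on its own.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Unsupervised cluster assignments (first 5 cells):", clusters[:5])
```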

“What was refreshing to me is that through this process of making the guideline we made a meticulous effort to understand the variety of algorithms,” Dr. Madabhushi says. But narrowing the scope of the guideline was also important. “The plans for setting standards for image analysis were originally more grand. But we realized we couldn’t boil the ocean. There were too many ways and directions we could go. So ultimately we decided to just look at quantification.”
For him, one of the guideline’s most important recommendations is No. 5: Laboratories should monitor and document the performance of their QIA system. “The commonality among many algorithms is that they need to be calibrated or validated from time to time,” Dr. Madabhushi says. “No matter how good your algorithm is, if there are preanalytic or other factors that affect parameters like the appearance and color of the image, the parameters start to ‘drift,’ and it is going to affect the image analysis algorithm.”
For example, “Say you have an approach that is focused on quantifying HER2 and requires a number to do that. If the value of a stain is above some threshold, then you identify it as HER2-positive; otherwise it’s negative. The problem is if your scanner or other preanalytic variables suffer from a drift effect, they can have dramatic effects on your results.”
“That was the basis for the guideline’s emphasis on continued assessment of systems: to ensure, fairly rigorously, that there is a system in place to assess the drift of the algorithm for consistency and usability. We stressed the need to go back and revisit and recalculate and reassess whether the approach is providing consistent, useful results,” he says.
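A toy sketch may make the kind of threshold logic and drift Dr. Madabhushi describes more concrete; the decision threshold, intensity scale, and size of the shift below are invented for illustration and are not values from the guideline.

```python
# Toy illustration of how preanalytic "drift" can flip a threshold-based call.
# The threshold, intensity scale, and drift amount are hypothetical.
import numpy as np

HER2_POSITIVE_THRESHOLD = 0.60   # assumed decision threshold on a 0-1 stain-intensity scale

def call_her2(mean_membrane_intensity: float) -> str:
    """Return a HER2 call from a single quantitative stain measurement."""
    return "positive" if mean_membrane_intensity >= HER2_POSITIVE_THRESHOLD else "negative"

rng = np.random.default_rng(1)
baseline = rng.normal(loc=0.55, scale=0.03, size=20)   # cases measured at validation time
drifted = baseline + 0.08                              # same cases after scanner/stain drift

print("Calls at validation:", [call_her2(v) for v in baseline[:5]])
print("Calls after drift:  ", [call_her2(v) for v in drifted[:5]])
# A small shift in measured intensity changes calls near the threshold, which is why the
# guideline stresses ongoing monitoring, recalibration, and revalidation of the QIA system.
```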
In his view, the capability to conduct QIA has the potential to improve efficiency and workflow, remove bottlenecks, and empower pathologists to provide more precise diagnoses that help guide treatment, and the guideline will help achieve that. But the broader hope of the expert panel is that the guideline will serve as a template for guidelines put forward for other quantifications as well, as AI makes its way into other areas—for example, the mitotic count as an indicator of tumor cell proliferation. “There is a huge amount of variability there because normally mitotic count is done manually. We have a number of different folks working on algorithms,” he says, but “there are grand challenges” left to meet in the quantification of mitosis.
The expert panel opted not to make the guideline about AI because that would have been too large an undertaking at this point, Dr. Madabhushi explains. But he believes the need for continued assessment of the system cannot be overstated. “It’s not just about the image analysis factors. In my mind, continuously monitoring and calibrating the system is a core message for AI in general.”
A number of published papers have reported the troubling finding that AI is not necessarily translating from site to site, he says. “Our own group has a lot of examples where we have found if we train the AI on data from one site, it didn’t work on data from another site. So carefully evaluating the consistency and reproducibility of results is a critical consideration, not just in the context of guidelines but in the bigger picture in pathology.” The QIA expert panel hopes the guideline will contribute to greater consistency in identifying HER2 as positive or negative. “We know there is still a huge amount of variability from manual reading, and even with the image analysis algorithms there tend to be variations.”
Because of the potential sensitivity of these algorithms to the data they are trained on, he says, changes in a scanner or other preanalytics may keep the algorithms from performing correctly, which means recalibration may require retraining of the neural network. For that reason, it is crucial to constantly evaluate the output of the QIA algorithm. “Don’t assume those results that the IA algorithm produces are going to consistently perform exactly the way you envisioned.”
As deep learning algorithms for QIA of digital pathology images evolve, they might lead to further recommendations, the guideline notes. “I can envision in the future, if deep learning algorithms become more prevalent, we may have to revisit the standards and think about how to evaluate the consistency and reproducibility of these QIA approaches,” Dr. Madabhushi says.
Further QIA standards development is already on the calendar. The QIA guideline is slated for review every four years, or more often if substantive and high-quality evidence is published that could potentially alter the recommendations. Quality assurance in whole slide images will be the next target of the CAP’s image analysis standard-setting, Dr. Tomaszewski says. That includes scanning platforms and the quality of data that come out of scanning, plus the human perception of images based on those data.

All of those need separate sets of guidelines, he says. “The algorithms themselves, plus the codes used to calculate all the pixels in the images, will have to be quality controlled.” The task of “shedding light on the black box” is a responsibility of the pathology profession, he says. “We cannot, as domain knowledge experts, just say, ‘That algorithm has output x and that’s all I need to know about it.’”
Dr. Bui says patients need to know that pathologists are their advocates, doing their best to provide the diagnosis and biomarker report. “By developing guidelines for QIA, we’re showing another way we can deliver tests, using HER2 as an example,” she says. “So patients should rest assured we are taking the quality of the work we do very seriously. If they want to ask their medical oncologist or pathologist to read their slides digitally, I think that’s a good thing. Pathologists are in control of the process, and if the algorithm gives a ridiculous result, they can detect and override it.”
QIA has been available for a long time, and it is showing high reproducibility and accuracy, Dr. Bui adds. “We are encouraging you to use it, and while you do use it, we hope you find this guideline helpful and practical.”
Anne Paxton is a writer and attorney in Seattle. The guideline was endorsed by the ASCP Commission on Science, Technology, and Public Policy and the Association for Pathology Informatics Council.