
Rebooting IHC for companion diagnostics


Anne Paxton

January 2016—Immunotherapy has taken cancer treatment by storm. And given the number of proteins that are targets for immunotherapy and other targeted therapies, immunohistochemistry should theoretically be the ideal method for classifying patients as responders versus non-responders. But there are several reasons why IHC hasn’t reached this status within personalized medicine, says Clive R. Taylor, MD, DPhil, professor of pathology in the Keck School of Medicine of the University of Southern California.

Presenting at a webinar hosted by CAP TODAY and sponsored by Horizon Diagnostics, titled “Immunohistochemistry Through the Lens of Companion Diagnostics” (www.captodayonline.com/2015/Webinars/cap_111015/index.html), Dr. Taylor and David L. Rimm, MD, PhD, professor of pathology and of medicine (oncology) at Yale University School of Medicine, describe how lack of standardization and reproducibility, as well as other measurement issues, have so far stood in IHC’s way. (See the February issue for Dr. Rimm’s comments.) But certain strategies can help overcome these obstacles and improve IHC’s usefulness in companion diagnostics.

“Immunotherapy is an increasingly appealing therapeutic strategy for patients with cancer, with many late-stage clinical trials demonstrating overall survival advantages in melanoma, prostate cancer, and non-small cell lung cancer,” says co-presenter Farah Patell-Socha, PhD, product development manager, diagnostics, for Horizon Diagnostics, which is focusing on precisely defined IHC reference standards.

Currently there are multiple PD-L1 (programmed death-ligand 1) drugs in the same class, each with its own companion diagnostic. “Community oncologists have to figure out what is the best test to use within their time constraints and limited understanding of genomics, and often with insufficient data,” Dr. Patell-Socha says. The key problem is not how to regulate the diagnostics, but the downstream challenge for pathologists performing the tests for the different assays.

“Companion diagnostics” are not exactly a new concept, Dr. Taylor points out. The original companion diagnostic was basically the mind of the pathologist plus a microscope and a microtome, he notes. And there was a series of histochemical and biological stains, most invented between the 1850s and the early 1900s. The mark of that legacy endures and pathologists are, in a way, locked into its mentality. Not much has changed, Dr. Taylor says. “We’re still doing most of this in more or less the same way that we did 100-plus years ago.”

For example, in most labs, the routine stain is the H&E and there are no controls for how blue or pink it should be. Tweaking the result to please the pathologist who is on service has often been the response.

Dr. Taylor

“It will be ‘add a little more blue to the hematoxylin for this pathologist versus that pathologist’—which means, of course, that the reproducibility of the H&E alone is rather poor.” Or, a pathologist might demand, “‘That IHC stain doesn’t look as brown as I thought it should. Send it back to the lab and get me another one.’ So the tech will give a little more time on incubation or increase the concentration of antibody and, lo and behold, they’ll get a more intense brown color. And that, of course, is a disaster if we’re trying to quantify anything.”

Even though pathologists have years of experience in interpreting the H&Es that cross their desks, “The problem is we have adopted that sort of attitude toward immunohistochemistry,” Dr. Taylor says.

More rigorous approaches are essential for IHC to meet the new challenges that companion diagnostics present. “While IHC was just a special stain, we sort of got away” with the more casual approach. “But for IHC to really work well, we should have improved positive and negative controls.”

Calling IHC “witchcraft” goes too far, but there is often more artistry than assay in the stain, Dr. Taylor says. “If we’re going to make it quantifiable, we must have a detailed, strict protocol, we must follow the protocol exactly, and we must have a better system of controls than we currently have.”

The primitiveness of the scoring for predictive markers like HER2 is a central problem, he believes. The difference between a 3+ and a 1+ in two sections from the same case, for instance, could all be attributable to differences in sample preparation, especially differences in fixation. It has been known for many years that “you can change an estrogen receptor stain from negative to positive just by changing the section thickness from 4 µ to 6 µ. So that is another variable that is not well controlled.”

“And we certainly know that by changing the immunostain process itself, changing the antibody, changing the concentration of the antibody, changing the incubation of the antibody, changing the detection system across many different labs, knowingly or unknowingly, we can change the intensity of the result to convert a 1+ to a 3+ or vice versa.” Leaving the chromogen to develop for longer or shorter periods will also change the result, he adds.

“Also, we know that there is heterogeneity within the section. We can find 3+, 2+, and 1+ sitting side by side. And how do we interpret that in the context of whether the patient will respond to treatment with Herceptin or not?”

Last and certainly not least, he adds, pathologists vary in how they score such samples. “I’ve seen pathologists score slides that I’ve scored 2+ and they’ve scored it 1 or 3. Are they right or am I right, or are we all wrong? We actually don’t have a good way of knowing.”

As a result, “Even when we’ve written down a score of 2+ or 3+, we actually don’t know that that score is real. And by ‘real,’ I mean does that reflect the true biology of the tissue in the tumor that we’re looking at? Because all of these other factors can affect the intensity of the stain result.”

Better controls are a large part of the solution to improving this system, Dr. Taylor says. “We should try to convert the IHC stain to a quantitative enzyme-linked immunosorbent assay, an ELISA-type method. Ideally, we need a quantifiable reference standard that all of us could use, the same standard in all of our labs. We would increase reproducibility by doing that simple, single, one thing.”

But to do so, “We have to look at all phases of the immunohistochemical process—not just the staining process itself—the whole process. Sample acquisition, fixation, retrieval, reagents, protocol, basic controls, interpretation, scoring, and reporting. We have to control everything.”

Interestingly, essentially the same reagents are used for an ELISA as for IHC. “It’s a matter of using an antibody that’s labeled against the target protein, or antigen, in an ELISA process. We actually see color in the solution, which we quantify against the reference standard. And by comparing that to the calibration reference standard, we can accurately quantify the amount of target here. The only difference in an immunohistochemical stain is that the color remains localized to the tissue section and is not released into the fluid.”
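The calibration step Dr. Taylor describes can be sketched in a few lines: an unknown sample's signal is read off a curve fitted to a reference standard of known concentrations. All concentration and absorbance values below are hypothetical illustration data, not from any real assay.

```python
# Minimal sketch of ELISA-style calibration: quantifying an unknown
# sample against a reference standard curve. All numbers are
# hypothetical illustration data.
import numpy as np

# Known concentrations (ng/mL) of the reference standard and the
# absorbance each produced in the plate reader.
standard_conc = np.array([0.0, 5.0, 10.0, 20.0, 40.0])
standard_abs = np.array([0.02, 0.15, 0.29, 0.57, 1.12])

# Fit a simple linear standard curve: absorbance = slope*conc + intercept.
slope, intercept = np.polyfit(standard_conc, standard_abs, 1)

def quantify(sample_absorbance: float) -> float:
    """Convert a measured absorbance back to a concentration (ng/mL)."""
    return (sample_absorbance - intercept) / slope

print(round(quantify(0.43), 1))  # → 15.0, concentration of an unknown sample
```

The point of the analogy is that every lab interpolating against the *same* standard curve would report the same number for the same signal; IHC, as practiced, has no equivalent shared curve.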

Moreover, as Dr. Taylor notes, it’s possible to do an immunohistochemical stain, then lyse the section and release the color into the supernatant and quantify the color release. “That has been done and works effectively; [pathologist] Craig Allred showed that.

“So the methods are very similar. Both are antibody-labeled detection methods. The fact that ELISA has excellent reproducibility and is strictly quantitative—in fact, it’s the gold standard—relates to the fact that sample preparation is rigidly controlled and the process is fully automated. You can’t tweak it or interfere with it. And there’s a universal reference standard that we can all use.”

By contrast, immunohistochemistry lacks controlled sample preparation, it’s only partially automated, people tweak the system all the time, and there is no reference standard. “So suppose we converted the ELISA principle to immunohistochemistry. Would we change poor reproducibility and lack of quantification to excellent reproducibility and accurate measurement?” The answer is “possibly,” he says.

But the preanalytic part of this process is especially difficult to control. “First of all, there are so many variables in pre-analysis. There’s warm ischemia after the vessels are clamped, before the tissue is removed. There’s cold ischemia, during transport from the OR to the lab before it gets to gross. When is the tissue put in the formalin? How long has it been in the formalin, how big is the block, what’s the fixation type? How fresh is the formalin? What’s the pH? What’s the total time? All of these can produce variation in the intensity. And getting a standard process for all of these, that is adopted by all of us, is extremely difficult.”

The use of formalin presents many problems. “It works fine for looking at an H&E stain. It works fine for some immunohistochemistry after antigen retrieval, but we know that if we increase the fixation time, you can look at the same piece of tissue after eight, 32, 56, and 104 hours and see how the staining has decreased. And this loss will be seen for many different targets and many different markers.”

So for immunohistochemical stains, Dr. Taylor says, “We start with a handicap. We’ve lost an unknown amount of antigen. And somehow or other we need to control for this loss.” He believes that the principle of internal control is vital to being able to understand the preanalytic process and at least monitor for any compromised staining that is seen.

The CAP and various other institutions are working on guidelines for standardizing these processes, Dr. Taylor says. “But even when guidelines are produced, following them is difficult. And some kind of monitoring process that proves that we [pathology labs] followed the guidelines, so far, is completely absent.”

There is an astonishing amount of variability from lab to lab, which has been shown in a survey by the United Kingdom National External Quality Assessment Service (UK NEQAS). “They sent unstained slides out to 365 labs and asked them to stain for smooth muscle actin and cytokeratin. Two hundred ninety-seven labs used heat retrieval, with only 76 percent getting acceptable results. Enzymatic retrieval was used by 32 labs, and only 29 percent got acceptable results.”

“There’s a message right there: Enzymatic methods are not as good as heat methods. So why are people still using them? And how can they possibly expect the same results absent a common control?”

Other sources of variability abound. For example, “You can look [in the UK NEQAS study] at 26 different primary antibodies from 16 suppliers. Twenty-six different detection reagents from 13 suppliers. Seventeen different autostainers from seven suppliers. Chromogen from 19 suppliers. How can we possibly know we have the same results? We cannot, unless we have a common control.”

“It’s astonishing, really, that we can even make stains reproducible to a degree,” Dr. Taylor says. “The fact that we can is a tribute to the control systems we’ve developed to this point, inadequate though they are for future needs.”

A 2002 conference called by the National Institute of Standards and Technology in Washington, DC, on the topic of IHC HER2 assays, addressed these standardization issues and concluded that a reference standard was necessary and it should have the following characteristics, Dr. Taylor says. “It should be subjected to all the same rigors of sample preparation, sample ischemia, transport, fixation, and so on. It must be integrated in all steps of the assay protocol, including evaluation of the results. Ideally, there should be a known amount of reference standard protein present. It should be universally available so that we could all use it. It should be inexhaustible, and it should be inexpensive.”

The choices currently available as “controls,” such as a known tissue block or tissue microarrays, have various disadvantages. “Right now, in our lab, we mostly use our own internal, in-house tissue. We also can use protein spots, which can be quantifiable. Or we can use cell lines or controlled cell lines, which can be produced in a reproducible fashion, and in theory, we all could use them. And we can use controlled cell lines grown into faux tissues, which have the advantage of retaining some morphology. Cell lines and spots, of course, have the potential to be quantifiable.”

The controls developed in Dr. Taylor’s laboratory are increasingly placed on the slide with the test tissue, and they include negative and positive controls. Recommendations from the Ad Hoc Committee on Immunohistochemistry Standardization, which involves the Canadian Quality Assurance System and UK NEQAS and NordiQC, call for the use of high and low expressor tissues as positive controls. “The low expressor will tell you if the sensitivity of your test is down, and the negative controls will provide some information about specificity.”

Various controls such as 3D faux tissue, which involves growing mixtures of cell lines in a 3D-culture system where you get reproducible morphology, “have the huge advantage of being able to produce high, intermediate, and low expressors in a standardized system so that in theory we could all use the same controls.”

However, “Currently we use tissues from within our own labs, and obviously we can’t all use the same tissue. This particular tissue as a control is great for my lab, but your lab will have a different tissue, therefore a different control. And even if you use normal tissues, they’re not the same ones that we have. So we will not be able to improve reproducibility from lab to lab unless we actually have a control, where we all have the same validated control system. In the future, for companion diagnostics, we have to move to something like that.”

One reason a common control has become particularly important is that anti-PD-L1 and similar therapies are increasingly employed, and the many different antibodies to PD-L1 can produce large differences in staining intensity and distribution.

“We’re getting different results using the same detection system on the same piece of tissue, because we are changing the antibody. How can this ‘variable test’ be predictive for a particular targeted therapy? So we need to have controls in place to ‘cross-compare’ or ‘cross-reference’ different antibodies. And it’s not just a problem with PD-L1. It’s a problem with just about every target you can name.”

Turning to the topic of internal, or within-tissue, controls, Dr. Taylor says with the internal protein caldesmon, for example, it can be shown that intensity of the “staining reaction” is lost with the increased duration of fixation. “You can, in fact, produce a ‘degradation’ or a ‘loss curve’ for such internal proteins as desmin and caldesmon. And if you know what those curves are, and if you also know, from experiment, how your test antigen compares to either of these two internal [control] proteins, then you can use these curves as a measure of how much of the ‘test antigen’ staining has been lost.”
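The correction Dr. Taylor outlines can be sketched as follows: if the fixation "loss curve" of an internal protein such as desmin is known, the measured loss of that protein in a given section tells you how much fixation degradation occurred, which can then be used to back-correct the test antigen's reading. The exponential decay model and every constant below are hypothetical, for illustration only.

```python
# Sketch of the internal-control correction: infer effective fixation
# exposure from a known internal protein's loss curve, then correct the
# test antigen's measured intensity. Decay constants are hypothetical.
import math

DESMIN_DECAY = 0.010        # assumed signal loss rate, per hour of fixation
TEST_ANTIGEN_DECAY = 0.016  # assumed rate for the test antigen, from experiment

def estimate_fixation_hours(desmin_observed: float, desmin_fresh: float) -> float:
    """Infer effective fixation time from internal-control signal loss."""
    return -math.log(desmin_observed / desmin_fresh) / DESMIN_DECAY

def corrected_test_signal(test_observed: float, fixation_hours: float) -> float:
    """Back-correct the test antigen intensity for fixation loss."""
    return test_observed * math.exp(TEST_ANTIGEN_DECAY * fixation_hours)

# Desmin came out at 74% of its fresh-tissue intensity in this section.
hours = estimate_fixation_hours(desmin_observed=0.74, desmin_fresh=1.0)
print(round(hours))                               # → 30, inferred hours
print(round(corrected_test_signal(0.50, hours), 2))  # → 0.81, corrected signal
```

Because control and test proteins sit in the same section, every preanalytic variable cancels out of the comparison; only the relative decay rates, established by experiment, are needed.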

In assessing the results in these cases, the naked eye isn’t enough, Dr. Taylor emphasizes. “The problem with human beings scoring this type of result with the naked eye is that we’re not good at detecting small changes in intensity reproducibly. This is going to have to be done by digital image analysis, which fortunately now is possible—and is even becoming easy.”

“We would need to look at the test analyte protein in one color and the internal reference standard in another color, on the same section at the same time—that is by using a carefully controlled double ‘stain.’ We know, because both these molecules are in the same tissue section, that both had the same warm ischemia, cold ischemia, same fixation, same processing, same stain, same detection, same everything, except for chromogens of different color. So they can be directly compared, but not with the naked eye. We cannot distinguish shades of brown versus red. But by computer, spectral imaging, we could separate red and we could separate brown. We could compare them by counting, as well as by measurement of differences in intensity.”
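The "separate red from brown by computer" step is conventionally done by color deconvolution in the style of Ruifrok and Johnston: each pixel's optical density is unmixed into per-chromogen contributions using known absorbance vectors. The RGB absorbance vectors below are assumed illustrative values, not calibrated stain data.

```python
# Sketch of color deconvolution: unmix each pixel's optical density
# into per-chromogen contributions. Stain vectors are hypothetical.
import numpy as np

# Unit optical-density absorbance vectors (R, G, B) for a brown
# (DAB-like) chromogen, a red chromogen, and a residual channel.
stains = np.array([
    [0.65, 0.70, 0.29],   # "brown" chromogen
    [0.10, 0.80, 0.59],   # "red" chromogen
    [0.29, 0.11, 0.95],   # residual
])
unmix = np.linalg.inv(stains)

def unmix_pixel(rgb, background=255.0):
    """Return per-chromogen optical densities for one RGB pixel."""
    od = -np.log10(np.clip(np.asarray(rgb, float), 1, None) / background)
    return od @ unmix

# The two chromogen intensities can now be compared numerically,
# pixel by pixel, rather than judged by eye.
brown_od, red_od, _ = unmix_pixel([120, 90, 60])
```

With the image unmixed this way, counting positive cells and measuring per-chromogen intensity become straightforward array operations, which is what makes the digital comparison Dr. Taylor describes feasible.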

“So this approach could serve to validate the sample preparation, as being suitable for that assay. And it could also serve as a calibration standard.” Fortunately, image analysis has come forward in leaps and bounds in the past five years or so. Using that, Dr. Taylor says, “We can cross-count to compare different intensities. We can look at location in relation to tumor. We can quantify by counting. We can do phenotyping. And we can measure biomarker expression, by multiplex methods, all in a single slide. But again, we cannot do it by the naked eye.”
