Feature Story

CAP Today

Making the call on HER2 testing methods

February 2002
William Check, PhD

Pathologists and oncologists are debating which is the better way to qualify patients with metastatic breast cancer for treatment with trastuzumab (Herceptin), a monoclonal antibody directed against the HER2 receptor: by measuring HER2 protein overexpression on cell membranes using immunohistochemistry, or by measuring amplification of the HER2 gene using fluorescence in situ hybridization. With the FDA’s Jan. 10 approval of Abbott/Vysis’ PathVysion FISH assay as an alternative method of selecting patients for Herceptin therapy, the debate is sure to heat up, particularly in the commercial arena. While deciphering the relative merits of these two assays is important, the controversy is overshadowing several important facts:

Correlation between IHC and FISH positivity is only about 80 percent. But there are no clinical data to resolve discrepant findings.
Currently available clinical outcomes data do not prove conclusively that either test is superior.
Appropriate clinical outcomes data, which the FDA called "the only gold standard" for "determination of the ability of a HER2 detection method to optimally select patients for trastuzumab therapy," do not now exist. Trials under way may settle this question.
Interlaboratory variability of IHC is unacceptable. But data now emerging show that as FISH becomes more widely disseminated, it, too, shows poor interlaboratory agreement.

Several critical questions arise from these facts:

Should all HER2 testing—both IHC and FISH—be done in reference laboratories, as is now done in clinical trials groups?
What can be done on a national level to improve proficiency in both tests?
Even the best response rates to Herceptin are low. How can patient selection be improved?

Improving the accuracy of selection methods, particularly IHC, gains added urgency with the imminent arrival of additional anticancer agents for which eligibility will be determined by characterization of patients’ tumors. "This is a new paradigm for drug treatment," says Elizabeth Hammond, MD, chair of pathology at LDS Hospital, Intermountain Healthcare, and professor of pathology at University of Utah School of Medicine. "Herceptin represents the best example of a drug whose application is totally dependent on determining that a patient has protein overexpression." Many similar agents are now in the pipeline.

Mark Pegram, MD, associate professor of medicine, Division of Hematology/Oncology, and director of the Women’s Cancer Program, UCLA/Jonsson Comprehensive Cancer Center, agrees. "The next anti-neoplastic reagents will be targeted to proteins, not amplified genes. For those agents, we will have to rely on IHC, so we will have to make it work."

Herceptin was approved in September 1998 to treat metastatic breast cancer in patients who overexpress HER2 protein. At that time, FDA linked trastuzumab approval to concomitant approval of a diagnostic IHC assay. The IHC assay used to qualify patients in the clinical trials—called the Clinical Trials Assay (CTA)—was considered "impractical for commercialization and widespread use," according to Genentech. "The CTA was incredibly laborious—it took 18 hours and 35 steps," says Kenneth Bloom, MD, director of immunohistochemistry and director of the Breast Pathology Service, Rush-Presbyterian-St. Luke’s Medical Center, Chicago.

Accordingly, HercepTest was developed in a collaboration between Genentech and Dako to mirror the CTA and to have defined conditions of fixation and antigen retrieval. As with the CTA, only patients scoring 2+/3+ on HercepTest are eligible for treatment with Herceptin. Since then HercepTest has become the assay routinely used in the clinical setting to test breast cancer tissues for HER2, but much investigation and off-label testing have been done with the Vysis FISH assay. In November 2001 Genentech asked the FDA to add PathVysion to the Herceptin package insert. FDA used the Dec. 6 hearing to review this area more widely.

"FDA wanted to see what had happened to HER2 testing in the last few years, especially clinical trials where the benefit of Herceptin was evaluated," Dr. Hammond says. She calls the meeting "report-card time—how well are current systems working?"

Because of this wider mandate, much old and new data were presented at the hearing (www.fda.gov/ohrms/dockets/ac/01/briefing/3815b1_08_HER2%20FISH.doc).

One of the earliest comparisons between the two methods was a retrospective analysis by FISH on 623 slides from patients previously screened for the Herceptin clinical trials. (LabCorp did both initial CTA screening and the subsequent FISH assays.) A forced 1:1 ratio of positive/negative slides was selected (positive, 2+/3+ on the CTA; negative, 0/1+). In 15 percent of slides, FISH did not provide an informative result. On the remaining 529 slides, the concordance rate between FISH and CTA results, extrapolated to the composition of the overall trial population (32 percent IHC 2+/3+), was 88 percent. Notably, 11 percent of 3+ cases and 76 percent of 2+ cases were not amplified and about 3.5 percent of 0/1+ cases were amplified, reinforcing the notion of false-positive and false-negative results on IHC.
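To make the extrapolation step concrete, here is a minimal sketch of how agreement measured on a forced 1:1 sample might be reweighted to the trial population. It assumes the extrapolation is a simple two-stratum weighted average, which the article does not confirm; the function names are hypothetical. Under that assumption, the quoted figures imply the agreement rate among IHC 2+/3+ slides.

```python
def weighted_concordance(share_positive, agree_positive, agree_negative):
    """Reweight per-stratum agreement to a population with the given IHC-positive share."""
    return share_positive * agree_positive + (1.0 - share_positive) * agree_negative


def implied_positive_agreement(overall, share_positive, agree_negative):
    """Solve the weighted average for the agreement rate in the IHC-positive stratum."""
    return (overall - (1.0 - share_positive) * agree_negative) / share_positive


# Figures quoted in the article.
share_positive = 0.32         # IHC 2+/3+ share of the overall trial population
agree_negative = 1.0 - 0.035  # about 3.5 percent of 0/1+ slides were amplified
overall = 0.88                # extrapolated concordance between FISH and the CTA

# Assumption: a plain two-stratum weighted average links these numbers.
agree_positive = implied_positive_agreement(overall, share_positive, agree_negative)
round_trip = weighted_concordance(share_positive, agree_positive, agree_negative)

print(f"Implied agreement among IHC 2+/3+ slides: {agree_positive:.1%}")
print(f"Reweighted overall concordance: {round_trip:.1%}")
```

If the weighting assumption holds, most of the disagreement sits in the IHC-positive stratum, which is consistent with the high proportion of 2+ cases that were not amplified.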

However, a number of problems arise in interpreting these data. Allen M. Gown, MD, director and chief pathologist, PhenoPath Laboratories, Seattle, calls this result "not very surprising.

"The CTA had a poor correlation, only 82 percent, with HercepTest," he says. "If you compare FISH to IHC but use poor IHC, you will have better results with FISH. But that doesn’t condemn IHC overall."

Patrick Roche, PhD, associate professor of pathology and director of the immunohistochemistry laboratory at the Mayo Clinic, agrees. "Those comparisons were made between FISH and an IHC assay that was not considered robust enough to be approved for clinical use," he says. "There has been no valid comparison between FISH and the two approved IHC clinical assays [HercepTest and Ventana Medical Systems’ Pathway], other than two ongoing adjuvant Herceptin trials, NSABP B31 and Breast Intergroup N9831."

Interpreting this retrospective comparison is hindered also by a lack of essential clinical data. Many who work with HER2 assume that IHC-negative, FISH-positive patients would respond to Herceptin and that IHC-positive, FISH-negative patients would not. However, there are no prospective clinical outcomes data bearing on this point. "That was a major concern of the [FDA] committee members," says Dr. Hammond. "There is a small but significant proportion of patients who have protein overexpression and no gene amplification. There are no good outcomes data to tell us what happens to those patients on treatment."

In addition to making clear its concern about outcomes among IHC 3+, FISH-negative patients, the FDA review committee concluded that current data "[D]o not provide information on the clinical outcome for patients whose tumors score IHC 0 or 1+ and FISH (+)." Retrospective comparisons are "useful for hypothesis generation for future studies," they wrote.

What is needed, says Debu Tripathy, MD, associate professor of medicine, UCSF-Mt. Zion Cancer Center, and attending at Carol Franc Buck Breast Care Center, is to be able to identify better who is truly HER2 positive, who will respond to Herceptin. "Are IHC-negative, FISH-positive specimens a true phenotype?" he asks. "Or has fixation gotten rid of the protein epitope?" DNA is known to be more stable to fixation and processing than protein. "More trials now are requiring central testing of tissue by FISH or IHC or both," Dr. Tripathy says. He is optimistic that "all of these issues will be resolved over time."

Outcomes data are now available from two single-agent Herceptin trials (0649, 0650), as well as from a randomized trial of chemotherapy versus chemotherapy plus Herceptin (0648). "All support that FISH is useful for selecting patients who will benefit from Herceptin therapy," according to Dr. Pegram. "The data are compelling—FISH has the potential to become the preferred clinical assay for detection of the HER2 gene alteration," he says. "As cost comes down due to economy of scale, it could even replace immunohistochemistry for assessment of HER2 status."

Two endpoints have been used in these clinical trials: time to progression (TTP) and response rate. In 0648, TTP was significantly longer for all FISH-positive patients in the Herceptin-plus-chemotherapy arm than for those treated with chemotherapy alone, with a relative risk of 0.44. And the response rate for all FISH-positive patients was significantly higher in the Herceptin-plus-chemotherapy arm, 54 percent, than in the chemotherapy arm, 30 percent. No advantage was seen in FISH-negative patients. Positivity on FISH clearly selects patients who will benefit from the addition of Herceptin to standard chemotherapy.

However, data on all IHC 3+ patients (either FISH-positive or FISH-negative) are equally impressive. A significant benefit in TTP was seen among 3+ patients for whom Herceptin was added to chemotherapy, with a nearly identical relative risk to that seen in FISH-positive patients, 0.42. And an increased response rate was seen among 3+ patients for whom Herceptin was added to chemotherapy, 55 percent versus 31 percent, again nearly identical to that seen among FISH-positive patients. The advantage among 3+ patients was not gained by treating fewer patients: Responses occurred in 89/164 FISH-positive patients treated with Herceptin plus chemotherapy, and in 94/169 IHC 3+ patients.

In the 0649 trial, 20 percent of FISH-positive patients responded to Herceptin single-agent therapy, compared with no FISH-negative patients. But the response rate among all IHC 3+ patients was almost identical, 19 percent. Again, total numbers of patients responding were similar: 33/163 FISH-positive patients and 30/157 IHC 3+ patients. (Three responders were IHC 2+, FISH-positive.)

Finally, in 0650, the response rate among FISH-positive patients was 34 percent, while among IHC 3+ patients it was 35 percent. Two responders were FISH-negative, IHC 3+.

Based on these clinical data, the FDA reviewers concluded that, because of the studies’ design, "[C]omparative claims of equivalence cannot be made, and claims of superiority cannot be made." However, they noted, "The general magnitude of the beneficial effects of trastuzumab therapy in the FISH (+) subgroup is similar to that in the IHC 3+ subgroup."
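The similarity the reviewers describe can be checked from the responder counts quoted for trials 0648 and 0649. The sketch below only recomputes percentages from those counts; the dictionary layout and function name are illustrative choices, and small differences from the percentages in the text may reflect rounding in the original report.

```python
# Responder counts quoted in the article: (responders, patients treated).
counts = {
    ("0648", "FISH-positive"): (89, 164),
    ("0648", "IHC 3+"): (94, 169),
    ("0649", "FISH-positive"): (33, 163),
    ("0649", "IHC 3+"): (30, 157),
}


def response_rate(responders, treated):
    """Return the response rate as a percentage of treated patients."""
    return 100.0 * responders / treated


rates = {key: response_rate(*value) for key, value in counts.items()}

for (trial, subgroup), rate in rates.items():
    print(f"Trial {trial}, {subgroup}: {rate:.1f} percent")
```

Within each trial the two subgroups differ by little more than one percentage point, and the absolute numbers of responders are close as well.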

Dr. Pegram presented phase II data from a fourth clinical trial at a recent oncology meeting in Lisbon. All patients were treated with taxotere, platinum salt, and Herceptin. Patients in the FISH-positive subset had a significantly longer TTP, 17 months, than the FISH-negative subset, 7.4 months. Responses were seen in 64 percent of FISH-positive patients and 41 percent of FISH-negative patients. "These data support the use of FISH to discriminate patients who are most likely to benefit from Herceptin-based therapy," Dr. Pegram says. He adds, "The study was not intended to be a comparison of FISH and IHC."

In addition to direct comparison and clinical outcomes, interlaboratory reproducibility is another way to assess assays. Three relevant datasets of this type were presented at the FDA hearing; a fourth dataset is under publication review.

In the NSABP B31 clinical trial, patients were tested by IHC in a local laboratory, with positive samples (e.g. 3+ on HercepTest) sent to a central laboratory (LabCorp) for review by both HercepTest and FISH. No patients had been qualified by FISH testing at that time. Data were presented on 104 positive specimens. If the local laboratory was a reference laboratory and used HercepTest, the central laboratory finding agreed in 96 percent (27/28) of cases. (A reference laboratory was defined as one with an average of 100 HER2 cases per month for six months.) If the local laboratory was a nonreference laboratory using HercepTest, the concordance rate was 81 percent (42/52). If the local laboratory was a nonreference laboratory using other antibodies, the concordance rate fell to 52-65 percent.

Dr. Roche presented concordance data for IHC and FISH from the Breast Intergroup N9831 study. "We are seeing 75 percent concordance between samples testing IHC 3+ in a local laboratory and central testing," he says. For FISH, the local-central concordance rate was even lower, 67 percent.

"What that means," Dr. Roche says, "is that both of these are complex tests. It is not any easier, as it has been advertised, to do FISH in the community or locally than it is to do IHC."

In his view, the difficulties with performing FISH in local laboratories are the same as with IHC—volume and feedback. "You need expertise to do both tests," he says, "and for that you need experience and feedback. I don’t think you can do either test in isolation and know that you are doing it correctly."

Dr. Hammond calls the results of these studies "pretty disheartening" and "similarly dismal" for both IHC and FISH. "Obviously, this variation in testing makes earlier results suspect," she says.

With regard to IHC, Dr. Hammond says, "What is happening is that laboratories around the country are running the test as if they were doing keratin stains to tell whether a tumor is epithelial, rather than a quantitative test to determine therapy. They don’t realize that a much higher degree of rigor is required to make that determination."

To Dr. Bloom, the poor correlation with IHC, particularly among laboratories not doing HercepTest, is no mystery. "The first thing pathologists did [when HercepTest was approved] was to ignore the kit and the protocols in the kit and to do their own thing, as pathologists do," he says. Some used microwaving instead of a water bath for antigen retrieval; others used different fixatives in place of formalin. "Then they wrote articles saying that if you change things, the kit performs differently."

Dr. Bloom sees the same problem with precision that Dr. Hammond sees. "Typically, we make up a scoring system of 0 to 3 and don’t care much about a one-grade difference," he says. "But now a therapeutic decision is being made on that one-grade difference between 2 and 3. We have never before had to adhere to such rigid standards in an IHC test."

He does see a positive side to the NSABP data. "One of the big hits against IHC is that you can’t control how the tissue is handled, fixed, and stored," he says. But in this series reference laboratories had no problems with IHC. "And the one thing reference laboratories can’t control is how tissue is processed," he notes. "What this suggests to me is that, if you do the test in a standardized way, the way it is supposed to be done, and are skilled in the assay, you get a reproducible result for 3+ staining. Whereas hospital laboratories either have problems with interpretation or subtle differences in staining."

Not everyone accepts the results of these studies. Dr. Pegram questions the FISH discordance data from the Breast Intergroup study. "I have not seen those data on FISH in noncommercial laboratories," he says. "But I have seen a lot of FISH assay slides, and they are not difficult to read. There may be some people in some laboratories who do not have that kind of competence, but I find it difficult to believe that [the discordance rate] is as high as was reported."

Also reported at the Dec. 6 FDA hearing was a third reproducibility dataset, a validation study in which 250 slides from the Herceptin clinical trials were selected for testing by FISH in two laboratories, LabCorp and the laboratory of Michael Press, MD, PhD, professor of pathology in the Norris Comprehensive Cancer Center at the University of Southern California, who has longtime experience performing FISH assays for HER2. Due to an 11 percent failure rate with FISH in Dr. Press’ laboratory, 223 slides were available with results in both laboratories. Concordance between the two laboratories for FISH positivity was "poor," the FDA reviewers said. Discordance was in a specific direction: Of samples testing positive in Dr. Press’ laboratory, 32 percent (37/116) were negative at LabCorp. Only two of 107 samples testing negative in Dr. Press’ laboratory tested positive at LabCorp.
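The counts in this validation study are enough to rebuild the two-by-two table of results. As a rough sketch, the code below derives overall agreement and Cohen's kappa from it. Kappa is not a statistic cited at the hearing; it is added here only as one common way to express agreement beyond chance, and the variable names are illustrative.

```python
# Counts reported for the 223 slides with results in both laboratories.
press_pos = 116                # positive in Dr. Press' laboratory
press_pos_labcorp_neg = 37     # of those, negative at LabCorp
press_neg = 107                # negative in Dr. Press' laboratory
press_neg_labcorp_pos = 2      # of those, positive at LabCorp

# Fill in the remaining cells of the two-by-two table.
both_pos = press_pos - press_pos_labcorp_neg
both_neg = press_neg - press_neg_labcorp_pos
total = press_pos + press_neg

# Observed agreement: share of slides on which the laboratories agree.
observed = (both_pos + both_neg) / total

# Agreement expected by chance, from each laboratory's marginal totals.
labcorp_pos = both_pos + press_neg_labcorp_pos
labcorp_neg = both_neg + press_pos_labcorp_neg
expected = (press_pos * labcorp_pos + press_neg * labcorp_neg) / total**2

# Cohen's kappa: agreement beyond chance, scaled to a maximum of 1.
kappa = (observed - expected) / (1.0 - expected)

print(f"Overall agreement: {observed:.1%}")
print(f"Cohen's kappa: {kappa:.2f}")
```

The table also makes the one-directional pattern visible: nearly all disagreements are slides called positive in Dr. Press' laboratory and negative at LabCorp.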

Dr. Roche’s interpretation: "These results show that FISH is not foolproof, even among experienced laboratories."

A fourth interlaboratory reproducibility dataset, contained in a paper submitted for publication in the Archives of Pathology & Laboratory Medicine, summarizes results of CAP Surveys from the past two years. "During that time we shared specimens between IHC and FISH Surveys," says Raymond Tubbs, DO, chairman of the Department of Clinical Pathology, Cleveland Clinic, and a member of the CAP Cell Markers and Molecular Pathology committees. In the year 2000, 35 laboratories participated in the FISH Survey, rising to 60 in 2001. About 350 laboratories participated in the IHC survey. Results show "substantial variation in IHC results," Dr. Tubbs says, particularly for FISH-negative cases, those not amplified on FISH testing. FISH concordance was 100 percent for those same cases, he reports, at least for those laboratories submitting a result. While 95 percent of laboratories submitted a result for FISH-positive cases, only about 75 percent submitted a result for the FISH-negative cases.

"Maybe those laboratories were not confident of their [negative] results," Dr. Tubbs speculates. "It is much easier to get an amplified case to stain appropriately. For nonamplified cases it is more difficult to achieve correct staining."

Dr. Roche, also a member of the CAP Cell Markers Committee, agrees with this interpretation of the FISH data. "I think everyone is sure when they get a positive result, but some may be unsure when they get a negative result," he says. "In clinical practice, you would have to make the call."

Dr. Roche also points out that the discordance for IHC was primarily among 2+ cases. "In the 3+ category," he says, "concordance was very good."

"These 2+ cases are driving everyone crazy," says Richard Cartun, PhD, director of Immunopathology at Hartford (Conn.) Hospital, even though they make up only a small percentage of cases. "We have never really felt that 2+ should be grouped with 3+ as positive," Dr. Cartun says. "Early on we did differential PCR testing and saw that many cases that were 2+ on IHC were negative for amplification."

In a recent paper comparing IHC and FISH (J Clin Oncol. 2001;19: 2714-2721), Drs. Tubbs, Roche, Mark Stoler, MD, and others wrote: "We advocate a voluntary or FDA-mandated withdrawal of the 2+ HercepTest score as a criterion for Herceptin therapy and recommend an algorithm whereby all [IHC]-positive cases are confirmed by FISH."

How would different selection criteria affect clinical practice? Based on the CTA/FISH concordance study, approximately equal numbers of patients—20-22 per 100 screened—would be treated with Herceptin based on treatment of all FISH-positive patients, or all IHC 3+-positive patients, or all patients who are either IHC 3+ or IHC 2+/FISH-positive. Available clinical outcomes data are not adequate to determine the efficacy of the three selection schemes.

In the absence of definitive data, practitioners express a variety of attitudes about the relative merits of IHC and FISH. Dr. Tubbs adopted FISH a few years ago because in his view it is "the most reliable and non-ambiguous test." He adds, "Issues of acceptance are really because the technology is not available at all sites and interphase FISH is difficult and time-consuming to set up if you don’t already have it in your laboratory."

Mark Stoler, MD, professor of pathology and associate director of surgical and cytopathology at the University of Virginia, uses IHC, and his clinicians are satisfied. However, he says, "I would set up FISH tomorrow if money, time, space, and talent were not limiting. But in the real world they are." He says he would need to buy an automated instrument to perform the assay, because "this is not something we can train our technologists to do right now."

In Dr. Cartun’s experience, IHC is a fine test. "My current feeling is that IHC, when performed correctly, is a very good assay for screening tumors for HER2 protein overexpression," he says. The big question, he adds, is how to define the 2+ category more accurately. As for FISH, Dr. Cartun says, "FISH is more technically demanding and more expensive, and not as many people have access to it or are doing it. I think that current shortages of resources and qualified technologists are going to affect hospitals’ ability to bring on FISH testing."

Both FISH and IHC can be accurate if done well, in the opinion of George Somlo, MD, associate director for high-dose therapeutics, Department of Medical Oncology and Therapeutic Research, City of Hope Cancer Center, Duarte, Calif. He sees FISH as the gold standard for assessing breast cancers for HER2 gene amplification. "If the price of the test would not be so high, probably everyone would be using it," he says. "Although," he adds, "it requires some expertise and not every pathology laboratory would be able to perform it on a large scale." As with any procedure, quality and accuracy depend on volume.

IHC can also be accurate, Dr. Somlo says. "But with IHC methodology, no one knows which antibody is best and what is the best method of recovering protein." However, he adds, "If IHC is performed in a credible pathology laboratory, there is a fairly good correlation between strong overexpression of protein [3+ staining] and gene amplification." In his clinical practice, he uses IHC analysis. For tumors that are strongly positive, 3+, they potentially initiate antibody treatment. For borderline IHC expression, 2+, they proceed with FISH analysis.

To compensate for the discordance rates observed between local and central laboratories, NSABP and the Breast Intergroup are following a course consistent with Dr. Somlo’s idea of depending on credible labs. NSABP concluded that patients would be eligible for study entry based on IHC testing only if a positive result was validated by a reference laboratory; it listed 11 such laboratories. In the Breast Intergroup study, patients are now randomized only when HER2 positivity by either IHC or FISH in a local laboratory is confirmed by central testing.

"It may sound self-serving," says PhenoPath’s Dr. Gown, "and companies that market antibodies may be averse to this, but I think maybe not everyone should be doing IHC for HER2." Qualified laboratories could be chosen on the basis of volume and required to validate their data in comparison to FISH.

Dr. Somlo echoes this sentiment for FISH. "If FISH became the first line test," he says, "the issue would be getting smaller laboratories to agree not to do it, more than larger laboratories not having enough capacity."

A more feasible approach might be to improve performance of HER2 testing generally. Says Dr. Hammond: "We have to follow the procedures we have used for every other clinical laboratory test—have reference materials, generate a standard procedure, train technologists, and do ongoing proficiency testing. These are all things that CAP has endorsed and promoted through its Laboratory Accreditation Program over the years."

As an immediate aid in reading IHC slides more reproducibly, some pathologists are investigating semi-automated image analysis, which reads intensity of stain on IHC slides as a continuous variable. The ChromaVision ACIS is one such instrument. Randy Judd, MD, director of special technologies at the AmeriPath Center for Advanced Diagnostics, Orlando, Fla., has accumulated more than 1,000 cases with both FISH and ACIS HER2 scores. "We see a strong correlation between the ChromaVision ACIS score and the percentage of FISH-positive cases," he says. However, he notes, ChromaVision does not give information about the pattern of staining. "Is a complete membranous pattern critical for predicting Herceptin response, or is the intensity of the stain more important?" he asks.

Dr. Hammond has a ChromaVision image-analysis instrument in her laboratory. "It is working very well," she says. "It reduces subjectivity and makes the three of us [pathologists] very consistent." In Dr. Hammond’s experience, image analysis doesn’t change the number of 3+ cases, but it decreases the number of 2+ cases.

The instrument "dramatically increases the cost of HER2 testing—it doubles or triples it," she says. "We told our oncologists it would increase cost and consistency and asked them, Do you want it? They said yes."

Dr. Bloom has published data showing that use of ChromaVision image analysis improves the correlation between IHC and FISH readings among inexperienced pathologists from as low as 42 percent to over 90 percent. He notes that there are not yet any clinical correlation data for image analysis, however.

Dr. Tubbs has limited experience with image analysis. "But my gut reaction," he says, "is that it would be easier and more reliable to set up FISH than thrashing around to set up an image-analysis system." The ultimate solution, he predicts, will be a bright-field in situ hybridization assay.

"GoldFISH"—which stands for gold-facilitated autometallographic in situ hybridization-is such an assay that was devised by Dr. Tubbs and is being developed by the Cleveland Clinic Foundation in partnership with Nanoprobes. Manuscripts reporting the assay’s technical validation and interobserver interpretive reproducibility are under review. GoldFISH requires only a light microscope; it does not require oil immersion or dot counting. "It is either positive or negative, amplified or nonamplified," Dr. Tubbs says. "We rarely see low-level amplification."

Says Dr. Stoler: "Fluorescence is difficult to do well and time-consuming. And it is hard to correlate where the morphologically abnormal cells are." He believes that an in situ hybridization assay with a non-fluorescent readout, like the one Dr. Tubbs is developing, "would gain a lot more acceptance by pathologists."

For now, even in the best of circumstances, only 30 to 35 percent of selected patients respond to Herceptin as a single agent. What can be done to sharpen selection of patients who will respond? Dr. Bloom points out that the probe used in the FISH assay only detects HER2 gene amplification. "But it is not the HER2 gene by itself that is amplified," he says. It is an amplicon that includes HER2 plus a variable amount of the surrounding genome. What else might be amplified that might be important to response? One candidate gene is topoisomerase 2, which is sometimes amplified along with HER2 and which makes a protein that controls responsiveness to anthracycline therapy.

Another approach would be to look at downstream regulators that influence HER2 protein overexpression. "Part of the diversity of the HER2 family of receptors is that, when activated, they don’t necessarily elicit the same signal," Dr. Bloom says. "It depends on what ligands and co-receptors are present and what other members of the HER receptor family they bind to."

In either case, Dr. Bloom says, "By focusing on IHC or FISH, we may be missing a more complex picture."

Dr. Pegram agrees with this analysis but notes that there are not now good clinical reagents to measure downstream markers of HER2 activity. "Such future tests will be largely IHC-based and might be very useful in expanding the repertoire available for HER2," he says.

But improving the clinical value of HER2 assays is part of a bigger problem, says Dr. Hammond. She is a member of a strategic planning group—Program for the Assessment of Clinical Cancer Tests—established by the NCI to wrestle with this question: How can we take the thousands of promising markers that research people find and turn them into clinically relevant tests? One example is the oncoprotein p53.

"A few years ago everyone thought it was the hottest thing, that it would tell who would die and who wouldn’t," Dr. Hammond says. "But it never panned out to be a clinical marker, because studies were poorly designed to test p53 utility as a clinical marker and tests were not performed in a standardized manner so that results from various studies could be aggregated to evaluate benefit."

The need to resolve this issue gains urgency with the expected release soon of two more biologicals like Herceptin for which a qualifying assay will be mandatory. One is an antibody against the HER1 receptor (also known as EGFR) and the other is an inhibitor of a downstream tyrosine kinase. Regulation of EGFR expression is complex, Dr. Tubbs says: Overexpression can occur in the complete absence of gene amplification. "Compared to EGFR, the problems we encountered with HER2 standardization will resemble a stroll in the park," he says.

Says Dr. Hammond, "The floodgates are about to open."

William Check is a medical writer in Wilmette, Ill.