Working out the kinks in HER2 testing

September 2000
Cover Story

William Check, PhD

When the therapeutic antibody trastuzumab (Herceptin, Genentech) was approved for treating metastatic breast cancers that overexpress the oncoprotein HER2, laboratories had to introduce an assay for HER2 to identify eligible patients. Raymond Tubbs, DO, chairman of the Department of Clinical Pathology at the Cleveland Clinic, initially adopted an in-house immunohistochemistry assay. After a time, he tells CAP TODAY, "Our oncologists told us, ‘We are sending some cases to the Mayo Clinic and they get positive results and you get negative results on the same samples.’"

Dr. Tubbs called Patrick Roche, PhD, director of the immunohistochemistry laboratory at Mayo, and found that Dr. Roche was writing a letter, which later appeared in the Journal of Clinical Oncology, identifying a problem of false-positive results with the kit he was using, HercepTest (Dako), which is still the only FDA-approved method for selecting patients for trastuzumab therapy.

Soon after, Dr. Tubbs and Dr. Roche, along with Mark Stoler, MD, professor of pathology and associate director of surgical and cytopathology at the University of Virginia, Charlottesville, formed a study group to compare two forms of immunohistochemistry, or IHC, and two variants of fluorescence in situ hybridization, or FISH, which measures amplification of the HER2 gene. Since the manufacturer of HercepTest contended that excess positives with IHC were true positives resulting from protein overexpression without gene amplification, in situ measurement of messenger RNA (mRNA) for HER2 was used as the arbiter.

"If that argument were true," says Dr. Stoler, who performed the mRNA assays, "we would expect high levels of HER2 mRNA in HercepTest-positive cases that were FISH-normal. And we never found any of those cases." The investigators concluded there were significant numbers of false-positive results with IHC methods.

Results of the study were presented in May at the annual meeting of the American Society of Clinical Oncology. Since the investigators agreed on the conclusions, one might think they would all now use the same assay, and that it would be FISH. But in fact they all use different methods, and two use IHC, showing how difficult it is to say which is the best method to qualify patients for trastuzumab therapy.

In a meeting with breast pathologists and oncologists at his institution, Dr. Tubbs recommended FISH as the first-line test. IHC is offered if oncologists want it. "These molecular data and the superior correlation of FISH with clinical outcomes underscore the superiority of FISH as the clinical assay of choice," Dr. Tubbs says. (See "Using FISH for primary testing," page 66.) As for the feasibility of widespread adoption of FISH, he comments, "I think pathologists are uncomfortable with FISH. But it can be learned like any other microscopy skill."

At Mayo, on the other hand, HercepTest remains the first-line method of testing. Using the 0 to 3+ scoring system set up for the kit, Dr. Roche says, "More than 80 percent of our 3+ cases are gene-amplified." Samples read as 2+ (about 10 percent) are sent to Mayo’s Molecular Genetics Laboratory for FISH testing. Fewer than 15 percent of these are gene-amplified. (Most false-positives in the comparative study came from this category.) "HercepTest is a good kit," Dr. Roche says. "My contention all along has been that the 2+ category does not correlate with amplification. It was a mistake to make 2+ by itself a qualifier for therapy." (This is only one of many criticisms various pathologists have about the way Genentech and Dako developed HercepTest.) As for making HercepTest first-line, Dr. Roche says, "It is quicker and more efficient to run than FISH, the cost is relatively insignificant, and 3+ results correlate well with gene amplification." He adds, "It is FDA-approved and that is what our oncologists here want."

While Dr. Stoler also uses an IHC assay for HER2, he does not use HercepTest. "We have validated our own IHC method," Dr. Stoler says, "and we know how it performs relative to HercepTest and to FISH. For now we are sticking to that assay because we don’t have a significant false-positive rate. We have a few false negatives, but those can be addressed by second-tier FISH testing in the minority of cases where Herceptin is the only therapeutic choice. You really want to identify patients who are most likely to benefit," he emphasizes, "because Herceptin is both expensive and potentially toxic."

A major difference between Dr. Stoler’s IHC assay and HercepTest is that his assay uses a standard detection system. HercepTest’s detection system, branched chain amplification, is very different from other IHC tests in surgical pathology. ("Why Dako chose to use that has been somewhat of a mystery," Dr. Stoler says.) That has led some pathologists to use the kit antibody with their own detection method. "A lot of people don’t do the FDA-approved method exactly," Dr. Stoler notes. "Assay variation leads to variability of results. So," he urges, "either use the FDA-approved method exactly or set up and validate your own assay." Setting up a variant assay and assuming that the validation for HercepTest applies can lead to incorrect results.

But, like Dr. Roche, Dr. Stoler sees a fundamental flaw. "I think the FDA approved an invalid method," he asserts. HercepTest was not the method used in the clinical trials. Further, the data that led to its approval were a statistical correlation, not a 1:1 match. Dr. Stoler suggests that, while prospective clinical correlation studies are being done, Dako or the FDA, or both, consider revising what is called positive.

Turning to FISH, Dr. Stoler says, "While it might seem obvious that FISH would be a better way to test for an indication to use Herceptin, FISH is not nearly as simple to do as IHC. So this raises the question of how a laboratory without the equipment or experience would do in situ hybridization."

To complicate things further, second-generation FISH assays may not require fluorescence. And they will yield a permanent record, rather than requiring someone to look at dots under a microscope. "Those changes will happen within the next six to 12 months," Dr. Stoler predicts.

Most important, he believes, is communicating with clinicians about the realities of tests. "We have an active breast clinic here," he says, "and our clinicians are perfectly happy with the rate of IHC positivity coming out of our laboratory. It meets their needs for patient care."

HercepTest is the first-line assay adopted by Elizabeth Hammond, MD, chair of pathology at LDS Hospital, Salt Lake City. In her experience, intense complete membrane staining in more than 50 percent of cells (a 3+ score) has 90 percent correspondence with gene amplification. "It is only when we deal with either complete membrane staining that is weak or present in only a small proportion of cells, the 2+ samples, that we have poor correspondence with the FISH assay," she says. Thus, in her hospital, patients whose tumors have 3+ staining qualify for Herceptin therapy, while 2+ scores are confirmed by FISH in house. "It is very easy to train technologists to use either of the two available FISH kits if you have a laboratory equipped to do immunofluorescence microscopy," Dr. Hammond says. (Her hospital examines kidney and heart biopsies by immunofluorescence microscopy; at other institutions it is used for prenatal genetics.) She finds that both the Vysis and Ventana kits work well. However, Dr. Hammond comments, "FISH is expensive and time-consuming, because you have to count many cells and results have to be confirmed by a pathologist. I prefer doing IHC. It is cheaper, easier, and faster. But for 2+ tissues, for the sake of the patient, you really need to confirm that they are positive by FISH."
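The two-tier approach described by Dr. Roche and Dr. Hammond (3+ qualifies, 2+ goes to FISH, 0 and 1+ are reported negative) can be written out as a short decision rule. The following Python sketch is only an illustration of that logic as the article reports it; the names `Triage`, `triage_ihc`, and `her2_positive` are hypothetical, and the rule does not capture other practices mentioned in this article, such as Dr. Bloom’s suggestion that some 0 or 1+ cases may benefit from FISH.

```python
from enum import Enum
from typing import Optional


class Triage(Enum):
    """Hypothetical labels for the next step after an IHC read."""
    NEGATIVE = "IHC 0 or 1+: reported negative"
    REFLEX_FISH = "IHC 2+: confirm by FISH"
    POSITIVE = "IHC 3+: reported positive"


def triage_ihc(score: int) -> Triage:
    """Map a 0 to 3+ IHC score to the next step in a two-tier scheme."""
    if score not in (0, 1, 2, 3):
        raise ValueError(f"IHC score must be 0, 1, 2, or 3, got {score!r}")
    if score == 3:
        return Triage.POSITIVE
    if score == 2:
        return Triage.REFLEX_FISH
    return Triage.NEGATIVE


def her2_positive(score: int, fish_amplified: Optional[bool] = None) -> Optional[bool]:
    """Return the final call, or None while a 2+ case awaits its FISH result."""
    step = triage_ihc(score)
    if step is Triage.POSITIVE:
        return True
    if step is Triage.NEGATIVE:
        return False
    # 2+ case: the FISH result (True = amplified) decides; None means pending.
    return fish_amplified
```

Under this sketch, `her2_positive(2)` stays undecided until a FISH result is supplied, which mirrors the point that a 2+ read alone is not treated as a qualifier.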

Like Dr. Stoler, Dr. Hammond says, "HercepTest is very standardized, so if you use it exactly as provided, you have a high likelihood of getting excellent results. I think a lot of difficulties have come from people using cheaper, non-FDA approved antibodies or modifying the kit. Either of those approaches will cause problems if you don’t validate against HercepTest. If you change the method," Dr. Hammond says, "you can change the results." She endorses the quality control guidelines in the CAP’s Laboratory Accreditation Program Checklist 8, Anatomic Pathology.

Dr. Hammond also sees a basic problem with the kit. "The scoring system was developed when Genentech came out with Herceptin based on antibodies they used then," she says. "That scoring system was superimposed on HercepTest, even though it uses a different antibody, because FDA required it. I think that is creating some confusion."

Kenneth Bloom, MD, director of laboratory operations at Rush-Presbyterian-St. Luke’s Medical Center, Chicago, believes, too, that obtaining accurate results is under the laboratory’s control. "What it comes down to are really pathologist problems," he says. With a conventional qualitative IHC assay, laboratories typically manipulate conditions for antigen retrieval or antibody handling to get better staining. "That is exactly what you can’t do with a quantitative test like HercepTest," Dr. Bloom says. The kit includes pretitrated antibody and defines exactly how to do antigen retrieval. It assumes that tissue is cut to a thickness of 3-4 µm, fixed in 10 percent neutral buffered formalin for 12 to 18 hours, and processed in a routine tissue processor. "But everybody changed the test a little," Dr. Bloom observes, "then they badmouthed it." He has done between 400 and 500 cases handling tissue exactly according to HercepTest directions and finds 92 percent concordance with FISH. (He does both assays on all samples for a research protocol.)

"This is the first truly quantitative IHC test in anatomic pathology," Dr. Bloom says. "But there is going to be a whole wave of them. And we will have to handle tissue properly to get accurate results."

Misreading IHC also contributes to false results, in Dr. Bloom’s view. Calling 1+ samples 2+ "makes it look like only a small minority of 2+ tissues respond to Herceptin," he says. "We consider cases as 2+ positive only when there is a distinct chickenwire appearance on high-power examination that is not visible on lower-power examination. With this definition, 100 percent of our 2+ cases have been amplified, although they represent less than five percent of our overall cases. So I am not a big believer in taking our 2+ readings to FISH. However, cases that are scored as 0 or 1+, that are ER/PR-negative and show a high proliferation index may benefit from FISH testing."

As for doing FISH, Dr. Bloom comments, "It is not quite as easy as what they put out." FISH is reliable and relatively easy to score, but it has a gray zone right at the borderline of amplification. And it is time-consuming. "I look at all FISH slides myself," he says. "They may take five times as long to read as IHC. I can do 20 cases in one to 1.5 hours."

On the other hand, he says, "There seems to be a reluctance by pathologists to introduce FISH into their laboratories. I don’t see why. We easily cross-trained our immunohistochemistry technologists to do FISH even though we had not done it in our laboratory before."

A broad view of HER2 testing was offered by Ann Thor, MD, staff pathologist at Evanston (Ill.) Northwestern Healthcare and professor of pathology and surgery at Northwestern University Medical School, Chicago. As director of the pathology coordinating office board for the Eastern Cooperative Oncology Group, Dr. Thor does central eligibility testing for patients going into Herceptin trials. She has evaluated both the Ventana and Vysis FISH kits and perhaps 10 different IHC reagents. "We have found great comparability between IHC, FISH, and PCR-based molecular assays," she says.

Clinically her hospital uses IHC with FISH backup for 2+ samples. "I think that is a reasonable strategy based on cost and efficacy of screening breast cancer patients," she says. "My view is that FISH is too expensive and time-consuming to use first-line. Pathologists are supposed to confirm FISH results by counting representative sections, but in busy laboratories they often delegate that to technologists."

Dr. Thor identifies a lack of strict adherence to controls as one source of problems with IHC. HercepTest has slides with three cell line controls, but a small-volume laboratory that does only three to five slides per run will quickly run out of control slides. "It is worth it to buy extra controls," Dr. Thor suggests.

But she sees lack of standardization as the biggest problem. In her experience, most U.S. laboratories either don’t use HercepTest or don’t use it as provided. Many use the kit’s primary antibody with their own reagents; others put it on a machine, although the FDA-approved method is manual. (An automated method is now available, but only for the Dako immunostainer. "Most laboratories that have a Ventana instrument will not go out and buy a Dako just for this test," Dr. Thor observes.) "I am a believer in the standardized approach in the CAP guidelines," Dr. Thor says, referring to an editorial by Clive Taylor, MD, PhD, in the July issue of Archives of Pathology & Laboratory Medicine (2000;124:945-951). "That was the first time anatomic pathology people were told about using consistent technique in a semi-official sense."

This year, the CAP’s Cell Markers and Molecular Pathology committees will jointly send fixed breast cancer tissue samples to IHC and molecular pathology laboratories, says Raymond Nagle, MD, PhD, professor of pathology and deputy director of the Arizona Cancer Center at the University of Arizona, Tucson. "This will be the first large multilaboratory comparison of IHC, FISH, and morphometric analysis," he says.

Dr. Nagle himself tests for HER2 with Ventana’s IHC kit, which is awaiting FDA clearance. (He adds a standard disclaimer to reports.) It is based on Ventana’s CB11 monoclonal antibody and, like HercepTest, has a very specific protocol. He finds that results seem to fall into two categories: 3+, where almost 100 percent of the cells are unequivocally positive; and background cytoplasmic staining or a trace of staining in a few cells. "I find there are very few tissues where you really ponder, Is this a 2+ or not?" says Dr. Nagle, who is a consultant to Ventana. He has a 3+ breast specimen control on each slide.

"Our clinicians treat on the basis of 3+," he adds. "I have had no requests to have testing done by FISH, either by patients or clinicians. We intend to set up FISH, but will use it only as a second-line test because of its expense." To validate the Ventana kit, an internal study was done of a series of cases that were ER/PR-negative with a high proliferative index, tumors that are typically strongly positive for HER2. "We haven’t done a study correlating it with FISH or with clinical response," Dr. Nagle says.

Dr. Hammond evaluated Ventana’s kit on her own, comparing it in a blinded fashion with HercepTest in 200 breast cancer cases that had been followed for 10 years. She found complete concordance among 3+ cases. In the 11 (5.5 percent) instances where her readings of the assays disagreed (all 2+/1+), she showed the slides to two other pathologists. "What we found," Dr. Hammond says, "was that the other two pathologists agreed with my readings on HercepTest, but we disagreed in our readings of the CB11 slides. Interobserver variation with CB11 could not easily be resolved." Dr. Hammond attributes this to high background cytoplasmic staining with CB11 that made it difficult to evaluate the cell membranes. "And," she adds, "we did it with their recommended method and on a Ventana machine."

Dr. Bloom did the trial for Ventana that provided the data submitted to the FDA. "We followed exactly the protocol that Ventana set up," Dr. Bloom says. Antibody was pretitrated to cell lines with known numbers of HER2 surface molecules, just like the initial assay Genentech used to qualify patients for Herceptin trials. Dr. Bloom’s results were similar to those generated by Dr. Hammond: Seven (4.7 percent) out of 150 samples were discrepant, all with 1+/2+ scores.
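The discrepancy percentages quoted for the two CB11 comparisons follow directly from the case counts given above. As a minimal check of that arithmetic (the function name `discrepancy_rate` is hypothetical, and only the counts reported in the article are used):

```python
def discrepancy_rate(discrepant: int, total: int) -> float:
    """Percentage of cases on which two readings disagreed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * discrepant / total


# Counts as reported in the article.
hammond = round(discrepancy_rate(11, 200), 1)  # 11 of 200 cases
bloom = round(discrepancy_rate(7, 150), 1)     # 7 of 150 samples
print(hammond, bloom)
```

Both series put the disagreements in the 1+/2+ range, so the rates describe disagreement near the borderline, not across all score categories.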

"Anecdotally we have seen a fair number of blocks that were 3+ on immunostaining with CB11 but 0 or 1+ with HercepTest," Dr. Bloom says. These samples were not amplified on FISH. "This is not because CB11 is a bad antibody. But when a laboratory is not adhering to an established protocol, it may lead to false results."

Some laboratories’ difficulties interpreting IHC results have led one company, ChromaVision, to market a cellular image analysis system, called ACIS, to read HercepTest slides. Underlying this device is the assumption that quantitatively judging the amount of dye taken up by cells on a slide is something that people are inherently very bad at doing. "We humans have great accommodation to variations in brightness," says Jose de la Torre-Bueno, PhD, ChromaVision vice president of R&D. "We can see in anything from starlight to bright sunlight." But a computerized instrument is much better at judging absolute intensity. ACIS uses a bright-field microscope with no filters, can run unattended, and reads slides and presents images for later review.

Douglas Harrington, MD, chairman of the board and CEO at ChromaVision, notes that HER2 is the first in a series of tests that will require anatomic pathologists to give quantitative rather than yes or no answers. "Pathologists will need some sort of imaging assistance to produce results adequate to guide therapy," Dr. Harrington believes.

In one series of 129 cases read by nine pathologists with and without ACIS, the imaging system significantly increased concordance between IHC and FISH, mostly by improving scores among pathologists inexperienced at reading HercepTest. Eight of nine pathologists had concordance values >90 percent when using ACIS, including four whose concordance level without ACIS was <65 percent. Taking the pathologists as a group, the fraction of samples read as 2+ and 3+ that were nonamplified dropped from 54 percent without ACIS to 22 percent with the system. Most samples moved from 2+ to 1+. (Results of this study will be presented in December at the San Antonio Breast Cancer Symposium.)

In several cases read positive by IHC/ACIS but nonamplified by FISH, it turned out that FISH results were scored in the wrong region of the slide. "Pathologists are used to using dyes like H&E to classify areas in a specimen," notes Kenneth Bauer, PhD, chief science officer at ChromaVision. "FISH substitutes fluorescent dyes that tend to fade rather quickly under the microscope. And the great majority of pathologists are not used to recognizing specimens of invasive cancer versus non-cancer with fluorescent dyes."

ACIS is sold on a per-use basis. Reimbursement for this technology is "very favorable," Dr. Harrington says. "Pathologists can make incremental revenue based on existing CPT codes."

Randy Judd, MD, director of Special Technologies at AmeriPath’s Center for Advanced Diagnostics, Orlando, Fla., is using the ACIS now. Dr. Judd had developed an in-house assay for HER2 testing using the Dako polyclonal antibody and Ventana’s automated immunostainer. When the HercepTest was released, he used controls from the Dako kit to adjust the antibody titer to produce staining equivalent to the HercepTest. "However, once we began using the ACIS, we discovered that the nonspecific cytoplasmic staining that can be induced by antigen retrieval was being falsely interpreted as a positive result by ACIS," Dr. Judd explains. "We eventually eliminated this problem by adding an avidin-biotin blocking step to our procedure."

Since bringing ACIS into his laboratory, Dr. Judd says, "We have been very happy with the consistency of our results. The automated Ventana stainer minimizes variability caused by technical factors, and the ACIS minimizes variability due to pathologist subjectivity."

In collaboration with Jane Gibson, PhD, director of molecular genetics at the Center for Advanced Diagnostics, Dr. Judd is evaluating the correlation of ACIS HER2 scores with FISH. "We are finding that the problem of false-positive and false-negative IHC results is restricted to a fairly narrow range of ACIS scores," Dr. Judd reports. "These are the cases that benefit most from follow-up FISH testing."

As for reimbursement, Dr. Judd says, "Although the ACIS fee is high, reimbursement revenue has been more than adequate to cover our costs." Dr. Judd and his colleagues eventually plan to narrow the range of cases requiring ACIS quantitation to only those with intermediate (1+ to 2+) staining. "Our goal is to develop an algorithm that maximizes the benefit of these new technologies at a reasonable cost," Dr. Judd says.

Dr. Bloom, who took part in the ChromaVision study, says, "Image analysis levels the playing field for reading IHC slides. It puts novices on a par with experts. With ACIS, even untrained pathologists can achieve greater than 90 percent concordance with FISH." Pathologists experienced in reading HercepTest also get some benefit from the addition of ACIS, he adds.

Important to note is that the study used preprocessed slides. "Using ACIS doesn’t change the fact that you have to get tissue handling right first," Dr. Bloom emphasizes. "It won’t help you with a poor immunostain." The results of Dr. Roche’s independent evaluation of the ACIS at the Mayo Clinic are under discussion with ChromaVision.

ChromaVision’s Dr. Bauer referred to the importance of analyzing for HER2 in an area of invasive cancer, which demands special care when fluorescent microscopy is used. In practice this can be a problem. As a central testing laboratory for clinical trials, Dr. Thor sometimes gets discrepant results from a sending laboratory. She may call and ask them how they did the test. "Some laboratories have told me they do not try to separate invasive from in situ cancer," she says. "When it comes to breast cancer, this is quite important."

Many studies have looked separately at the in situ and invasive components of breast cancer. Dr. Thor says the vast majority of in situ disease picked up on mammography is the large cell comedo subtype and about two-thirds have altered HER2 status. The same is true for the in situ component of invasive disease detected by mammography: About two-thirds have HER2 alterations. In contrast, only about one-third of invasive disease has altered HER2 status. Because such a high percentage of in situ lesions are abnormal for HER2, this marker is not a useful predictive or prognostic factor. "CAP guidelines stipulate that HER2 must be scored on the invasive component of breast cancer," Dr. Thor notes. It can be difficult to score histology on FISH. "To look at tissue under the fluorescence microscope is hard," Dr. Thor says. "For small tumors especially, differentiating in situ from invasive becomes a tricky business.

"Which is why," Dr. Thor continues, "it is important for pathologists to be very careful in the slide or block that they select for HER2 testing." In these days of small lesions, this is especially true. To do multiple stains on one block of a 3-mm cancer confirmed on frozen section takes thought and conservation. "For a small tumor, especially for cores," Dr. Thor suggests, "what you can do, instead of tossing most of the sections, is to cut very carefully on the facing, then float 10 slices on a ribbon. Pick up every slice and put them onto slides for staining or FISH." In some places a block is cut for diagnosis, then stored. It may be remounted and recut for additional tests, but there is not always enough tissue. Making a larger number of slides at the initial sectioning can prevent this problem.

Dr. Bloom, too, underscores the importance of carefully matching HER2 analysis with tissue histology, especially on FISH analysis. "We recently outright missed one case on a liver biopsy," he relates. One section level had tumor and liver tissue, which was circled. On the next level down, which was used for FISH, all the cells looked big, and they were thought to be tumor cells under fluorescence microscopy. In fact, tumor had disappeared and there were only liver cells. "When we recognized the discordance with IHC, we looked again, and saw that there were no tumor cells on that level," Dr. Bloom says.

Dr. Judd cites a case in which an invasive tumor was HER2-negative by IHC while a microscopic focus of DCIS was positive. "Since the invasive portion is most relevant clinically, we scored the case as HER2-negative," he says. However, the FISH laboratory picked up on this small area of amplification and scored the case as positive. Drs. Judd and Gibson reviewed the slides together and concluded this represented a false-positive FISH result. The clinical significance of HER2-positive DCIS associated with an invasive tumor that is HER2-negative remains unknown.

So what is the bottom line in HER2 testing? "I don’t believe we have the data to recommend a single method," Dr. Thor says. "I think the answer will come from treatment trials. The only important factor is how do our assay data compare to Herceptin response." To generate such data, Dr. Thor is performing several HER2 assay methods in an Eastern Cooperative Oncology Group trial of Herceptin therapy. Two other cooperative trials groups, NSABP and CALGB, are conducting similar studies. But it will be two to three years before data emerge from these trials.

Dr. Roche is measuring HER2 by IHC and FISH in two Herceptin clinical trials. "These studies are really going to be the telling tale as to which test is the better predictor of outcome to therapy," he says.

Dr. Bloom is now collating clinical outcomes of 150 or so cases treated with Herceptin as a single agent on which both IHC and FISH were done. But he is also doing markers other than HER2. "Even with FISH, HER2 measurement is not very predictive of response," he says. "We need to do better."

Dr. Thor agrees. Current Herceptin-only response rates based on HER2 IHC testing are 20 to 30 percent. "Other assay cut points, downstream markers, or heterodimer data in addition to HER2 by immunohistochemistry may better predict Herceptin response," she says. "I think we’re not likely to continue to accept response rates that low, because we are now seeing some patients dying while being treated with Herceptin. Not many, but some." In the future, Herceptin treatment might be limited to patients who have a better than 50 percent chance of responding.

For now, Dr. Tubbs says, "First there has to be good communication between pathologists and oncologists. Second, all IHC 2+ samples should be confirmed with FISH at a minimum. Even this approach will miss some false negatives and false positives. Third, we clearly need to evolve a chromogenic in situ hybridization assay that can be evaluated with conventional optical microscopy and that does not require copy-by-copy signal enumeration." He and his colleagues are now working to develop this system.

Getting it right with HER2 has long-term consequences. "New antibodies will be coming out that must be read quantitatively and that will present the same problems all over again," Dr. Bloom warns. "This is the wave of the future: therapeutic monoclonal antibodies followed by quantitative laboratory tests to select patients. So we have to get into the habit of handling tissue appropriately."

"This level of specificity [with HER2] is frankly quite different than what we are used to in anatomic pathology and an example of what is going to be repeated many times in the near future," Dr. Stoler agrees. "Every time someone comes up with a new drug based on a genetic marker, we will have to go through how best to test for that analyte. It will require a lot of work to sort out." He adds, "Anatomic pathologists have never before been forced to have this kind of precision."

To Dr. Tubbs, the issue’s importance must be viewed in the context of breast cancer’s prevalence. A new diagnosis of invasive breast carcinoma was made for 175,000 women in the United States last year, the American Cancer Society estimates. An IHC false-positive rate of 10 to 12 percent "may not appear to be significant at first glance," he notes, "but it assumes greater importance when one considers that thousands of IHC assay results may be spurious."
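Dr. Tubbs’s point about scale can be made concrete with rough arithmetic. The sketch below rests on a simplifying assumption that is not stated in the article: that every one of the 175,000 newly diagnosed cases is tested by IHC and that the 10 to 12 percent rate applies to all of those results. The variable names are illustrative only.

```python
# Figures quoted in the article.
NEW_CASES = 175_000            # new invasive breast carcinoma diagnoses, per the ACS estimate
FALSE_POSITIVE_PCT = (10, 12)  # IHC false-positive rate range, in percent

# Assumption (not from the article): every new case gets an IHC assay and
# the false-positive rate applies across all of those assay results.
low, high = (NEW_CASES * pct // 100 for pct in FALSE_POSITIVE_PCT)
print(f"{low:,} to {high:,} potentially spurious IHC results per year")
```

If only a fraction of cases were tested, or if the rate applied only to positive reads, the totals would be proportionally smaller, though still in the thousands for most plausible fractions.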

Dr. Stoler, referring to the collaborative study of IHC, FISH, and mRNA that he did with Drs. Tubbs and Roche, says, "One of the things that motivated us is that we don’t think it was done right the first time. We believe that systems and models should be developed for getting it right before a test is approved by FDA."

Dr. Thor is clear about who should devise and implement these systems. "Next time we need to do these pathology studies up front during clinical trials instead of letting industry create controversy," she says. "It is not companies that should be doing these tests; it is pathologists and laboratories."

William Check is a freelance medical writer in Wilmette, Ill.