Making the call on HER2 testing methods
February 2002
William Check, PhD
Pathologists and oncologists are debating which is the better way to qualify
patients with metastatic breast cancer for treatment with trastuzumab (Herceptin),
a monoclonal antibody directed against the HER2 receptor: by measuring HER2 protein
overexpression on cell membranes using immunohistochemistry, or by measuring amplification
of the HER2 gene using fluorescence in situ hybridization. With the FDA’s Jan.
10 approval of Abbott/Vysis’ PathVysion FISH assay as an alternative method of
selecting patients for Herceptin therapy, the debate is sure to heat up, particularly
in the commercial arena. While weighing the relative merits of these two assays
matters, the controversy is overshadowing several other important facts:
Improving the accuracy of selection methods, particularly IHC,
gains added urgency with the imminent arrival of additional anticancer
agents for which eligibility will be determined by characterization
of patients’ tumors. "This is a new paradigm for drug treatment,"
says Elizabeth Hammond, MD, chair of pathology at LDS Hospital,
Intermountain Healthcare, and professor of pathology at University
of Utah School of Medicine. "Herceptin represents the best example
of a drug whose application is totally dependent on determining
that a patient has protein overexpression." Many similar agents
are now in the pipeline.
Mark Pegram, MD, associate professor of medicine, Division of Hematology/Oncology,
and director of the Women’s Cancer Program, UCLA/Jonsson Comprehensive Cancer
Center, agrees. "The next anti-neoplastic reagents will be targeted to proteins,
not amplified genes. For those agents, we will have to rely on IHC, so we will
have to make it work."
Herceptin was approved in September 1998 to treat metastatic
breast cancer in patients who overexpress HER2 protein. At that
time, FDA linked trastuzumab approval to concomitant approval
of a diagnostic IHC assay. The IHC assay used to qualify patients
in the clinical trials—called the Clinical Trials Assay
(CTA)—was considered "impractical for commercialization
and widespread use," according to Genentech. "The CTA was incredibly
laborious—it took 18 hours and 35 steps," says Kenneth Bloom,
MD, director of immunohistochemistry and director of the Breast
Pathology Service, Rush-Presbyterian-St. Luke’s Medical Center,
Chicago.
Accordingly, HercepTest was developed in a collaboration between
Genentech and Dako to mirror the CTA and to have defined conditions
of fixation and antigen retrieval. As with the CTA, only patients
scoring 2+/3+ on HercepTest are eligible for treatment with Herceptin.
Since then HercepTest has become the assay routinely used in the
clinical setting to test breast cancer tissues for HER2, but much
investigation and off-label testing has been done with the Vysis
FISH assay. In November 2001, Genentech asked the FDA to add PathVysion
to the Herceptin package insert. The FDA used a Dec. 6 advisory committee hearing to
review the area more broadly.
"FDA wanted to see what had happened to HER2 testing in the
last few years, especially clinical trials where the benefit of
Herceptin was evaluated," Dr. Hammond says. She calls the meeting
"report-card time—how well are current systems working?"
Because of this wider mandate, a large body of old and new data was presented
at the hearing (www.fda.gov/ohrms/dockets/ac/01/briefing/3815b1_08_HER2%20FISH.doc).
One of the earliest comparisons between the two methods was
a retrospective analysis by FISH on 623 slides from patients previously
screened for the Herceptin clinical trials. (LabCorp did both
initial CTA screening and the subsequent FISH assays.) A forced
1:1 ratio of positive/negative slides was selected (positive,
2+/3+ on the CTA; negative, 0/1+). In 15 percent of slides, FISH
did not provide an informative result. On the remaining 529 slides,
the concordance rate between FISH and CTA results, extrapolated
to the composition of the overall trial population (32 percent
IHC 2+/3+), was 88 percent. Notably, 11 percent of 3+ cases and
76 percent of 2+ cases were not amplified and about 3.5 percent
of 0/1+ cases were amplified, reinforcing the notion of false-positive
and false-negative results on IHC.
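Because the slides were sampled 1:1 while only 32 percent of the trial population was IHC 2+/3+, the pooled agreement has to be reweighted to the population mix before it can be reported. A minimal Python sketch of that reweighting, assuming agreement is scored separately within the IHC-positive and IHC-negative strata: the 0.965 figure for IHC 0/1+ follows from the roughly 3.5 percent amplification rate quoted above, while the 0.70 agreement for IHC 2+/3+ is a hypothetical value, not a number reported in the study. The function name `reweighted_concordance` is likewise illustrative.

```python
def reweighted_concordance(strata):
    """Weight per-stratum agreement rates by each stratum's share of the target population.

    strata: list of (agreement_rate, population_share) pairs; shares must sum to 1.
    """
    total_share = sum(share for _, share in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(rate * share for rate, share in strata)

# Illustrative inputs only.
ihc_pos_agreement = 0.70   # hypothetical: fraction of IHC 2+/3+ slides that are FISH-positive
ihc_neg_agreement = 0.965  # 1 - 0.035: fraction of IHC 0/1+ slides that are FISH-negative

# Forced 1:1 sample versus the trial population (32 percent IHC 2+/3+).
sample_estimate = reweighted_concordance([(ihc_pos_agreement, 0.5), (ihc_neg_agreement, 0.5)])
population_estimate = reweighted_concordance([(ihc_pos_agreement, 0.32), (ihc_neg_agreement, 0.68)])
print(f"1:1 sample: {sample_estimate:.3f}, trial mix: {population_estimate:.3f}")
```

With these illustrative inputs the 1:1 sample gives about 0.83 and the trial mix about 0.88. The point is only that identical per-stratum agreement rates yield different headline concordance depending on the case mix.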
However, a number of problems arise in interpreting these data.
Allen M. Gown, MD, director and chief pathologist, PhenoPath Laboratories,
Seattle, calls this result "not very surprising.
"The CTA had a poor correlation, only 82 percent, with HercepTest,"
he says. "If you compare FISH to IHC but use poor IHC, you will
have better results with FISH. But that doesn’t condemn IHC overall."
Patrick Roche, PhD, associate professor of pathology and director
of the immunohistochemistry laboratory at the Mayo Clinic, agrees.
"Those comparisons were made between FISH and an IHC assay that
was not considered robust enough to be approved for clinical use,"
he says. "There has been no valid comparison between FISH and
the two approved IHC clinical assays [HercepTest and Ventana Medical
Systems’ Pathway], other than two ongoing adjuvant Herceptin trials,
NSABP B31 and Breast Intergroup N9831."
Interpreting this retrospective comparison is also hindered
by a lack of essential clinical data. Many who work with HER2
assume that IHC-negative, FISH-positive patients would respond
to Herceptin and that IHC-positive, FISH-negative patients would
not. However, there are no prospective clinical outcomes data
bearing on this point. "That was a major concern of the [FDA]
committee members," says Dr. Hammond. "There is a small but significant
proportion of patients who have protein overexpression and no
gene amplification. There are no good outcomes data to tell us
what happens to those patients on treatment."
In addition to making clear its concern about outcomes among
IHC 3+, FISH-negative patients, the FDA review committee concluded
that current data "[D]o not provide information on the clinical
outcome for patients whose tumors score IHC 0 or 1+ and FISH (+)."
Retrospective comparisons are "useful for hypothesis generation
for future studies," they wrote.
What is needed, says Debu Tripathy, MD, associate professor of medicine,
UCSF-Mt. Zion Cancer Center, and attending at Carol Franc Buck Breast Care
Center, is a better way to identify who is truly HER2 positive, that is, who will
respond to Herceptin. "Are IHC-negative, FISH-positive specimens a true phenotype?"
he asks. "Or has fixation gotten rid of the protein epitope?" DNA is known
to be more stable to fixation and processing than protein. "More trials now
are requiring central testing of tissue by FISH or IHC or both," Dr. Tripathy
says. He is optimistic that "all of these issues will be resolved over time."
Outcomes data are now available from two single-agent
Herceptin trials (0649, 0650), as well as from a randomized trial
of chemotherapy versus chemotherapy plus Herceptin (0648). "All
support that FISH is useful for selecting patients who will benefit
from Herceptin therapy," according to Dr. Pegram. "The data are
compelling—FISH has the potential to become the preferred
clinical assay for detection of the HER2 gene alteration," he
says. "As cost comes down due to economy of scale, it could even
replace immunohistochemistry for assessment of HER2 status."
Two endpoints have been used in these clinical trials: time
to progression (TTP) and response rate. In 0648, TTP was significantly
longer for all FISH-positive patients in the Herceptin-plus-chemotherapy
arm than for those treated with chemotherapy alone, with a relative
risk of 0.44. And the response rate for all FISH-positive patients
was significantly higher in the Herceptin-plus-chemotherapy arm,
54 percent, than in the chemotherapy arm, 30 percent. No advantage
was seen in FISH-negative patients. Positivity on FISH clearly
selects patients who will benefit from the addition of Herceptin
to standard chemotherapy.
However, data on all IHC 3+ patients (either FISH-positive or
FISH-negative) are equally impressive. A significant benefit in
TTP was seen among 3+ patients for whom Herceptin was added to
chemotherapy, with a nearly identical relative risk to that seen
in FISH-positive patients, 0.42. And an increased response rate
was seen among 3+ patients for whom Herceptin was added to chemotherapy,
55 percent versus 31 percent, again nearly identical to that seen
among FISH-positive patients. The advantage among 3+ patients
was not gained by treating fewer patients: Responses occurred
in 89/164 FISH-positive patients treated with Herceptin plus chemotherapy,
and in 94/169 IHC 3+ patients.
In the 0649 trial, 20 percent of FISH-positive patients responded
to Herceptin single-agent therapy, compared with no FISH-negative
patients. But the response rate among all IHC 3+ patients was
almost identical, 19 percent. Again, total numbers of patients
responding were similar: 33/163 FISH-positive patients and 30/157
IHC 3+ patients. (Three responders were IHC 2+, FISH-positive.)
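The response rates quoted for 0649 are simple proportions of responders to patients treated. The Python sketch below recomputes them from the counts above and attaches a 95 percent Wilson score interval to each. The Wilson interval is a standard binomial interval chosen here for illustration; it is not the FDA reviewers' analysis. Overlapping intervals show that the two subgroups are hard to tell apart at these sample sizes, but they are not a formal test of equivalence.

```python
import math

def response_rate_with_wilson_ci(responders, treated, z=1.96):
    """Return the observed response rate and a Wilson score interval (z=1.96 gives ~95%)."""
    p = responders / treated
    denom = 1 + z**2 / treated
    centre = (p + z**2 / (2 * treated)) / denom
    half_width = z * math.sqrt(p * (1 - p) / treated + z**2 / (4 * treated**2)) / denom
    return p, centre - half_width, centre + half_width

# Responder counts from trial 0649 as reported in the article.
subgroups = {"FISH-positive": (33, 163), "IHC 3+": (30, 157)}
for name, (responders, treated) in subgroups.items():
    rate, low, high = response_rate_with_wilson_ci(responders, treated)
    print(f"{name}: {rate:.0%} (95% CI {low:.0%} to {high:.0%})")
```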
Finally, in 0650, the response rate among FISH-positive patients
was 34 percent, while among IHC 3+ patients it was 35 percent.
Two responders were FISH-negative, IHC 3+.
Based on these clinical data, the FDA reviewers concluded that,
because of the studies’ design, "[C]omparative claims of equivalence
cannot be made, and claims of superiority cannot be made." However,
they noted, "The general magnitude of the beneficial effects of
trastuzumab therapy in the FISH (+) subgroup is similar to that
in the IHC 3+ subgroup."
Dr. Pegram presented phase II data from a fourth clinical trial at a recent
oncology meeting in Lisbon. All patients were treated with Taxotere (docetaxel), a platinum
salt, and Herceptin. Patients in the FISH-positive subset had a significantly
longer TTP, 17 months, than the FISH-negative subset, 7.4 months. Responses
were seen in 64 percent of FISH-positive patients and 41 percent of FISH-negative
patients. "These data support the use of FISH to discriminate patients who
are most likely to benefit from Herceptin-based therapy," Dr. Pegram says.
He adds, "The study was not intended to be a comparison of FISH and IHC."
In addition to direct comparison and clinical outcomes,
interlaboratory reproducibility is another way to assess assays.
Three relevant datasets of this type were presented at the FDA
hearing; a fourth dataset is under publication review.
In the NSABP B31 clinical trial, patients were tested by IHC
in a local laboratory, with positive samples (e.g., 3+ on HercepTest)
sent to a central laboratory (LabCorp) for review by both HercepTest
and FISH. No patients had been qualified by FISH testing at that
time. Data were presented on 104 positive specimens. If the local
laboratory was a reference laboratory and used HercepTest, the
central laboratory finding agreed in 96 percent (27/28) of cases.
(A reference laboratory was defined as one with an average of
100 HER2 cases per month for six months.) If the local laboratory
was a nonreference laboratory using HercepTest, the concordance
rate was 81 percent (42/52). If the local laboratory was a nonreference
laboratory using other antibodies, the concordance rate fell to
52-65 percent.
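The concordance figures above are proportions of locally positive specimens that the central laboratory confirmed. A small Python sketch recomputes the two HercepTest rates from the reported counts and encodes one reading of the reference-laboratory definition. Treating "an average of 100 HER2 cases per month for six months" as "a mean of at least 100 over the most recent six months" is an assumption, and `is_reference_lab` is a hypothetical helper, not an NSABP tool. The other-antibody group is omitted because only a range, not counts, was reported.

```python
from statistics import mean

def is_reference_lab(monthly_her2_cases):
    """Assumed reading of the definition: at least six months of data, with the
    most recent six months averaging 100 or more HER2 cases per month."""
    if len(monthly_her2_cases) < 6:
        return False
    return mean(monthly_her2_cases[-6:]) >= 100

def concordance(agree, total):
    """Fraction of locally positive specimens confirmed by the central laboratory."""
    return agree / total

# Local-versus-central counts reported for NSABP B31.
results = {
    "reference lab, HercepTest": (27, 28),
    "nonreference lab, HercepTest": (42, 52),
}
for label, (agree, total) in results.items():
    print(f"{label}: {agree}/{total} = {concordance(agree, total):.0%}")
```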
Dr. Roche presented concordance data for IHC and FISH from the
Breast Intergroup N9831 study. "We are seeing 75 percent concordance
between samples testing IHC 3+ in a local laboratory and central
testing," he says. For FISH, the local-central concordance rate
was even lower, 67 percent.
"What that means," Dr. Roche says, "is that both of these are
complex tests. It is not any easier, as it has been advertised,
to do FISH in the community or locally than it is to do IHC."
In his view, the difficulties with performing FISH in local
laboratories are the same as with IHC—volume and feedback.
"You need expertise to do both tests," he says, "and for that
you need experience and feedback. I don’t think you can do either
test in isolation and know that you are doing it correctly."
Dr. Hammond calls the results of these studies "pretty disheartening"
and "similarly dismal" for both IHC and FISH. "Obviously, this
variation in testing makes earlier results suspect," she says.
With regard to IHC, Dr. Hammond says, "What is happening is
that laboratories around the country are running the test as if
they were doing keratin stains to tell whether a tumor is epithelial,
rather than a quantitative test to determine therapy. They don’t
realize that a much higher degree of rigor is required to make
that determination."
To Dr. Bloom, the poor local-central agreement for IHC, particularly among
laboratories not doing HercepTest, is no mystery. "The first thing
pathologists did [when HercepTest was approved] was to ignore
the kit and the protocols in the kit and to do their own thing,
as pathologists do," he says. Some used microwaving instead of
a water bath for antigen retrieval; others used different fixatives
in place of formalin. "Then they wrote articles saying that if
you change things, the kit performs differently."
Dr. Bloom sees the same problem with precision that Dr. Hammond
sees. "Typically, we make up a scoring system of 0 to 3 and don’t
care much about a one-grade difference," he says. "But now a therapeutic
decision is being made on that one-grade difference between 2
and 3. We have never before had to adhere to such rigid standards
in an IHC test."
He does see a positive side to the NSABP data. "One of the big
hits against IHC is that you can’t control how the tissue is handled,
fixed, and stored," he says. But in this series reference laboratories
had no problems with IHC. "And the one thing reference laboratories
can’t control is how tissue is processed," he notes. "What this
suggests to me is that, if you do the test in a standardized way,
the way it is supposed to be done, and are skilled in the assay,
you get a reproducible result for 3+ staining. Whereas hospital
laboratories either have problems with interpretation or subtle
differences in staining."
Not everyone accepts the results of these studies. Dr. Pegram questions
the FISH discordance data from the Breast Intergroup study. "I have not seen
those data on FISH in noncommercial laboratories," he says. "But I have seen
a lot of FISH assay slides, and they are not difficult to read. There may
be some people in some laboratories who do not have that kind of competence,
but I find it difficult to believe that [the discordance rate] is as high
as was reported."
Also reported at the Dec. 6 FDA hearing was a third reproducibility
dataset, a validation study in which 250 slides from the Herceptin
clinical trials were selected for testing by FISH in two laboratories,
LabCorp and the laboratory of Michael Press, MD, PhD, professor
of pathology in the Norris Comprehensive Cancer Center at the
University of Southern California, who has longtime experience
performing FISH assays for HER2. Due to an 11 percent failure
rate with FISH in Dr. Press’ laboratory, 223 slides were available
with results in both laboratories. Concordance between the two
laboratories for FISH positivity was "poor," the FDA reviewers
said. Discordance was in a specific direction: Of samples testing
positive in Dr. Press’ laboratory, 32 percent (37/116) were negative
at LabCorp. Only two of 107 samples testing negative in Dr. Press’
laboratory tested positive at LabCorp.
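The directional discordance is easier to see as a 2 x 2 table. The Python sketch below rebuilds that table from the counts reported above (116 positive and 107 negative in Dr. Press' laboratory, of which 37 and 2, respectively, were called the other way at LabCorp) and computes overall, positive, and negative percent agreement. Using Dr. Press' calls as the comparator is an arbitrary choice for this illustration. Neither laboratory is treated as a gold standard, and `agreement_stats` is a hypothetical helper.

```python
def agreement_stats(both_pos, ref_pos_other_neg, ref_neg_other_pos, both_neg):
    """Percent agreement for a 2 x 2 table, relative to a chosen comparator laboratory."""
    ref_pos = both_pos + ref_pos_other_neg   # comparator-positive slides
    ref_neg = ref_neg_other_pos + both_neg   # comparator-negative slides
    n = ref_pos + ref_neg
    return {
        "n": n,
        "overall": (both_pos + both_neg) / n,
        "positive_agreement": both_pos / ref_pos,
        "negative_agreement": both_neg / ref_neg,
    }

# Dr. Press' laboratory as comparator; LabCorp as the second reader.
stats = agreement_stats(
    both_pos=116 - 37, ref_pos_other_neg=37,
    ref_neg_other_pos=2, both_neg=107 - 2,
)
for key, value in stats.items():
    print(key, value if key == "n" else f"{value:.1%}")
```

With these counts, overall agreement is about 82.5 percent, but positive agreement is only about 68 percent against roughly 98 percent for negatives. A single overall figure hides exactly the asymmetry described above.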
Dr. Roche’s interpretation: "These results show that FISH is
not foolproof, even among experienced laboratories."
A fourth interlaboratory reproducibility dataset, contained
in a paper submitted for publication in the Archives of Pathology
& Laboratory Medicine, summarizes results of CAP Surveys from
the past two years. "During that time we shared specimens between
IHC and FISH Surveys," says Raymond Tubbs, DO, chairman of the
Department of Clinical Pathology, Cleveland Clinic, and a member
of the CAP Cell Markers and Molecular Pathology committees. In
the year 2000, 35 laboratories participated in the FISH Survey,
rising to 60 in 2001. About 350 laboratories participated in the
IHC Survey. Results show "substantial variation in IHC results,"
Dr. Tubbs says, particularly for FISH-negative cases, those not
amplified on FISH testing. FISH concordance was 100 percent for
those same cases, he reports, at least for those laboratories
submitting a result. While 95 percent of laboratories submitted
a result for FISH-positive cases, only about 75 percent submitted
a result for the FISH-negative cases.
"Maybe those laboratories were not confident of their [negative]
results," Dr. Tubbs speculates. "It is much easier to get an amplified
case to stain appropriately. For nonamplified cases it is more
difficult to achieve correct staining."
Dr. Roche, also a member of the CAP Cell Markers Committee, agrees
with this interpretation of the FISH data. "I think everyone is
sure when they get a positive result, but some may be unsure when
they get a negative result," he says. "In clinical practice, you
would have to make the call."
Dr. Roche also points out that the discordance for IHC was primarily
among 2+ cases. "In the 3+ category," he says, "concordance was
very good."
Although 2+ cases make up only a small percentage of the total, "these 2+
cases are driving everyone crazy," says Richard Cartun, PhD, director of
immunopathology at Hartford (Conn.) Hospital. "We
have never really felt that 2+ should be grouped with 3+ as positive,"
Dr. Cartun says. "Early on we did differential PCR testing and
saw that many cases that were 2+ on IHC were negative for amplification."
In a recent paper comparing IHC and FISH (J Clin Oncol.
2001;19:2714-2721), Drs. Tubbs, Roche, Mark Stoler, MD, and others
wrote: "We advocate a voluntary or FDA-mandated withdrawal of
the 2+ HercepTest score as a criterion for Herceptin therapy and
recommend an algorithm whereby all [IHC]-positive cases are confirmed
by FISH."
How would different selection criteria affect clinical practice? Based on
the CTA/FISH concordance study, approximately equal numbers of patients, 20 to 22
per 100 screened, would be treated with Herceptin whether the rule was to treat
all FISH-positive patients, all IHC 3+ patients, or all patients
who are either IHC 3+ or IHC 2+/FISH-positive. Available clinical outcomes
data are not adequate to compare the efficacy of the three selection schemes.
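The three selection rules can be written as predicates over a patient's IHC score and FISH result and then applied to a cross-tabulated cohort. In the Python sketch below, the cohort counts are invented for illustration (they only loosely echo the proportions quoted earlier in this article) and are not the CTA/FISH study data.

```python
# HYPOTHETICAL counts per 100 screened patients, keyed by (IHC score, FISH-positive?).
# Invented for illustration; NOT trial data.
HYPOTHETICAL_COHORT = {
    ("3+", True): 18, ("3+", False): 2,
    ("2+", True): 3, ("2+", False): 9,
    ("0/1+", True): 2, ("0/1+", False): 66,
}

# Each scheme decides eligibility from (ihc, fish).
SCHEMES = {
    "all FISH-positive": lambda ihc, fish: fish,
    "all IHC 3+": lambda ihc, fish: ihc == "3+",
    "IHC 3+, or IHC 2+ and FISH-positive": lambda ihc, fish: ihc == "3+" or (ihc == "2+" and fish),
}

def count_selected(cohort, rule):
    """Number of patients in the cohort that the rule would select for treatment."""
    return sum(n for (ihc, fish), n in cohort.items() if rule(ihc, fish))

screened = sum(HYPOTHETICAL_COHORT.values())
for name, rule in SCHEMES.items():
    print(f"{name}: {count_selected(HYPOTHETICAL_COHORT, rule)} of {screened}")
```

The rules select similar totals but not the same patients. The all-FISH rule picks up IHC 0/1+, FISH-positive cases, while the IHC 3+ rule picks up 3+, FISH-negative cases, which is the group for which outcomes data are lacking.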
In the absence of definitive data, practitioners express
a variety of attitudes about the relative merits of IHC and FISH.
Dr. Tubbs adopted FISH a few years ago because in his view it
is "the most reliable and non-ambiguous test." He adds, "Issues
of acceptance are really because the technology is not available
at all sites and interphase FISH is difficult and time-consuming
to set up if you don’t already have it in your laboratory."
Mark Stoler, MD, professor of pathology and associate director
of surgical and cytopathology at the University of Virginia, uses
IHC, and his clinicians are satisfied. However, he says, "I would
set up FISH tomorrow if money, time, space, and talent were not
limiting. But in the real world they are." He says he would need
to buy an automated instrument to perform the assay, because "this
is not something we can train our technologists to do
right now."
In Dr. Cartun’s experience, IHC is a fine test. "My current
feeling is that IHC, when performed correctly, is a very good
assay for screening tumors for HER2 protein overexpression," he
says. The big question, he adds, is how to define the 2+ category
more accurately. As for FISH, Dr. Cartun says, "FISH is more technically
demanding and more expensive, and not as many people have access
to it or are doing it. I think that current shortages of resources
and qualified technologists are going to affect hospitals’ ability
to bring on FISH testing."
Both FISH and IHC can be accurate if done well, in the opinion
of George Somlo, MD, associate director for high-dose therapeutics,
Department of Medical Oncology and Therapeutic Research, City
of Hope Cancer Center, Duarte, Calif. He sees FISH as the gold
standard for assessing breast cancers for HER2 gene amplification.
"If the price of the test would not be so high, probably everyone
would be using it," he says. "Although," he adds, "it requires
some expertise and not every pathology laboratory would be able
to perform it on a large scale." As with any procedure, quality
and accuracy depend on volume.
IHC can also be accurate, Dr. Somlo says. "But with IHC methodology,
no one knows which antibody is best and what is the best method
of recovering protein." However, he adds, "If IHC is performed
in a credible pathology laboratory, there is a fairly good correlation
between strong overexpression of protein [3+ staining] and gene
amplification." In his clinical practice, IHC analysis comes first.
Tumors that are strongly positive (3+) may qualify for antibody
treatment; tumors with borderline IHC expression (2+) proceed to
FISH analysis.
To compensate for the discordance rates observed between local
and central laboratories, NSABP and the Breast Intergroup are
following a course consistent with Dr. Somlo’s idea—depending
on credible labs. NSABP concluded that patients would be eligible
for study entry based on IHC testing only if a positive result
was validated by a reference laboratory; it listed 11 such laboratories.
In the Breast Intergroup study, patients are now randomized only
when HER2 positivity by either IHC or FISH in a local laboratory
is confirmed by central testing.
"It may sound self-serving," says PhenoPath’s Dr. Gown, "and
companies that market antibodies may be averse to this, but I
think maybe not everyone should be doing IHC for HER2." Qualified
laboratories could be chosen on the basis of volume and required
to validate their data in comparison to FISH.
Dr. Somlo echoes this sentiment for FISH. "If FISH became the
first line test," he says, "the issue would be getting smaller
laboratories to agree not to do it, more than larger laboratories
not having enough capacity."
A more feasible approach might be to improve performance of HER2 testing
generally. Says Dr. Hammond: "We have to follow the procedures we have used
for every other clinical laboratory test—have reference materials, generate
a standard procedure, train technologists, and do ongoing proficiency testing.
These are all things that CAP has endorsed and promoted through its Laboratory
Accreditation Program over the years."
As an immediate aid in reading IHC slides more reproducibly,
some pathologists are investigating semi-automated image analysis,
which reads intensity of stain on IHC slides as a continuous variable.
The ChromaVision ACIS is one such instrument. Randy Judd, MD,
director of special technologies at the AmeriPath Center for Advanced
Diagnostics, Orlando, Fla., has accumulated more than 1,000 cases
with both FISH and ACIS HER2 scores. "We see a strong correlation
between the ChromaVision ACIS score and the percentage of FISH-positive
cases," he says. However, he notes, ChromaVision does not give
information about the pattern of staining. "Is a complete membranous
pattern critical for predicting Herceptin response, or is the
intensity of the stain more important?" he asks.
Dr. Hammond has a ChromaVision image-analysis instrument in
her laboratory. "It is working very well," she says. "It reduces
subjectivity and makes the three of us [pathologists] very consistent."
In Dr. Hammond’s experience, image analysis doesn’t change the
number of 3+ cases, but it decreases the number of 2+ cases.
The instrument "dramatically increases the cost of HER2 testing—it
doubles or triples it," she says. "We told our oncologists it
would increase cost and consistency and asked them, Do you want
it? They said yes."
Dr. Bloom has published data showing that use of ChromaVision
image analysis improves the correlation between IHC and FISH readings
among inexperienced pathologists from as low as 42 percent to
over 90 percent. He notes that there are not yet any clinical
correlation data for image analysis, however.
Dr. Tubbs has limited experience with image analysis. "But my
gut reaction," he says, "is that it would be easier and more reliable
to set up FISH than thrashing around to set up an image-analysis
system." The ultimate solution, he predicts, will be a bright-field
in situ hybridization assay.
"GoldFISH" (gold-facilitated autometallographic in situ hybridization)
is one such assay; it was devised by Dr. Tubbs and is being developed
by the Cleveland Clinic Foundation in partnership with Nanoprobes.
Manuscripts reporting the assay's
technical validation and interobserver interpretive reproducibility
are under review. GoldFISH requires only a light microscope; it
does not require oil immersion or dot counting. "It is either
positive or negative, amplified or nonamplified," Dr. Tubbs says.
"We rarely see low-level amplification."
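For contrast, the dot counting that fluorescent assays such as PathVysion require reduces to a ratio: HER2 signals are counted alongside chromosome 17 centromere (CEP17) signals in a set of interphase nuclei, and the case is called amplified when the HER2:CEP17 ratio reaches a cutoff. The Python sketch below assumes a cutoff of 2.0, the threshold commonly applied to PathVysion. The data class, the function names, and the signal counts in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NucleusCount:
    her2_signals: int   # HER2 locus probe signals in one interphase nucleus
    cep17_signals: int  # chromosome 17 centromere (CEP17) signals in the same nucleus

def her2_cep17_ratio(nuclei):
    """Total HER2 signals divided by total CEP17 signals across all scored nuclei."""
    total_her2 = sum(n.her2_signals for n in nuclei)
    total_cep17 = sum(n.cep17_signals for n in nuclei)
    if total_cep17 == 0:
        raise ValueError("no CEP17 signals counted; ratio is undefined")
    return total_her2 / total_cep17

def is_amplified(nuclei, cutoff=2.0):
    """Call amplification when the ratio meets the cutoff (2.0 is an assumption here)."""
    return her2_cep17_ratio(nuclei) >= cutoff

# Hypothetical counts for 20 scored nuclei per case.
amplified_case = [NucleusCount(her2_signals=8, cep17_signals=2)] * 20
nonamplified_case = [NucleusCount(her2_signals=2, cep17_signals=2)] * 20
print(her2_cep17_ratio(amplified_case), is_amplified(amplified_case))
print(her2_cep17_ratio(nonamplified_case), is_amplified(nonamplified_case))
```

Cases whose ratio lands close to the cutoff are the ones where a few miscounted signals can flip the call.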
Says Dr. Stoler: "Fluorescence is difficult to do well and time-consuming.
And it is hard to correlate where the morphologically abnormal cells are."
He believes that an in situ hybridization assay with a non-fluorescent readout,
like the one Dr. Tubbs is developing, "would gain a lot more acceptance
by pathologists."
For now, even in the best of circumstances, only 30 to
35 percent of selected patients respond to Herceptin as a single
agent. What can be done to sharpen selection of patients who will
respond? Dr. Bloom points out that the probe used in the FISH
assay only detects HER2 gene amplification. "But it is not the
HER2 gene by itself that is amplified," he says. It is an amplicon
that includes HER2 plus a variable amount of the surrounding genome.
What else might be amplified that might be important to response?
One candidate gene is topoisomerase 2, which is sometimes amplified
along with HER2 and which makes a protein that controls responsiveness
to anthracycline therapy.
Another approach would be to look at downstream regulators that
influence HER2 protein overexpression. "Part of the diversity
of the HER2 family of receptors is that, when activated, they
don’t necessarily elicit the same signal," Dr. Bloom says. "It
depends on what ligands and co-receptors are present and what
other members of the HER receptor family they bind to."
In either case, Dr. Bloom says, "By focusing on IHC or FISH,
we may be missing a more complex picture."
Dr. Pegram agrees with this analysis but notes that there are
not now good clinical reagents to measure downstream markers of
HER2 activity. "Such future tests will be largely IHC-based and
might be very useful in expanding the repertoire available for
HER2," he says.
But improving the clinical value of HER2 assays is part of a
bigger problem, says Dr. Hammond. She is a member of a strategic
planning group—Program for the Assessment of Clinical Cancer
Tests—established by the NCI to wrestle with this question:
How can we take the thousands of promising markers that researchers
find and turn them into clinically relevant tests? One
example is the tumor suppressor protein p53.
"A few years ago everyone thought it was the hottest thing,
that it would tell who would die and who wouldn’t," Dr. Hammond
says. "But it never panned out to be a clinical marker, because
studies were poorly designed to test p53 utility as a clinical
marker and tests were not performed in a standardized manner so
that results from various studies could be aggregated to evaluate
benefit."
The need to resolve this issue gains urgency with the expected
release of two more biologicals, like Herceptin, for which
a qualifying assay will be mandatory. One is an antibody against
the HER1 receptor (also known as EGFR) and the other is an inhibitor
of a downstream tyrosine kinase. Regulation of EGFR expression
is complex, Dr. Tubbs says: Overexpression can occur in the complete
absence of gene amplification. "Compared to EGFR, the problems
we encountered with HER2 standardization will resemble a stroll
in the park," he says.
Says Dr. Hammond, "The floodgates are about to open."
William Check is a medical writer in Wilmette, Ill.