Steering the straits of transplant testing

CAP TODAY

June 2006
Cover Story

Karen Titus

Having a casual conversation about cardiac allograft rejection is next to impossible. Anyone broaching the topic soon becomes absorbed in knotty details, each of which seems to hinge on an “if.”

Ditto for any discourse about a test to monitor rejection, especially when gene expression microarrays are involved. Genes are plentiful. Study populations are small. Blood is tricky. And the heart is, well, the heart.

So the researchers and physicians who are helping to develop and test-drive a new gene expression test called AlloMap (XDx Inc., South San Francisco, Calif.) have plenty to say about the subject, guided by large doses of critical thinking as well as a look-and-hope optimism.

In their latest effort, they explored the test’s ability to predict acute cardiac allograft rejection, presenting data at the annual meeting and scientific sessions of the International Society for Heart & Lung Transplantation, held in Spain in April. In an earlier study, known as CARGO, physicians and researchers looked at whether AlloMap could detect the absence of moderate/severe cellular rejection in stable cardiac allograft recipients, possibly reducing the need for endomyocardial biopsies.

They found some answers, certainly, but mostly they remain surrounded by questions, the first being, How can the AlloMap test be used in clinical practice right now?

When the CARGO (Cardiac Allograft Rejection Gene Expression Observational) study was complete, a sigh of relief must have been uttered by those involved: The test appears to detect absence of moderate/severe cellular rejection. This, in turn, might reduce the need for biopsies in certain clinical settings, particularly at institutions whose protocols call for frequent biopsying. Though every center varies, it’s not atypical for heart transplant patients to undergo a biopsy weekly for the first three months, then monthly, until the end of the first year or two. Some centers, such as UCLA, stop biopsying after one year; others after two or three; yet others follow patients for a lifetime.

The study, published in the American Journal of Transplantation (Deng MC, et al. 2006;6:150–160), looked at gene expression profiling of peripheral blood mononuclear cells to discriminate ISHLT Grade 0 rejection (quiescence) from moderate/severe rejection (ISHLT ≥3A/2R¹). The paper describes the gene discovery and diagnostic development phases of the study as well as its validation. The researchers identified 252 candidate genes using known alloimmune pathways and leukocyte microarrays, then developed real-time PCR assays for each gene. Eventually they created an 11-gene RT-PCR test from a training set, validating it on an independent set.

Mandeep Mehra, MD, one of the authors of the study, calls the CARGO paper complex because he and his colleagues tried to address methodology as well as clinical relevance. “I don’t think we appealed to either contingent adequately,” says Dr. Mehra, the Herbert Berger Professor and head of cardiology, University of Maryland School of Medicine, Baltimore, and chief of cardiology, University of Maryland Medical Center.

The eight-center study collected samples from patients who had undergone heart transplantation and were followed by EMB and hemodynamics and/or echocardiography, among other clinical data. Those parameters are important, says Mario Deng, MD, the lead author on the CARGO study, because the test, in its current form, needs to be used with clinical examination and noninvasive graft function monitoring (for example, echocardiography) if it is to be used effectively.

Based on published and post-publication data, Dr. Deng says, the test is useful for patients six months or more after transplantation who are undergoing routine biopsies. If these patients are clinically stable with no signs or symptoms of rejection (that is, if they have no shortness of breath and no weight gain secondary to fluid overload); if they have normal graft function by echocardiography; and if they have an AlloMap test score below threshold, then the probability of a Grade 2 revised rejection (formerly Grade 3A or higher; the ISHLT recently revised its guidelines) is approximately one percent. “Therefore, in that situation, a routine biopsy need not be done,” says Dr. Deng, director of Cardiac Transplantation Research, Department of Medicine, Division of Cardiology, Columbia University, New York. “These situations should be between 75 and 80 percent of all encounters, according to the CARGO database, which included more than 600 patients involved in more than 5,000 encounters.”
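The four conditions Dr. Deng lists can be written out as a simple conjunction. The sketch below is illustrative only: the `Encounter` fields, the function name, and the idea of passing the score threshold in as a parameter are all assumptions made for the example, and the article does not state the threshold value used in practice.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    """One clinic visit; all field names are hypothetical."""
    months_post_transplant: float
    clinically_stable: bool        # no signs or symptoms of rejection
    normal_graft_function: bool    # e.g., by echocardiography
    allomap_score: float


def meets_rule_out_criteria(e: Encounter, score_threshold: float) -> bool:
    """Return True only when every condition described above holds.

    score_threshold is supplied by the caller; it is an assumption here,
    not a value taken from the CARGO study.
    """
    return (
        e.months_post_transplant >= 6
        and e.clinically_stable
        and e.normal_graft_function
        and e.allomap_score < score_threshold
    )


# Usage with a placeholder threshold of 30 (hypothetical).
visit = Encounter(months_post_transplant=8, clinically_stable=True,
                  normal_graft_function=True, allomap_score=25.0)
print(meets_rule_out_criteria(visit, score_threshold=30.0))
```

Failing any single condition returns False, which mirrors Dr. Deng's point that every other situation still calls for the classical workup.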

E. Rene Rodriguez, MD, director of cardiovascular pathology and director of the autopsy service at Cleveland Clinic, simplifies matters when he explains, “As a surveillance tool for somebody who’s not rejecting, it’s easy: It’s a good test to prove that there’s quiescence.”

But, he says, that’s not necessarily the obvious conclusion of the paper. “The paper may overemphasize a little the value [of the test] for rejection. And in reality, what we know now is that the test is terrific to detect no rejection.” The value of that application rests on several premises, one being that it would be good to do fewer endomyocardial biopsies.

Cardiologists like this idea. First, they don’t want to do any invasive test if it can be avoided. While EMBs are generally not seen as dangerous, and risk of infection is low, a biopsy is a biopsy. It involves time, money,² and a very sharp needle. It’s uncomfortable for patients, and, in pediatric heart transplant cases, the youngest recipients require sedation.

EMB, like any biopsy, is not perfect, though it is the current gold standard for detecting cardiac allograft rejection. Whether physicians are aware of its limitations is debatable. In a separately published study (Marboe CC, et al. J Heart Lung Transplant. 2005;24: [suppl] S219–S226), CARGO researchers identified a subset of 827 biopsies, from 273 patients, that included all biopsies graded by local pathologists as Grade 1B or higher under the old ISHLT classification system as well as randomly chosen Grade 0 and 1A biopsies. Three study pathologists reviewed each case (without clinical data) and assigned their own ISHLT grades.

The greatest variability between the local and the study pathologists was in the diagnosis of ISHLT Grade 2 (17 percent agreement), with the latter group significantly less likely to make a diagnosis of Grade 2 rejection, the authors report. The study pathologists were significantly more likely to diagnose grades 0, 1A, and 3B rejection and significantly less likely to diagnose grades 1B and 3A rejection as well as Grade 2. Quilty lesions were a major contributing factor in the discrepancies. They were noted in 3.3 percent of local Grade 0 cases and in 31 percent and 37 percent of local Grade 2 and 3A cases, respectively. At best, says Dr. Rodriguez, who was not part of this study or the CARGO study, pathologists will disagree 30 percent of the time on the Quilty lesions versus rejection issue.

What does this have to do with the AlloMap test? It’s difficult to say right now, given that its current use is to rule out rejection. But researchers are an itchy lot, and many in the CARGO clan are looking eagerly at other applications that would bring the higher grade biopsies into sharper focus.

The new ISHLT guidelines may or may not help matters. Dr. Rodriguez says the classification was changed with great pressure from cardiologists, who found the old categories of 1A, 1B, and 2 useless. In their view, he says, all such cases represented mild rejection. Under the revised classifications, Grade 0 revised is the same as the former Grade 0. Grade 1R incorporates the former grades of 1A, 1B, and 2. Grade 2R is the former 3A, and 3R encompasses the old grades 3B and 4.
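The regrouping described above amounts to a lookup table. A minimal sketch, assuming the old grades are written as the strings used in this article and that the revised Grade 0 is labeled "0R"; the function name is hypothetical.

```python
# Old ISHLT grade -> revised grade, as described in the paragraph above.
OLD_TO_REVISED = {
    "0": "0R",
    "1A": "1R",
    "1B": "1R",
    "2": "1R",
    "3A": "2R",
    "3B": "3R",
    "4": "3R",
}


def to_revised_grade(old_grade: str) -> str:
    """Translate an old ISHLT grade label into its revised equivalent."""
    key = old_grade.strip().upper()
    if key not in OLD_TO_REVISED:
        raise ValueError(f"Unknown old ISHLT grade: {old_grade!r}")
    return OLD_TO_REVISED[key]


for grade in ["0", "1A", "1B", "2", "3A", "3B", "4"]:
    print(grade, "->", to_revised_grade(grade))
```

The table makes the cardiologists' complaint visible: three of the seven old labels collapse into the single revised Grade 1R.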

The relationship of these new grades to AlloMap scores “is a question for down the road,” Dr. Rodriguez says.

For now the focus is on the far ends of the spectrum—AlloMap scores in the low 30s or lower, and Grade 0 and Grade 3R biopsies.

It’s fairly easy to sort out the meaning of a low AlloMap score and a Grade 0 biopsy; slightly more difficult is figuring out the implications of a high AlloMap score and a higher grade biopsy. More difficult still are cases where the AlloMap score is high and the biopsy grade low.

An above-threshold molecular score returned alongside a quiescent biopsy may occur early after transplantation, specifically within the first six months, says Dr. Deng, and it may indicate a developing rejection. If it occurs several years post-transplant, the numbers could reflect a state of quiescence on low-dose immunosuppression.

The other situation—a below-threshold molecular score paired with a nonquiescent biopsy—may be less common, according to Dr. Deng, who notes it occurred in only eight patients in the CARGO study and was probably related to EMB variability. On Grade 3A biopsies, the three central readers had a maximum concordance of 77 percent; they downgraded local reads of moderate or severe rejection by up to 50 percent. Thus, one explanation for the discrepancy between positive biopsy and negative score may be overreading locally. Or, if a 3A appearance is subendocardially located, a Quilty lesion might be overlooked if the pathologist does not do serial sections. Finally, says Dr. Deng, some Grade 3A rejections “are in a sense benign—they disappear without augmentation of immunosuppression.”

Physicians point to biopsy discrepancies as a reason to continue hunting for a better test. “The current gold standard at best detects 70 percent of rejection,” says Dr. Rodriguez.

Jon Kobashigawa, MD, one of the CARGO researchers, sees another reason to keep pressing. “One could argue that a test that truly is picking up a fingerprint of upregulation of the immune system for rejection may be the true gold standard for the rejection process,” says Dr. Kobashigawa, clinical professor of medicine at the David Geffen School of Medicine at UCLA and medical director of its heart transplant program.

But that’s the future. For now, even those most familiar with AlloMap are working out what the scores mean at their own centers.

“On a practical level at Columbia, we’re still in the early period of implementation,” Dr. Deng says. The institution began using AlloMap clinically this January, in conjunction with biopsy, and will slowly transition to doing fewer biopsies on patients who are six months post-transplant and clinically stable.

AlloMap scoring is time dependent. In the first six to 12 months post-transplantation, the quiescent threshold is lower than it is one year or later. The reason? Some of the parameters within the score are steroid-responsive. As steroid doses are tapered down in the first year, the quiescence threshold appears to change. “If a patient less than one year has a score between, say, 31 and 34, it potentially indicates an elevated risk of rejection. If a patient more than one year has the same score, it has much more likelihood of being in the quiescent range,” Dr. Deng says. Rising scores in long-term stable, quiescent, nonrejecting patients are “a research question right now. There may be inter-individual differences in gene expression profiles longitudinally.”

It’s too early to tell if discrepancies between AlloMap scores and biopsies will cause problems clinically. Says Dr. Mehra, “As with everything we’ve done in the field of heart transplantation for the last 30 years, we’re going to have to spend time to learn about this. The biggest mistake would be to sell this as a perfect tool. It isn’t.”

Adds Dr. Deng: “The largest misunderstanding that can happen with this test is to take it as a dichotomous score, that either tells you, Yes, there is relevant rejection, or No, there is not.”

The opposite, he says, is true. “I am afraid to say, this increases the requirement for intelligent interpretation of the test by the white coat taking care of the patient.

“You have to understand the concept of negative predictive value, and not everyone does,” he continues, before repeating the limiting mantra: patients six months after transplantation, without signs and symptoms of rejection, with normal graft function and a below-threshold AlloMap test. “In this combination, the patient at that time does not have rejection that requires augmentation of immunosuppression. All the other situations you have to be cautious and do the classical workup. Therefore, you really have to think this through. The test needs a competent interpretation that exactly voices this with every single test.”
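For readers less familiar with the term, negative predictive value is the fraction of negative test results that are truly negative, and it depends heavily on how common the condition is in the group being tested. The sketch below uses the standard definition; the sensitivity, specificity, and prevalence figures are made up for illustration and are not CARGO results.

```python
def negative_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """NPV = true negatives / (true negatives + false negatives)."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Same hypothetical test applied to two hypothetical populations.
low_prevalence = negative_predictive_value(0.80, 0.60, prevalence=0.03)
high_prevalence = negative_predictive_value(0.80, 0.60, prevalence=0.30)

print(f"NPV at 3% prevalence:  {low_prevalence:.3f}")
print(f"NPV at 30% prevalence: {high_prevalence:.3f}")
```

The same test yields a higher negative predictive value when rejection is rare in the tested group, which is one reason the restriction to stable patients with normal graft function matters.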

Researchers are also trying to answer a very different question—can the AlloMap test predict future rejection?

“Initially the test was devised as a biopsy minimization procedure,” says Dr. Mehra. “However, the time point of greatest vulnerability for rejection is in the first six months after transplantation. So the biggest bang for the buck for a test like this may potentially lie at the time when we are performing the most rigorous surveillance for rejection, in the first six months.”

The frequency of cellular rejection in the first year is between 25 and 35 percent, Dr. Mehra says. Those rates are per-patient rejection rates, not per-biopsy rejection rates, in a given year, he adds. “If you look at the frequency of biopsies that are performed, then the frequency of rejection being picked up on a biopsy reduces down dramatically, to three or four percent.”
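The gap between the two rates is a matter of denominators: one counts patients, the other counts biopsies. A rough sketch with invented round numbers (100 patients, 30 of whom have one positive biopsy each, and 10 biopsies per patient in the year) shows how a per-patient rate near 30 percent can sit alongside a per-biopsy rate near 3 percent. The cohort figures and the function name are assumptions, not study data.

```python
def rejection_rates(n_patients: int,
                    patients_with_rejection: int,
                    positive_biopsies: int,
                    biopsies_per_patient: int) -> tuple[float, float]:
    """Return (per-patient rate, per-biopsy rate) for a cohort."""
    per_patient = patients_with_rejection / n_patients
    total_biopsies = n_patients * biopsies_per_patient
    per_biopsy = positive_biopsies / total_biopsies
    return per_patient, per_biopsy


# Hypothetical cohort; every number here is an assumption.
per_patient, per_biopsy = rejection_rates(
    n_patients=100,
    patients_with_rejection=30,
    positive_biopsies=30,
    biopsies_per_patient=10,
)
print(f"per patient: {per_patient:.0%}, per biopsy: {per_biopsy:.0%}")
```

A heavier biopsy schedule would push the per-biopsy figure lower still, since the same number of rejection episodes is spread over more procedures.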

The problem, he continues, “is of significant magnitude. If about one in three patients after a transplant is going to suffer at least one episode of rejection, and you have absolutely no idea when that episode is going to occur, you’re left with no option but to apply the same rules to everyone—that is, biopsy everyone in the same way. There is absolutely no opportunity for personalized adjustment of surveillance. That’s essentially the value that AlloMap potentially adds.”

One primary finding of the study presented at the recent ISHLT meeting, Dr. Mehra reports, is that the gene expression score discriminated patients who went on to reject from those who did not. The negative predictive value of a gene expression score of less than 20 was 99.6 percent. The test was most robust in the first 180 days, he says, “exactly the time point when we would want it to be the most statistically strong.”

Of the 11 informative genes included in the AlloMap score, three showed very strong signals, indicating they might be the prime movers and shakers during this 180-day period. Two of them are known to be corticosteroid responsive. “What we found from this study is probably the first time that two such genes could help differentiate the patients who reject from those who don’t,” Dr. Mehra says. While controlling for steroid dose longitudinally, he and his colleagues showed that patients who rejected had fundamentally different steroid-responsive gene expression scores than those who did not. “We may have for the first time hit upon a way of measuring adequacy of steroid dosing in heart transplantation.”

Based on these data, Dr. Mehra suggests several clinical possibilities. One, a patient with a Grade 0 biopsy and an AlloMap score of less than 20 one month after transplant appears to have virtually no chance of rejection in the next three months; for these patients, additional biopsies during this time period may be unnecessary. For patients with a Grade 0 biopsy and an AlloMap score above 30, some 58 percent will reject in the next 12 weeks. Moreover, for patients with no chance of rejecting in the next three months, their current dose of steroids appears to be stable and adequate. On the other hand, clinicians may need to be cautious about moderating steroid doses in patients who are not rejecting but who have high AlloMap scores, “because they are potentially showing evidence that the dose of steroids is not adequate.”

Dr. Mehra looks even further ahead, professing he’s most excited by the new information on the individual genes. “Perhaps we will be able to dissect out two or three steroid-responsive genes and develop a steroid metric score, something that would be applicable not just to heart transplantation, but also to any other disease state in which corticosteroids are used.”

Pediatric heart transplantation patients may also eventually benefit from the AlloMap test, although Dr. Deng urges caution. “Although we assume it has a very similar effect, we cannot at this time recommend AlloMap testing for a patient population under age 15,” says Dr. Deng. He and others have investigated its use in patients under age 15, but the data are not ready “for a full peer-review publication, which I consider the minimum to start with a first statement on clinical implementation.”

For someone so heavily involved in AlloMap’s development, Dr. Deng is a refreshingly unadventurous proponent of the test. “Clinically implementing such a new molecular test requires responsible behavior. If there is no peer-reviewed publication, there is no scientific evidence. Abstracts don’t count. And if we implement clinical recommendations, they also must be based exactly on this evidence, and not go beyond. And specifically, we must not succumb to commercial interests.”

Published data are hardly the final word, however, as an emerging debate regarding gene expression studies makes clear. An editorial accompanied the CARGO study, sporting the somewhat inflammatory title of “Lies, Damn Lies, and Statistics: The Perils of the P Value” (Halloran PF, et al. Am J Transplant. 2006;6:10–11). Listening to discussion between the editorial’s authors and the CARGO researchers is a bit like following the conversational quadrille in a Jane Austen novel. The characters disagree, but with such unimpeachably good manners, it’s hard to know for sure.

“We admire the energy and commitment and resources that went into the CARGO study,” says Phil Halloran, MD, PhD, editor of the Journal and professor, Department of Medicine, University of Alberta, Edmonton. “Blood is a very vexing medium to work with. So it’s courageous to try to establish a successful blood test based on gene transcripts. And if this test is confirmed, and their findings are validated by independent groups, then it’s really quite a success story. Above all, we don’t want to disparage what they have done.”

But, he argues, the possibility of chance associations between transcripts and clinical phenotypes is frighteningly high. “Before you start using something to make a clinical decision, you should be watching the literature to see that an association of complex data with a clinical phenotype is hypothesis-based and that it’s been independently validated by a completely different group, in a completely different set of patients.”

Citing a Lancet article and editorial published last year (Michiels S, et al. 2005;365:488–492; Ioannidis JP. 2005;365:1686), he notes that of the seven largest microarray cancer studies, five produced results that occurred by chance. Bioinformatics techniques based on training sets and test sets “are generating a literature that can’t be reproduced,” Dr. Halloran says.

“This causes editors all sorts of worry,” he says. Readers need to realize that determining associations between data-intensive microarray readouts and clinical phenotypes is a mind-boggling proposition. Microarray studies should be looked at with the same intensity and skepticism that physicians have brought to other complex problems, such as the association of HLA alleles with disease phenotypes. “Or cluster designation antigens,” he offers. “It turns out there were hundreds of proteins on lymphocytes that you could pick up with monoclonal antibodies. That was another example of a large-scale collaborative effort, which didn’t result in contaminating the literature with tens of thousands of papers contradicting one another, but which actually reached an answer. Some types of very complex questions simply need to be asked on a larger scale, where there’s internal cross-validation between groups.”

Dr. Mehra is equally courteous in answering the criticisms presented by Dr. Halloran and his co-authors. “I personally support that editorial. We tend to glamorize genomics research, and that editorial raised the bar.”

In the end, the Journal did publish the study, of course. Drs. Deng and Mehra and Howard Eisen, MD, chief of cardiology at Drexel University and co-principal investigator on the original CARGO publication, responded with a letter to the editor (2006;6:1086–1087), about which Dr. Halloran politely says, “We don’t agree that they fully addressed our concerns. But the dialogue goes on, and people in good faith reach different points of view. We all want there to be a successful test, and if this is the successful test, time will tell.”

In its current form, AlloMap is a send-out test—for which laboratories should probably be grateful. When Judy Wilber, PhD, vice president of technical operations at XDx, explains all the steps her laboratory takes to process samples, one is left with the feeling, “Who would want to do this test in-house?” “It’s very complicated,” she says, describing the RT-PCR procedures, each done in triplicate, for 20 different genes. It takes about 7.5 hours of hands-on work to run a sample. Not surprisingly, there are no plans right now to develop a kit format. Moreover, clinicians aren’t convinced they need one. They report receiving results between 48 and 72 hours after samples are sent out, which is adequate for the test’s current, non-acute use. “I would have expected turnaround time to be an issue, but it’s the least of our problems,” says Dr. Deng.

Dr. Mehra agrees. “If the idea is of prediction in a stable population, it would not bother me if the test result came back after even five days,” he says.

What users would like to see from AlloMap, instead, is another test, or maybe several tests, each perhaps using a different complement of genes. “I joke about this with the XDx folks,” says Dr. Rodriguez. “I tease them a little and say, ‘Hey, when are you going to put out AlloMap version 2, or 2.5, or 3?’”

“It is a strong test, don’t get me wrong,” he adds. “But I don’t think this is a test that walks on water. It’s a good test that needs to evolve to become a terrific test.”

The AlloMap, as mentioned, looks at cellular response. But that’s one of only several types of rejection that can threaten a transplant. Patients may also develop antibody-mediated rejection (also known as humoral rejection), vasculopathy, chronic rejection, or graft dysfunction of another etiology.

All are shrouded in their own fogs. The new ISHLT guidelines standardized the description of antibody-mediated rejection. “So now we have two things—cellular and antibody-mediated rejection—that we can start disagreeing on,” says Dr. Mehra with a laugh. Up until now, rejection trials have focused on cellular rejection, because antibody-mediated rejection is such a poorly characterized entity, he says. “And data on its frequency are not clearly available.”

“I have always felt that rejection, when it does occur, is always a combination,” says UCLA’s Dr. Kobashigawa. “In some cases, it’s more cellular than humoral; in some cases, perhaps fewer cases, it’s more humoral than cellular. So I think it’s important to try to delineate whether the current AlloMap test can pick up humoral rejection.” UCLA is looking at just that, he says, but “the results are not complete. So I can’t even hint at commenting on those findings.”

At Cleveland Clinic, pathologists process EMBs using frozen sections, which allows them to quickly address possible antibody-mediated rejection through special stains. By protocol, Dr. Rodriguez and his colleagues stain for AMR only on the first biopsy after transplant, to establish a baseline, unless the patient has a clinical problem. That could soon change.

“We have been discussing increasing the number of antibody-mediated rejection surveillance stains, because we have had a few surprises,” he says. That reflects a shift in the heart transplant community and literature in recent years. Common wisdom held that antibody-mediated rejection is an acute event, happening within the first three months. “The truth is, it can happen late—sometimes very late. Most of the patients we are seeing, if they have antibody-mediated rejection, it’s two or three years, or even a decade later” (Rodriguez ER, et al. Am J Transplant. 2005;5:2778–2785).

An even bigger problem is cardiac allograft vasculopathy, a transplant-induced coronary artery narrowing. “The patients we lose every year, if we don’t lose them to a rejection that really got out of hand, which is rare, most of the time we will lose them secondary to vasculopathy, or maybe to cancer,” says Dr. Rodriguez. “Clearly we need to be able to stop allograft vasculopathy or detect it earlier, and there are no good methods of detection.” The best method—intravascular ultrasound—is invasive and expensive and, for now, “very much a research tool,” he says. “So it would be great to have a molecular marker.”

XDx is listening, and CARGO II is already underway. Dr. Wilber reports the company is conducting an international trial to assess the current test’s ability to detect vasculopathy and humoral rejection and to guide immunosuppressant weaning. Plenty of complex conversations lie ahead for a topic already rife with rococo detail.

The only simple advice right now comes from Dr. Deng. “This is not a no-brainer.” He pauses, then drives the point home even more directly, if slightly inelegantly: “This is a brainer.”

References:

  1. 2R reflects the ISHLT’s recently revised guidelines.
  2. The CARGO researchers published a paper on the economic implications of using AlloMap (Evans RW, et al. Am J Transplant. 2005;5:1553–1558).

Karen Titus is CAP TODAY contributing editor and co-managing editor.