On microchips, directing a cast of thousands

April 2005
Feature Story

William Check, PhD

Analyzing data from the thousands of sites on a gene expression microarray is still an evolving science. "Bioinformaticians do not agree on the best way to do these analyses," said Lynne Abruzzo, MD, PhD, during her talk on the clinical potential of microarrays, or "gene chips," at the meeting of the Association for Molecular Pathology last November. To improve the ability to handle these large datasets, Duke University sponsors a competition each year called CAMDA—Critical Assessment of Microarray Data Analysis—in which bioinformaticians are challenged to come up with new ways to evaluate a publicly available dataset. To illustrate the pitfalls in handling data from microarrays, Dr. Abruzzo, associate professor of hematopathology at the University of Texas/MD Anderson Cancer Center, Houston, related an anecdote from the 2002 CAMDA competition.

In that year, the dataset contained expression profiles of genes in mouse liver, testis, and kidney. A group led by biostatistician Kevin Coombes, PhD, of MD Anderson, set out to determine which genes are specifically expressed in each of the organs. However, data from some of the samples did not cluster in an interpretable way. Dr. Coombes and his colleagues realized that one-third of the genes were labeled with the wrong gene name. When they figured out what the correct labels were and reanalyzed the data, they got a nice graph.

They reported this discrepancy to the scientists who had obtained and published the original dataset, and these scientists re-examined their data. They found that an extra line had been inserted into the Excel spreadsheet that matched the gene name with its location. All of the data entered after the insertion improperly matched the gene name with its location on the array—a registration error. "It’s very surprising that it was found," Dr. Abruzzo said.
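A registration error of this kind is easy to reproduce in miniature. The sketch below is only an illustration, not the original dataset or spreadsheet: the gene names, spot labels, and the `annotate` helper are all made up. It shows how one stray row in the name column mislabels every location that follows it.

```python
# Hypothetical illustration: gene names and array locations kept as two
# parallel columns, as in a spreadsheet. All names below are made up.
true_names = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
locations = [f"spot{i}" for i in range(1, 7)]


def annotate(names, locs):
    """Pair each array location with the name that sits on the same row."""
    return dict(zip(locs, names))


correct = annotate(true_names, locations)

# Simulate a stray blank row inserted into the name column after row two.
shifted_names = true_names[:2] + [""] + true_names[2:]
corrupted = annotate(shifted_names, locations)

# Every location at or after the insertion now carries the wrong name.
mislabeled = [loc for loc in locations if corrupted[loc] != correct[loc]]
print(mislabeled)
```

Nothing in the corrupted table looks wrong on its face, since each location still has a plausible name, which may be why such an error is found only when the downstream analysis refuses to make biological sense.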

Accurately handling massive amounts of data is only one of many challenges of using microarrays that Dr. Abruzzo detailed in her presentation at the AMP meeting. She concluded, "I don’t believe that microarrays are ever going to be directly useful for diagnostic testing." Many in the audience applauded her conclusion. But, Dr. Abruzzo told CAP TODAY, "What I said is kind of controversial. Sometimes when I give talks, people come up and insist, ’No, we are going to be using these in the clinical laboratory.’"

Dr. Abruzzo stands by her assessment, however. "As a research tool, gene expression microarrays are terrific," she says. "They allow you to look for expression of a few thousand to tens of thousands of genes at the same time. As a research tool, they have revolutionized how we look at gene expression." As a clinical tool, she does not think that they will be useful. What will eventually happen, Dr. Abruzzo believes, "is that expression analysis of a tumor, for instance, will identify a dozen or 50 genes that are important for diagnosis or prognosis. Instead of using a microarray with thousands of genes, we will do semiquantitative real-time PCR [QRT-PCR] for a few dozen genes, which is more reproducible and has a wider dynamic range."

Karen Kaul, MD, PhD, director of molecular diagnostics at Evanston (Ill.) Hospital and professor of pathology and urology at Northwestern University Feinberg School of Medicine, Chicago, predicts gene chips will be useful, with the qualification that utility will be confined to smaller chips. "With genetic diseases, I could envision doing cystic fibrosis or some other conditions by low- or moderate-density arrays," Dr. Kaul says. She did a trial of Clinical Microsensors’ 36-target array for cystic fibrosis and found that it worked well. She also sees microarrays being helpful in solid tumors and hematopathology, though again these will be smaller arrays—which she defines as a dozen to less than 100 sites, not thousands or even the upper hundreds. "And there may be a way to multiplex for different types of viruses or microbes that cause pneumonia or encephalitis, to put them on a small array," Dr. Kaul says. "Maybe later, when we better understand the markers, how to use them, and can better handle the data analysis, larger arrays will be more prevalent."

Matt van de Rijn, MD, PhD, associate professor of pathology at Stanford University School of Medicine, agrees that gene expression arrays are excellent discovery tools. "Many groups have used microarrays to help dissect the molecular phenotype of tumors," says Dr. van de Rijn, who has done some of this work himself. He says it is still too early to point to immediate results for patients. "In many cases I anticipate that is just around the corner," he says. "We have already developed markers and profiles that lead to better diagnosis and prognostication and identification of new therapeutic targets."

In his view, some version of microarrays will enter clinical service. "People complain they are so expensive," he says. "But compared to other clinical tests that doesn’t hold water. There will always be a difference between the discovery phase, where you want as many genes as possible, and clinical use, where you may want to contain those numbers. But financially a 70-gene array versus a 250-gene array or thousands of genes is not that different." He believes that at some point this will be a clinically relevant test, though he’s not willing to say that thousands of genes on microarrays will be the format.

Microarray devices are beginning to become available for laboratory use "slowly but surely," says Larry Kricka, DPhil, FRCPath, professor of pathology and laboratory medicine and director of the general chemistry laboratory at the Hospital of the University of Pennsylvania. "For the DNA type of microarray, probably one of the most important recent developments was FDA approval of the Roche/Affymetrix Amplichip," he says, which detects polymorphisms in cytochrome P450 enzymes that affect drug metabolism. "That was a real landmark."

Still, Dr. Kricka says, "I think most people would agree that very large-scale DNA or protein microarrays are still research tools." Dr. Kricka sees arrays containing small groups of markers being useful in diagnosing and managing disease. Of quadruple testing for disease in the unborn, for instance, he says, "I could imagine an array for that." Testing for the 25 mandated mutations for cystic fibrosis also falls in this area, as might allergen testing. "Beyond that virtually everything we do is discrete testing," Dr. Kricka notes. "Laboratories don’t do that many groups of tests bundled together."

While there are many types of microarrays, the term is most often used to describe a glass slide or silicon chip holding thousands of segments of nucleic acid at specific sites to which RNA or cDNA from cells or tissue is hybridized to create a gene expression profile. Another type of array is represented by the Roche/Affymetrix Amplichip, which contains thousands of single-nucleotide polymorphisms, or SNPs. With an SNP chip, test nucleic acid is hybridized to see whether the subject’s genome contains any of these SNPs, which is a digital (yes or no) question, as opposed to the analogue (quantitative) question posed by gene expression profiling. SNP chips are well suited to screening, such as to identify individuals who have some specific characteristic, for instance, salt-sensitive hypertensives, or to classify patients according to their ability to respond to specific drugs. A separate type of array is a protein array, in which proteins are affixed to a support. Tissue arrays are an entirely different category (Related article: "What are tissue arrays?").

Two types of expression arrays are in common use. In the spotted array, pioneered by Pat Brown, PhD, and colleagues at Stanford University, solutions of nucleic acids containing fragments of genes are spotted onto a glass slide. In the printed array, pioneered by Stephen Fodor, PhD, and colleagues at Affymetrix, oligonucleotides (about 20 bases long) representing gene segments are built one nucleotide at a time on a silicon wafer by photolithography.

For clinical utility, both types of expression microarrays pose similar problems, which derive from the exact feature that is the array’s main strength—its ability to gather thousands of data points from a single run. One issue Dr. Abruzzo identifies is how you report a result if you don’t know its significance. "Do you report results for every gene or only the positive genes?" she asks. "How do you report 10,000 data points?" Moreover, she asks, "Why would you get thousands of measurements if it turns out you need only a dozen measurements to make a diagnosis? One of the things we do as pathologists is to narrow down the tests we want. We don’t order every stain at our disposal."

A second issue poses more of a statistical conundrum. "Many statisticians are saying that nobody really knows how to build models for analyzing these large datasets yet," Dr. Abruzzo says. To illustrate, she points to a paper by biostatisticians at a French research institute (Michiels S, et al. Lancet. 2005;365:488-492). These investigators re-analyzed data from the seven largest published studies that attempted to predict prognosis of cancer patients on the basis of DNA microarray analysis. Their finding: "The list of genes identified as predictors of prognosis was highly unstable," with five studies publishing "overoptimistic results." As Dr. Abruzzo puts it, "They couldn’t cross-validate the results of those studies." She cites a statistical explanation: "When you do studies using at best only a couple of hundred samples but taking thousands of measurements, statisticians say you can find models that can classify tumors correctly based on noise. So I think microarrays are not ready for prime time in the clinical arena yet."
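The point about classifying on noise can be seen in a small simulation. The sketch below is an assumption-laden toy and not the method of the Lancet paper: the sample and gene counts are hypothetical, the "expression values" and class labels are pure random numbers, and the classifier is a deliberately crude one-gene threshold rule. It searches thousands of noise genes for the one that best separates a few dozen samples, then applies that same rule to fresh samples.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_TRAIN, N_TEST, N_GENES = 40, 40, 2000  # hypothetical sizes


def noise_dataset(n_samples, n_genes):
    """Random expression values and random class labels: no real signal."""
    data = [[random.gauss(0.0, 1.0) for _ in range(n_genes)]
            for _ in range(n_samples)]
    labels = [random.randint(0, 1) for _ in range(n_samples)]
    return data, labels


def accuracy(data, labels, gene, threshold, sign):
    """Fraction of samples a one-gene threshold rule classifies correctly."""
    hits = 0
    for row, label in zip(data, labels):
        predicted = 1 if sign * (row[gene] - threshold) > 0 else 0
        hits += predicted == label
    return hits / len(labels)


train_x, train_y = noise_dataset(N_TRAIN, N_GENES)
test_x, test_y = noise_dataset(N_TEST, N_GENES)

# Search every gene, in both directions, for the best rule on training data.
train_acc, best_gene, best_sign = max(
    (accuracy(train_x, train_y, g, 0.0, s), g, s)
    for g in range(N_GENES) for s in (1, -1)
)
# Apply the winning rule, unchanged, to samples it has never seen.
test_acc = accuracy(test_x, test_y, best_gene, 0.0, best_sign)

print(f"best gene on training data: accuracy {train_acc:.2f}")
print(f"same rule on fresh samples: accuracy {test_acc:.2f}")
```

Because the search picks the luckiest of thousands of candidates, the training accuracy comes out well above chance even though no gene carries any information; only the check on independent samples exposes that, which is the role cross-validation plays in the studies discussed above.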

Dr. Kaul expresses a similar reservation. "One of the things that concerns me about translating large microarrays into the clinical laboratory is that I don’t believe we know what to do with that much information or have systems to handle it." Even with the Amplichip, she worries, "I’m not sure we will be able to deal with 1,000 or 2,000 data points."

Because SNP chips operate in a digital yes-or-no mode, software may be able to handle their data output more easily than expression microarrays which are quantitative, speculates Theodore Mifflin, PhD, research associate professor in the Department of Pathology and director of automation in the Medical Automation Research Center at the University of Virginia Health Sciences Center, Charlottesville. "We will see when people have experience with the Amplichip," he says.

Dr. Kricka says handling results from thousands of sites is a research issue now. "I cannot envision the clinical laboratory being involved with a test that will be based on the results of a thousand-location array. It is hard to see how we would do QC on that," he says.

Dr. Kricka sees quality control as a general issue with microarrays. "In the clinical laboratory we are used to doing tests one at a time in single-channel mode and controlling each individual assay separately," he says. "When you use a microarray you bundle all the tests together on one analytical device. Issues of QC suddenly become more complicated. If you have a 10-analyte device and you run a control to see if it works properly, what if one location doesn’t work? Do you ignore that and use data from the other nine locations? It’s not clear."

A British company, Randox, has a much-awaited analyzer called Evidence, which will use a biochip array with up to 20 tests per chip and perform multiple immunoassays on each patient sample—for example, thyroid array, tumor monitoring array. However, Dr. Kricka poses this question: Once you report the five or 10 analytes ordered on a given patient, what do you do with the other 10 or 15 results? One option is to leave the other results stored. Perhaps the clinician will look at the reported results and decide results from one of the other tests would be helpful. "Then you could report it immediately," Dr. Kricka says. However, a second school of thought says all results that were not ordered should be completely suppressed. Laboratories don’t face this question now. "We used to bundle tests together into profiles," Dr. Kricka says. "We moved away from that practice to discrete testing because of reimbursement rules. Microarray devices may send us back to that profiling approach," and indeed the Evidence has a retrospective reporting facility to enable retrieval of previously unreported results.

Dr. Mifflin is concerned about a number of technical issues. Regarding whether profiles derived from expression arrays are reliable, he says, "The literature contains a number of contradictory reports about whether these are useful at this time. Results may be conflicting or taken from a very narrow perspective." An array asks whether any genes are over- or underexpressed. But compared to what? And how large are the differences? "All those answers depend on what the investigators choose as reference material," he says. Even the definition of normal has come into question. "Some early studies used immortalized cell lines as reference material," Dr. Mifflin says. But these are basically malignant cells, or at least precursors to malignant cells, and may not be representative of normal healthy cells. "Most people are waiting to see what comes out of ongoing studies," he says.

Dr. Mifflin also cites several assumptions underlying gene expression studies: The process of extracting, amplifying, and hybridizing will accurately quantify all messages (mRNAs) with equal efficiency; message lifetimes won’t have an impact on outcome of the analysis; and messages actually translate into changes in phenotype. Regarding the second point, the effect of message lifetimes on outcomes, it is known that some messages are quite short-lived (seconds to minutes) and so are probably lost during sample processing. As to whether a change in expression translates into altered survival, Dr. Mifflin says, "What we see in the message analysis may be modulated or diluted by the biological systems we are trying to evaluate."

With microarrays the "information hogs" they are, says Dr. Mifflin, archival storage may also pose challenges. Microarray data are stored as TIFF files, which can take up to several megabytes just for one image. "If you scan a hundred slides, you are looking at gigabyte storage capability," he says. "That’s an information requirement not so common in clinical laboratories, not to mention the corresponding image and information-processing capabilities needed."
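The storage arithmetic is simple to check. In the sketch below, the 10 MB image size and the two-images-per-slide case are assumptions chosen for illustration (the text says only "several megabytes"), and `scan_storage_gb` is a hypothetical helper name.

```python
def scan_storage_gb(slides, mb_per_image, images_per_slide=1):
    """Approximate archive size in gigabytes, taking 1 GB as 1,000 MB."""
    return slides * images_per_slide * mb_per_image / 1000


# Assumed figures: 100 slides at 10 MB per image.
print(scan_storage_gb(100, 10))       # one image per slide
print(scan_storage_gb(100, 10, 2))    # two images per slide, e.g. two channels
```

Under these assumptions a hundred slides already reach the gigabyte range Dr. Mifflin describes, before any processed data or backups are counted.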

In Dr. Mifflin’s view, controls and surveys will be crucial for clinical laboratories that run microarrays such as the Amplichip. "People are starting to use it and we should be figuring out ways to validate the results," he says. "Right now basically there aren’t reference materials out there for expression or SNP microarrays. If you run controls it is because you made them yourself. CAP or a similar organization such as AACC could serve an excellent purpose here."

Dr. Abruzzo agrees that QC is difficult with microarrays, and it’s "because they have many moving parts," she says. Variability can arise in at least five places: array fabrication, target preparation, target hybridization, imaging, and data analysis. For each source of variation, Dr. Abruzzo has collected two or three real-life examples, which constitute her "Microarray Hall of Shame." For instance, when the MD Anderson microarray laboratory was being set up a few years ago, 2,300 clones were purchased from a reputable supplier. When the clones were grown and sequenced, only 79 percent could be verified. Another problem in array fabrication occurred in 2000 when a commercial vendor printed arrays in which about one-third of the mouse sequences were wrong because they were copied from the wrong strand. Target preparation is also crucial. "Garbage amplification is exponential," Dr. Abruzzo says. "You start with a couple of crappy RNA samples and you end up with reams of crappy data."

Dr. Abruzzo found what she calls a "somewhat scary" result demonstrating variability in data analysis when she did experiments in her primary area of research: gene expression profiles in chronic lymphocytic leukemia, or CLL. Cases of CLL with somatic hypermutation have a much better prognosis, with a median survival of 25 years compared with eight years for cases lacking hypermutation. "I was trying to find genes differentially expressed between those two groups," she says. When the data were analyzed using a standard statistical method, called dChip, the results were "not particularly biologically interesting," Dr. Abruzzo says. She asked the statistician if there were other statistical tests that could be used to analyze the data. He then used two other common methods, the two-sided t test and Wilcoxon analysis. To their surprise, they ended up with three different lists of genes with little overlap.

"I thought that was kind of shocking considering that anyone doing microarray work could be using any of these tests," Dr. Abruzzo says. To find out which statistical test gave the "true" result, she took about one dozen genes identified by each test but not the other two and subjected them to QRT-PCR validation. "I was expecting to find that one test was giving the true result and the others were picking up noise," she says. Instead, she found it didn’t matter which statistical test she used: QRT-PCR confirmed differential expression for about 85 percent of the gene candidates irrespective of which statistical method identified them. "That means that even though all three methods gave different profiles, they were all equally good at finding differentially expressed genes," Dr. Abruzzo concludes. "So if your goal is to get a complete list of differentially expressed genes, you probably have to use more than one statistical test on your expression array data."
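How two tests can rank the same genes differently, and how their lists can be combined, is sketched below. This is not Dr. Abruzzo's analysis: the data are synthetic, the gene names and group sizes are invented, dChip is not reproduced, the t test is implemented in its Welch (unequal-variance) form as an assumption, and the Wilcoxon analysis is represented by the equivalent rank-sum (Mann-Whitney U) statistic.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent groups."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_sum_u(a, b):
    """Mann-Whitney U for group a: pairs with a > b, ties counting one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)


def top_genes(scores, k):
    """Names of the k genes with the largest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


random.seed(1)
# Synthetic data: 200 made-up genes, 10 samples in each of two groups.
genes = {}
for i in range(200):
    shift = 1.0 if i < 20 else 0.0  # only the first 20 genes truly differ
    group1 = [random.gauss(shift, 1.0) for _ in range(10)]
    group2 = [random.gauss(0.0, 1.0) for _ in range(10)]
    genes[f"gene{i}"] = (group1, group2)

# Score each gene by how far each statistic departs from "no difference".
t_scores = {g: abs(welch_t(a, b)) for g, (a, b) in genes.items()}
u_scores = {g: abs(rank_sum_u(a, b) - len(a) * len(b) / 2)
            for g, (a, b) in genes.items()}

by_t = top_genes(t_scores, 20)
by_u = top_genes(u_scores, 20)
print("shared by both tests:", len(by_t & by_u))
print("found by either test:", len(by_t | by_u))
```

Taking the union of the lists, as in the last line, mirrors the practical advice in the quote: if the aim is completeness, candidates from each test go forward to validation by an independent method such as QRT-PCR.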

Given the drawbacks of expression microarrays, it is not surprising that many experts predict microarrays will be used in research mode to identify diagnostic or prognostic genes, which will then be assayed clinically by a simpler method. One example of this is Genomic Health’s Oncotype Dx assay. From a set of 250 candidate genes, many identified on microarrays, 21 genes were selected that predicted response of breast cancer to tamoxifen (Paik S, et al. New Engl J Med. 2004;351:2817-2826). These genes are assayed by multiplex RT-PCR.

While it is a good approach, multiplex RT-PCR has limits, too, according to Dr. Kaul, particularly on the detection end. "In our lab, we are big fans of distinguishing sequences based on melt curve differences," she says. "But there is a limit to what you can do with that." She has been looking at how to identify and differentiate atypical mycobacteria. "We have some PCR assays to do that," she says. "But it is very difficult with melt curve analysis. As you put many targets in there it can become difficult to discern them all."

A preferable approach might be to do multiplex PCR with a low-density array as a detection system. "Say you have a tube of amplicons," Dr. Kaul explains. "How do you tell what is in there rapidly? You could do gels. You could set up fluorescent probes—there are systems that allow you to identify and quantitate a couple dozen markers. But the least work-intense and most attractive option might well be a low-density array."

Several companies are working on low-density arrays for clinical use. For instance, Nanogen has several applications slated for release this year as ASRs, including factor V/II, a 37-mutation cystic fibrosis chip, a chip for identifying seven viral respiratory pathogens, one for CYP450, one with three hyperhomocysteinemia markers, and an Ashkenazi Jewish panel, according to a spokesperson.

Work by Dr. Brown and his coworkers at Stanford is an interesting example of the reductionist approach—identifying a significant expression profile, then finding a relatively small subset that is clinically useful. A group of Dutch investigators reported in 2002 that a subset of 70 genes identified from expression microarray analysis predicted survival in primary breast carcinoma (van de Vijver MJ, et al. New Engl J Med. 2002;347:1999-2009). Later, a postdoctoral fellow in Dr. Brown’s laboratory, Howard Chang, MD, PhD, using a completely different approach, identified a different profile or "signature" on microarray analysis that was prognostically significant for several types of epithelial tumors, including breast carcinoma (Chang HY, et al. PLoS Biol. 2004;2:E7). (Dr. Chang proceeded from the premise that metastasis was biologically akin to wound healing; he set out to find a gene expression profile in metastatic tumors that was similar to the profile in fibroblasts exposed to serum. Thus, he calls his profile a "wound-response gene expression signature.")

Using the 295 primary breast carcinomas that the Dutch group tested initially, Dr. Chang found that the wound-response signature improves risk stratification "independently of known clinico-pathologic risk factors" and independently of the Dutch 70-gene set (Chang HY, et al. Proc Natl Acad Sci USA. 2005; E-pub ahead of print). Dr. van de Rijn notes that the Dutch gene set is being offered in the Netherlands prospectively for breast cancer patients. "Howard’s wound-healing gene set will probably be validated in a number of other ways," he says, "and we hope it will not only predict outcome but also perhaps identify tumors that respond to certain therapies."

Research being done by Lawrence True, MD, professor of pathology at the University of Washington Medical Center, Seattle, on prostate cancer exemplifies several of the principles expressed in this discussion. Dr. True reported at the USCAP meeting in March that he and his colleagues had identified on expression microarrays several genes that appear to be specific for each Gleason grade. He used laser capture microdissection, or LCM, on prostate biopsies to isolate tumor cells (2,000-5,000 cells per array) that he identified microscopically. LCM took about one hour per specimen, which included specimen block selection, confirmation of cell composition, and cutting the frozen sections. "With this step we greatly increased our confidence that the genes expressed were made by cancer cells and not by adjacent stromal cells," Dr. True says. Samples were hybridized to two arrays, a generic array containing about 20,000 genes and a custom 18,000-gene array generated by Peter Nelson, MD, a project and core leader of the research group.

No single gene was specific for any Gleason grade, but an RNA expression profile was able to differentiate between grades. "In our initial test, the overall profile distinguished all low-grade from high-grade cancers," Dr. True says. (Samples were dichotomized to have a Gleason score of ≤6 or >6.) In a validation experiment, a subset of 54 genes was tested against an independent set of cancers. It was 85 percent accurate. "So we can say that that set of 54 genes at the RNA level distinguishes the large majority of high-grade from low-grade cancers," Dr. True concludes. He then immunostained sections for proteins made by four of the genes. (Initially he tried 10 proteins, but six antibodies didn’t work reliably on fixed tissues. Commercial antibodies are not available for most of the proteins.) "Those four antibodies confirmed the difference in expression," Dr. True says. "So the four gene products can potentially be used as markers for different grades."

Previous studies came up with different sets of prognostic genes, Dr. True says, but they did not use laser microdissected preparations. "A first explanation for us is that some genes could be differentially expressed by stromal cells," he says. Dr. True and his colleagues have in fact found differences in expression of stromal genes associated with cancerous versus noncancerous prostate tissue.

Dr. True sees this work moving in two clinical directions. First, gene products that characterize each Gleason grade can be used as more specific diagnostic tools and assayed with immunohistochemistry. Second, in a process similar to what is being done with breast cancer, a subset of genes will be identified that more specifically characterizes each Gleason grade of prostate cancer, and single genes from that set can be measured. "Initially IHC will supplement standard methods at the biopsy stage," Dr. True says. After assigning risk, the pathologist will decide whether RNA measurement would be helpful. If so, a second, frozen sample would be taken and assayed. For higher-risk patients, a radical prostatectomy sample would be used. An enrichment step will probably be needed, Dr. True suspects, though LCM, which is impractical for routine use, may not be necessary.

Summarizing her experience doing research on microarrays, Dr. Abruzzo says, "I really thought a few years ago when I started that we would put RNA on a chip and get answers. But now my viewpoint has changed. I don’t think this is the way we are going to go diagnostically.

"And," she adds, "I’m not hanging up my microscope yet. When pathologists look down a microscope, we are actually doing gene expression profiling. We are looking in a morphological way at the end result of the gene expression program."

William Check is a medical writer in Wilmette, Ill. The Roche/Affymetrix Amplichip will be covered more fully in the June issue in an article on clinical pharmacogenetics and laboratory medicine practice guidelines.