For viral diagnosis, metagenomic NGS

William Check, PhD

June 2015—A 20-year-old woman who had returned to the U.S. after two months of hiking in Western Australia presented with three days of acute febrile illness—fever, rash, headache, nausea, and muscle and joint pain. Testing for common infectious causes of acute febrile illness, including Epstein-Barr virus, cytomegalovirus, and human immunodeficiency virus, all turned up negative. While the woman was in Australia, she had been warned about an ongoing outbreak of an exotic alphavirus, Ross River virus, in the region where she was hiking.

Dr. Chiu: SURPI provides a potential solution to the computational bottleneck.

A blood sample came to the clinical microbiology laboratory at the University of California, San Francisco, Medical Center, where Charles Chiu, MD, PhD, is associate director. “Unfortunately, both antibody and PCR testing for Ross River virus are not readily available,” Dr. Chiu told attendees at this year’s Clinical Virology Symposium, held in April, where he related this case as part of his talk on “Next-Generation Sequencing for Viral Diagnosis.” A call to the Centers for Disease Control and Prevention found that the agency did not have a validated test for this virus.

Dr. Chiu turned to a new molecular method that his laboratory had been developing—metagenomic next-generation sequencing—to look for an infectious cause for the woman’s illness. “All we saw in the blood—and it was a very clean metagenomic signature—were sequences of human herpesvirus 7 [HHV-7],” Dr. Chiu reported. Of 3 million reads, 16 sequences with 100 percent homology to HHV-7 were identified. Moreover, the result came back in 48 hours.

The patient recovered in two weeks. Finding HHV-7 as a likely cause of the woman’s illness was unexpected, since primary infection with HHV-7 in adults is “rare,” Dr. Chiu said. Most people are exposed to this virus in childhood, when it causes roseola.

This case illustrates two critical features of the metagenomic NGS technique described by Dr. Chiu, who is associate professor in the Departments of Laboratory Medicine and Medicine/Infectious Diseases at UCSF and director of the UCSF/Abbott Viral Diagnostics and Discovery Center.

First, metagenomics is random, or unbiased. Traditional microbiological methods, including culture, antigen and antibody testing, and PCR, follow what Dr. Chiu called the “one bug, one test” paradigm. With PCR, for example, you start with an a priori suspicion of one or a few pathogenic organisms, then select primers and probes that can detect that agent or agents. Metagenomic NGS, on the other hand, is a shotgun approach, in which you amplify and sequence the entire DNA content of a sample without using any primers or probes. It “casts a wide net,” Dr. Chiu said, and in principle encompasses the entire spectrum of organisms that can cause disease. Dr. Chiu termed metagenomic NGS an “agnostic” approach, since it assumes no knowledge of what the causative agent may be. Because NGS is based on DNA sequences, there is no reason why it can’t capture all organisms in one assay (with the exception of prions).

In this case, although the clinical suspicion for the causative agent was Ross River virus, that was irrelevant—the assay produced a result independent of the clinical suspicion.

Dr. Chiu showed data demonstrating that the metagenomic NGS procedure that his laboratory has developed can detect a broad variety of pathogens—metapneumovirus, rhinovirus, adenovirus 7, Haemophilus influenzae, Salmonella typhi (typhoid fever), and Plasmodium falciparum (malaria)—using the same techniques, reagents, and analytic software.

An additional advantage of an unbiased assay is that it makes validation easier. If you can validate a single protocol, you don’t need to modify that protocol by adding primers and probes when you add a new agent to your panel, Dr. Chiu pointed out. Once you validate a protocol, it doesn’t change.

(Selective next-generation sequencing is in widespread use. In this method, samples are enriched for desired sequences, as in PCR, and these selected DNA segments are amplified.)

Like all techniques, metagenomic NGS has disadvantages. Most of what you analyze will be what Dr. Chiu called “junk”—sequences other than what you are looking for. Human host sequences make up a large part of this background. As a result, the technique needs to have very high sensitivity to be able to detect the very small fraction of reads that correspond to the virus or other pathogen you want to detect. Over the last several years it has become possible to achieve such high levels of sensitivity with massively parallel high-throughput sequencing techniques that fall under the general rubric of NGS. As an example, Dr. Chiu cited the Illumina MiSeq machines that his laboratory uses, which perform 30 to 40 million reads per run. “So we can throw away most of the data,” he said, and still have an adequate number of reads to detect sequences from a viral pathogen that is present at very low concentration.

In addition, manufacturers have developed methods to barcode sequences, so that you can multiplex many clinical samples in a single run to reduce costs. The tradeoff is that you have fewer sequence reads per sample.
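
The effect of multiplexing on read depth can be put in rough numbers. The Python sketch below assumes reads are split evenly among barcoded samples (real runs are rarely perfectly even) and borrows the 16-in-3-million fraction from the HHV-7 case above. The function names and the multiplexing levels of 1, 8, and 24 samples are illustrative assumptions, not figures from Dr. Chiu’s laboratory.

```python
def reads_per_sample(reads_per_run: int, samples: int) -> float:
    """Average reads available to each barcoded sample, assuming an even split."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    return reads_per_run / samples


def expected_pathogen_reads(sample_reads: float, pathogen_fraction: float) -> float:
    """Expected pathogen reads, given the fraction of all reads that are pathogen."""
    return sample_reads * pathogen_fraction


# Fraction observed in the HHV-7 case: 16 of 3 million reads.
hhv7_fraction = 16 / 3_000_000

for n_samples in (1, 8, 24):  # hypothetical multiplexing levels
    # 30 million reads is the low end of the run size quoted for the MiSeq.
    per_sample = reads_per_sample(30_000_000, n_samples)
    expected = expected_pathogen_reads(per_sample, hhv7_fraction)
    print(n_samples, round(per_sample), round(expected, 1))
```

Under these assumptions, an eight-sample run leaves about 20 expected pathogen reads per sample, and a 24-sample run fewer than seven, which shows why deeper multiplexing eats into the margin for detecting low-abundance pathogens.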

A major challenge with metagenomic NGS is to process the millions of reads in such a way that the signal stands out from the noise. To do this, the bioinformatics pipeline must identify and subtract human DNA, then align the remaining reads to pathogen-specific databases (bacterial, viral) or to databases containing all pathogens (National Institutes of Health GenBank NT, for example). Dr. Chiu called this process “a huge computational bottleneck.” Identifying human DNA, subtracting it from the total reads, and then identifying pathogen-specific reads in the remaining data can take days to weeks with standard methods. As a result, this analysis has been “a huge stumbling block,” he said.
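
The two-stage logic described here, host subtraction followed by matching against pathogen references, can be illustrated with a toy Python sketch. It is not the method used by any real pipeline: it swaps in exact k-mer sharing for true sequence alignment, and the sequences, names (`subtract_host`, `classify`), and k-mer length are all made up for illustration.

```python
from collections import Counter

K = 8  # k-mer length; an arbitrary choice for this toy example


def kmers(seq: str, k: int = K) -> set:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host(reads: list, host_kmers: set) -> list:
    """Drop any read that shares at least one k-mer with the host reference."""
    return [r for r in reads if not (kmers(r) & host_kmers)]


def classify(reads: list, references: dict) -> Counter:
    """Count reads per reference, assigning each read to the single reference
    with which it shares the most k-mers (ties and zero hits are skipped)."""
    ref_kmers = {name: kmers(seq) for name, seq in references.items()}
    hits = Counter()
    for read in reads:
        scores = {name: len(kmers(read) & ks) for name, ks in ref_kmers.items()}
        best = max(scores.values(), default=0)
        winners = [name for name, score in scores.items() if score == best]
        if best > 0 and len(winners) == 1:
            hits[winners[0]] += 1
    return hits


# Made-up sequences standing in for a host genome and two viral references.
host = "ACGTACGTTTGACCAGTAGGCTAACC"
references = {
    "virus_a": "GGCATTCGATCCGATTAGCAATGC",
    "virus_b": "TTCAGGAACTCTGGATCAACTGGA",
}
reads = [
    host[2:14],                   # host-derived read, should be subtracted
    references["virus_a"][3:17],  # pathogen read
    references["virus_a"][8:22],  # pathogen read
    "CCCCCCCCCCCCCC",             # matches nothing
]

non_host = subtract_host(reads, kmers(host))
result = classify(non_host, references)
print(len(non_host), dict(result))
```

Even in this toy form, every read is compared against every reference, which hints at why the same steps become a bottleneck when applied to millions of reads and a database the size of GenBank NT.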

Here is where the second crucial feature in the metagenomics NGS technique used by Dr. Chiu’s laboratory comes into play. They have developed a sophisticated bioinformatics pipeline called SURPI (Sequence-based Ultra-Rapid Pathogen Identification) that rapidly analyzes metagenomic data to identify pathogens. SURPI provides a potential solution to the computational bottleneck, Dr. Chiu said, yielding results from analysis of millions of reads in minutes to hours rather than days. Two modes are possible with SURPI. “In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7–500 million reads in 11 min to 5 h,” Dr. Chiu and his colleagues wrote, “while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h” (Naccache SN, et al. Genome Res. 2014;24:1180–1192).

SURPI was made possible by leveraging an algorithm called SNAP that was developed by a former post-doctoral fellow in Dr. Chiu’s laboratory in collaboration with the University of California, Berkeley, and Microsoft. Dr. Chiu described SURPI as “a way to rapidly align NGS reads to reference databases.”

Since the initial development of SURPI, Dr. Chiu’s laboratory has added enhancements to the pipeline to facilitate clinical interpretation and results reporting, which makes it more amenable to clinical laboratories. For example, the program now provides rapid filtering for sequences that are mis-annotated as viral in databases. Dr. Chiu noted that GenBank is not well curated and is “rife with mis-annotations.” Known mis-annotations are now identified as such in the algorithm.

Also, SURPI now provides accurate taxonomic classification against all sequences in NIH GenBank NT.

In addition, SURPI now has front-end visualization tools and a Web-based interface, so laboratories don’t need extensive bioinformatics expertise. A clinical laboratory director or clinical laboratory scientist can run the data. Dr. Chiu called the website SURPIviz “the visual home of SURPI.” It is a front end for data analysis, a Web-based visualization interface into which a user can load data and look at it in various formats. As an example, Dr. Chiu showed a heat map from SURPIviz identifying Ebola virus Zaire in a patient from the Democratic Republic of Congo with hemorrhagic fever.

Several other enhancements have also been integrated into SURPI. All will be described in an upcoming publication.

Beverly B. Rogers, MD, chief of pathology at Scottish Rite and Egleston hospitals in Atlanta, attended Dr. Chiu’s talk at the symposium. “I think that Dr. Chiu’s utilization of NGS, and especially bioinformatics, is potentially game-changing,” she told CAP TODAY. Dr. Rogers, who is adjunct professor of pathology and pediatrics at Emory University, added, “He is a leader in this field, and I don’t know of anyone else who is doing what he is doing.”

Taken together, the features of this metagenomic NGS method provide a solution to several problems in current diagnostic microbiology that Dr. Chiu highlighted. For one, he said, “We are still unable to diagnose a large fraction of acute infectious diseases.” In critically ill patients in the ICU, for example, up to 25 percent of the time a diagnosis of clinical pneumonia is never made because it is not possible to identify a pathogen. The situation is even worse for diarrheal diseases: Up to 50 percent of the time no cause is found. For meningitis and encephalitis the figure is even higher, up to 75 percent. In principle, given adequate reference databases, metagenomic NGS should be able to diagnose virtually every infectious illness.

Another challenge, Dr. Chiu said, is that “Many pathogens, especially emerging viruses, have highly divergent genome sequences.” Tests that rely on fidelity or conservation of genomic sequences, such as the primers and probes of PCR, are often not effective for detecting variant strains. Metagenomic NGS, on the other hand, doesn’t depend on primers and probes.

Further, many infectious disease syndromes, such as pneumonia and encephalitis, can be caused by a variety of pathogens. Adequate multiplex tests are lacking for these syndromes. Metagenomic NGS, on the other hand, is the ultimate multiplexed assay.

In the context of an accredited and CLIA-certified laboratory, implementing metagenomic NGS coupled to an advanced bioinformatics pipeline can produce rapid and broad agent identification and provide cost-effective and actionable information for early treatment of patients, Dr. Chiu said. Such accelerated diagnosis can have an impact on clinical decision-making.

Dr. Chiu showed how a CLIA-certified clinical laboratory can couple these analytical tools to workflow. Extraction, random hexamer cDNA synthesis, and library preparation take about six hours; a MiSeq run takes six to 40 hours; the analysis pipeline and report generation take 10 minutes to six hours. Overall turnaround time is 12 hours to two days. Dr. Chiu’s group is working to get this time down to eight hours so that a complete sample-to-result analysis can be performed in one shift.
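
The turnaround figures can be checked with simple arithmetic. The short Python sketch below sums the minimum and maximum duration of each step as quoted above; the dictionary layout and variable names are illustrative only.

```python
# Step durations in hours as (minimum, maximum), taken from the figures quoted above.
steps = {
    "extraction, cDNA synthesis, library prep": (6.0, 6.0),
    "MiSeq run": (6.0, 40.0),
    "analysis pipeline and report": (10 / 60, 6.0),
}

fastest = sum(low for low, _ in steps.values())
slowest = sum(high for _, high in steps.values())
print(f"fastest: {fastest:.1f} h, slowest: {slowest:.1f} h ({slowest / 24:.1f} days)")
```

The totals come to roughly 12 hours at best and 52 hours at worst, in line with the quoted range of 12 hours to about two days. Because preparation and sequencing alone take at least 12 hours, reaching the eight-hour goal would require shortening those steps, not just the analysis.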

He presented two other clinical cases to illustrate the utility of metagenomic NGS. A 70-year-old man with pancytopenia reported two months of fever and chills, diarrhea, and fatigue. Ten months before he became ill he had gone to Spain for a six-week, 500-mile hiking excursion. During this hike, he said, he had been bitten everywhere by ticks and mosquitoes. On MRI he had liver cysts along with an enlarged spleen and liver. A workup for infectious agents at a community hospital was negative. Because of a clinical concern for hemophagocytic lymphohistiocytosis, the man was transferred to a tertiary care hospital.

Dr. Chiu’s laboratory sequenced DNA taken from FFPE tissue from the man’s bone marrow biopsy. Metagenomic NGS diagnosed visceral leishmaniasis. The diagnosis was verified by PCR on the bone marrow biopsy. After two weeks of IV amphotericin (along with blood transfusions and G-CSF), the man was clinically improved and is now doing well.

In the third case, a 55-year-old man exhibited rapidly progressive hearing loss five months following immunosuppressive therapy and a bone marrow transplant for CLL. PCR tests for herpes simplex virus and enterovirus were negative. He was treated with high-dose valacyclovir, steroids, and IVIg to no effect. Over the next few weeks, he developed nausea, ataxia, and fatigue, then depressed and irritable mood. An MRI showed abnormal signal in the thalamus and midbrain; frontal lobe biopsy revealed diffuse inflammation. Conventional testing of the biopsy material was initially negative for a variety of pathogens.

Sequencing of brain biopsy material provided a diagnosis of astrovirus encephalitis. Again the signal was clear, with 295 astrovirus reads (0.000077 percent of the total sequence reads), which allowed assembly of the entire astrovirus genome (Naccache SN, et al. Clin Infect Dis. 2015;60:919–923). In situ hybridization of neurons postmortem was positive for the astrovirus strain, verifying the NGS diagnosis. At that point, Dr. Chiu said, astrovirus encephalitis had been described only once before, in a 15-year-old boy with X-linked agammaglobulinemia, by the laboratory of W. Ian Lipkin, MD, at Columbia University, using unbiased sequencing (Quan PL, et al. Emerg Infect Dis. 2010;16:918–925). Dr. Lipkin and his colleagues wrote that their findings (like those of Dr. Chiu’s laboratory) “highlight unbiased molecular technology as a valuable tool for differential diagnosis of unexplained disease.” Unfortunately, there is no therapy for astrovirus encephalitis and the patient died four months later despite attempted treatment with ribavirin and IVIg.

Dr. Chiu showed that the strain of astrovirus causing encephalitis in this patient was part of a phylogenetic clade that includes astrovirus strains responsible for outbreaks of encephalitis in cows and minks. Essentially the same virus found in Dr. Chiu’s patient has also been found to cause encephalitis in several children with severe combined immunodeficiency in the U.K. “So we now think there is potentially an encephalitic astrovirus clade that is spreading through the U.K.,” Dr. Chiu said.

Dr. Rogers said she considered the clinical diagnoses in these cases convincing. She believes this methodology could be “very useful” to clinical viral diagnostics.

In addition to using metagenomic NGS to diagnose unexplained human illness, Dr. Chiu’s group is applying this technique to acute hemorrhagic fever (AHF) surveillance in the Democratic Republic of Congo, focusing on Ebola virus (though not the same strain circulating in West Africa) and Marburg virus. In whole blood samples from patients presenting with AHF in a small outbreak in 2014, metagenomic NGS identified all Ebola samples that were positive by conventional PCR or real-time PCR.

Metagenomic NGS allows not just identification but also viral genome mapping. SURPI identified the closest viral match in the database as the Ebola Mayinga clone from an outbreak in 1976. (That was the closest match at the time, which was before data were published on the strain causing outbreaks in 2014 in West Africa.) Dr. Chiu emphasized that this genome assembly was the “raw output” coming out of the enhanced pipeline. “It was not annotated by me,” he said, but was “automatically generated,” underscoring the accessibility of the enhanced version of SURPI to clinical laboratories.

In a very different application, Dr. Chiu’s laboratory is using metagenomic NGS to generate a transcriptome profile of the host response in patients with Lyme disease. They are doing this in collaboration with a group at Johns Hopkins that is conducting a study called Study of Lyme Immunology and Clinical Events, or SLICE. Twenty-nine patients with a characteristic bull’s-eye rash and confirmation of Borrelia burgdorferi infection by serology and/or PCR, along with 13 matched controls, had samples taken over six months. Using metagenomic NGS, Dr. Chiu’s group was able to narrow the gene expression profile to a panel of 59 genes that appeared to discriminate acute Lyme disease from healthy controls with high accuracy: 96 percent sensitivity and 100 percent specificity. The panel also remained specific for Lyme disease when compared with influenza and other acute bacterial infections.

Dr. Chiu said this approach would be good for diseases for which direct detection is not sufficient. In Lyme disease, for example, PCR has only about 20 percent sensitivity. They are now trying to use this type of profiling to develop a diagnostic test for early symptomatic diagnosis of Ebola hemorrhagic fever. In work so far, a 25-gene biomarker panel constitutes a clear profile that is distinct from the host response in infection with Lassa, malaria, and other non-Ebola febrile illness. A test such as this could affect early management of Ebola, such as signaling an early need for isolation.

Dr. Chiu finished his talk by discussing what he called “a very exciting technology with a lot of potential”: real-time metagenomic virus detection with nanopore sequencing. A nanopore sequencing instrument is a device slightly larger than a USB stick that can be hooked up to a laptop to potentially enable point-of-care diagnosis. A nanopore sequencer has two disadvantages, Dr. Chiu said. First, it does not currently provide much throughput, only about 100,000 reads per run. Second, these reads have high error rates (10 percent to 30 percent are typical figures), which pose a great challenge for metagenomic sequencing. A key advantage is that it is rapid. Dr. Chiu said the goal with nanopore sequencing is to reduce the current sample-to-answer turnaround time from between 12 and 24 hours to six hours.

What makes nanopore sequencing attractive, Dr. Chiu added, is that it allows real-time sequencing. You can do the analysis “on the fly” while generating sequencing reads. Dr. Chiu showed videos of two nanopore runs in which the Oxford Nanopore instrument detected chikungunya virus in five minutes and Ebola in seven minutes. All earlier sequence reads corresponded to human or other eukaryotes. Many human sequences were misidentified as cow, rat, or insect because of the technique’s high error rate.
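
The idea of analyzing reads “on the fly” can be sketched in a few lines of Python. This is a hypothetical illustration, not the software shown in the videos: it assumes each read has already been classified and arrives as a (minutes elapsed, label) pair, and the function name `first_detection`, the `min_reads` threshold, and the timestamps are all invented for the example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def first_detection(
    stream: Iterable[Tuple[float, str]],
    targets: set,
    min_reads: int = 1,
) -> Optional[Tuple[float, str]]:
    """Return (elapsed_minutes, label) as soon as any target label
    has been seen min_reads times, or None if that never happens."""
    seen = Counter()
    for minutes, label in stream:
        if label in targets:
            seen[label] += 1
            if seen[label] >= min_reads:
                return minutes, label
    return None


# Hypothetical, already-classified reads: (minutes since run start, assigned label).
# The early "cow" label mimics a human read misclassified because of sequencing errors.
run = [
    (0.5, "human"),
    (1.2, "human"),
    (2.9, "cow"),
    (4.1, "human"),
    (5.0, "chikungunya virus"),
    (5.3, "human"),
]

print(first_detection(run, {"chikungunya virus", "Ebola virus"}))
```

Because the function consumes reads one at a time and returns at the first qualifying hit, it does not need to wait for the run to finish. Raising `min_reads` would be one simple way to guard against a single misclassified read triggering a call.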

Dr. Chiu estimated that a nanopore sequencer currently has a sensitivity of 10,000 to 100,000 particles/mL. The latest version does not yet achieve PCR levels of sensitivity, he said. He noted that the manufacturer is making improvements that could increase its sensitivity.

In conclusion, Dr. Chiu said that as metagenomic NGS moves toward real time, the clinical utility of the technology for making actionable diagnoses will become more and more clear.

In the discussion period, several pertinent issues were raised. “With regard to FDA regulatory approval, this is an area that is rapidly evolving,” Dr. Chiu said in response to one question. He noted that regulatory challenges for sequencing are not restricted to NGS or to infectious diseases. “Many challenges with regard to NGS protocols, database management, and analysis are being dealt with in oncology and genetics,” he said. He noted that “FDA recognizes that there is a lot of potential to this [technology] and probably will treat next-gen sequencing a little differently. Fundamentally, I think it is a good idea for FDA to treat this differently from single-targeted molecular assays.” In any event, he said, “This won’t be settled quickly.”

Several questions were related to interpreting NGS results to pinpoint the causative virus.

One person asked how it is possible to be sure that a virus detected by metagenomic NGS is causative and not just noise. “The answer is that you can’t,” Dr. Chiu replied, noting that any molecular technique faces this problem. For this reason, he does not believe the technology should completely replace traditional testing. “I think antibody testing will always be useful as a way to assay the host response,” he said. He sees NGS as complementary to conventional methods. “It is not meant to replace other diagnostic testing that might give you that answer,” he said.

The question arose of interpreting the meaning of a negative test, as in the case of neurologic illness possibly caused by enterovirus D68, in which no evidence for that viral strain was found by metagenomic NGS in cerebrospinal fluid of affected persons. “Just like any other molecular test, NGS needs to be validated in the clinical setting,” Dr. Chiu said. In that context, you can make an interpretation based on the known analytical sensitivity and specificity of the test. He suggested that metagenomic NGS could be useful not only as a rule-in assay but perhaps also as a rule-out assay.

“I can envision that clinically this would be very useful in cases where if you can say with some degree of confidence that this is unlikely to be caused by infectious disease or an infectious pathogen, this is something that would be very useful and could be used as information,” he said. “Certainly once you validate the test you can better interpret the meaning of a negative result, not finding anything.”

One person asked, in the clinical cases that Dr. Chiu presented, whether other viruses were seen that could have caused the symptoms. “Yes, we do see other viruses,” he answered. Commonly they see anelloviruses or bacterial colonizers such as Staphylococcus epidermidis. “This is currently not an assay that can be simply a readout,” Dr. Chiu emphasized. “You need a certain level of interpretation. At least initially it will be very similar to how we handle slides.” In the microbiology laboratory they have essentially a pathologist interpret the readout, which he compared to a radiologist interpreting an X-ray.

“As we get a better idea of what the data look like, eventually we can develop automated ways to analyze and interpret,” Dr. Chiu continued. “But currently it requires a pathologist or genomicist to look at the data.”

William Check is a writer in Ft. Lauderdale, Fla.