Molecular pathology selected abstracts

Open-source algorithms for identifying structural variants in single-molecule sequencing

There is increasing evidence that structural variations, including insertions, deletions, duplications, inversions, and translocations, play a vital role in genetic diversity and disease. Large structural variations, such as aneuploidies and large translocations, are traditionally detected with optical microscopes. Submicroscopic structural variants are much more difficult to detect, even with modern techniques. The widely used short-read next-generation sequencing techniques lack sensitivity and are susceptible to high false-positive rates. They are also likely to misinterpret complex or nested structural variations. Long-read single-molecule sequencing, with average read lengths of 10 kbp or higher, produces reads that can be aligned more accurately to reference sequences and that are more likely to span the breakpoints in the DNA that cause structural variants. The drawback of long-read methodologies is a high sequencing error rate, which reaches 20 percent on some platforms. The authors conducted a study in which they introduced two open-source algorithms: NGMLR, for long-read alignment, which can align reads even in the presence of small indels that occur because of sequencing errors, and Sniffles, which can call true structural variants against that noisy background. The authors tested these algorithms using parent-progeny trios, for which Mendelian discordance rates ranged from 3.4 to 5.6 percent, as compared with a 21.1 percent discordance rate using short-read analysis. The long-read algorithms were then tested in healthy and breast cancer genomes, in which Sniffles detected far more structural variants overall but called far fewer translocations than the short-read methods.
However, the authors demonstrated that more than 80 percent of the translocations that were called by the short-read methods were false, caused by a deletion or insertion that resulted in the short reads being mismapped to different areas of the genome. The long-read algorithms, on the other hand, accurately called these structural variants. Finally, the authors demonstrated that Sniffles was able to accurately call complex, nested structural variants, such as inverted duplications and inverted deletions. They concluded that identifying structural variants is challenging, especially with the short-read technology being used clinically. Long-read sequencing using novel alignment and variant-calling algorithms, such as NGMLR and Sniffles, could be used to accurately identify structural variants. Widespread use of long-read sequencing technologies, for clinical and research purposes, may shed light on the many structural variants that cause genetic diversity and that play a role in various diseases.
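
The Mendelian discordance rates cited above can be illustrated with a minimal Python sketch. It assumes a simplified definition that the abstract does not spell out: a call in the child counts as discordant when it appears in neither parent. The function name, the tuple representation of a call, and the example calls are all hypothetical; the authors' actual procedure may differ, for example by considering genotypes or by matching breakpoints within a tolerance.

```python
def mendelian_discordance_rate(child_calls, father_calls, mother_calls):
    """Return the fraction of the child's calls found in neither parent.

    Assumption: calls are compared by exact equality. Real pipelines
    usually match structural variants within a breakpoint tolerance.
    """
    child = set(child_calls)
    if not child:
        return 0.0  # no calls in the child, so nothing can be discordant
    parental = set(father_calls) | set(mother_calls)
    discordant = child - parental
    return len(discordant) / len(child)


# Hypothetical calls, each written as (chromosome, start position, SV type).
child = {
    ("chr1", 1000, "DEL"),
    ("chr2", 5000, "INS"),
    ("chr3", 700, "INV"),
    ("chr4", 42, "DUP"),
}
father = {("chr1", 1000, "DEL"), ("chr3", 700, "INV")}
mother = {("chr2", 5000, "INS")}

rate = mendelian_discordance_rate(child, father, mother)
print(f"{rate:.1%}")  # prints 25.0%
```

In this toy trio, one of the child's four calls has no parental support, giving a 25 percent discordance rate. Because true de novo structural variants are rare, a high rate in real data mostly reflects calling errors, which is why the measure serves as a proxy for accuracy.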

Sedlazeck FJ, Rescheneder P, Smolka M, et al. Accurate detection of complex structural variations using single molecule sequencing. Nat Methods. 2018. doi.org/10.1038/s41592-018-0001-7.

Correspondence: Dr. Fritz J. Sedlazeck at fritz.sedlazeck@bcm.edu or Dr. Michael C. Schatz at mschatz@cs.jhu.edu

CAP TODAY