Chromosome assembly of large and complex genomes using multiple references

Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. Taking one or more target assemblies (generated by an NGS assembler) and one or more related reference genomes, Ragout 2 infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. Using Ragout 2, we transformed NGS assemblies of 16 laboratory mouse strains into sets of complete chromosomes, leaving <5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realignment of long Pacific Biosciences (PacBio) reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. We applied Ragout 2 to the Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared with other genomes from the Muridae family. Chromosome painting maps confirmed most large-scale rearrangements that Ragout 2 detected. We also applied Ragout 2 to improve draft sequences of three recently published ape genomes. Ragout 2 transformed three sets of contigs (generated using PacBio reads only) into chromosome-scale assemblies with accuracy comparable to that of the chromosome assemblies generated in the original study using BioNano maps, Hi-C, BAC clones, and FISH.

Additional Info

Field: Value
Author: Kolmogorov, Mikhail
Last Updated: November 20, 2019, 16:44 (UTC)
Created: August 1, 2019, 10:28 (UTC)
Article Host Type: repository
Article Is Open Access: true
Article License Type:
Article Version Type: publishedVersion
Citation Report: https://scite.ai/reports/10.1101/gr.236273.118
DOI: 10.1101/gr.236273.118
Date Last Updated: 2019-08-01T10:27:48.898799
Evidence: oa repository (via pmcid lookup)
Funder code(s): Wellcome Trust (WT098051, WT202878/B/16/Z, WT108749/Z/15/Z); National Human Genome Research Institute (U41HG007234); European Molecular Biology Laboratory; National Institutes of Health (1U01HL137183, 5U41HG007234, 3U54HG007990); W. M. Keck Foundation (DT06172015); European Community's Seventh Framework Programme (244356, FP7/2010-2014); European Union's Seventh Framework Programme (HEALTH-F4-2010-241504, FP7/2007–2013)
Journal Is Open Access: false
Open Access Status: green
PDF URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211643/pdf/1720.pdf
Publisher URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211643