Chromosome assembly of large and complex genomes using multiple references

Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. Taking one or more target assemblies (generated by an NGS assembler) and one or more related reference genomes, Ragout 2 infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. Using Ragout 2, we transformed NGS assemblies of 16 laboratory mouse strains into sets of complete chromosomes, leaving <5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realignment of long Pacific Biosciences (PacBio) reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. We then applied Ragout 2 to the Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared with other genomes from the Muridae family. Chromosome painting maps confirmed most of the large-scale rearrangements that Ragout 2 detected. Finally, we applied Ragout 2 to improve the recently published draft sequences of three ape genomes. Ragout 2 transformed three sets of contigs (generated using PacBio reads only) into chromosome-scale assemblies with accuracy comparable to that of the chromosome assemblies generated in the original study using BioNano maps, Hi-C, BAC clones, and FISH.
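The idea of using several references to decide how target contigs should be joined can be illustrated with a toy sketch. This is not Ragout 2's algorithm: the abstract describes a phylogeny-aware genome rearrangement approach, whereas the code below swaps in a plain majority vote over references. It assumes each contig is a single signed synteny block (an integer, negative when reversed) and each reference is an ordered list of those blocks. The function names `extremities`, `vote_adjacencies`, and `choose_adjacencies` are hypothetical and exist only for this illustration.

```python
from collections import Counter


def extremities(block):
    """Return the (entry, exit) ends of a signed block read left to right."""
    name = abs(block)
    tail, head = (name, "t"), (name, "h")
    # A forward block is entered at its tail and left at its head;
    # a reversed block is traversed the other way round.
    return (tail, head) if block > 0 else (head, tail)


def vote_adjacencies(references):
    """Count, across references, how often two block ends are neighbours."""
    votes = Counter()
    for order in references:
        for left, right in zip(order, order[1:]):
            _, left_exit = extremities(left)
            right_entry, _ = extremities(right)
            if left_exit != right_entry:  # skip degenerate self-joins
                votes[frozenset((left_exit, right_entry))] += 1
    return votes


def choose_adjacencies(votes):
    """Greedily keep the best-supported joins, using each block end once."""
    used, chosen = set(), []
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for adjacency, count in ranked:
        if used.isdisjoint(adjacency):
            chosen.append((adjacency, count))
            used |= adjacency
    return chosen


if __name__ == "__main__":
    # Three hypothetical references; the second one carries block 3 inverted.
    refs = [[1, 2, 3], [1, 2, -3], [1, 2, 3]]
    for adjacency, count in choose_adjacencies(vote_adjacencies(refs)):
        print(sorted(adjacency), count)
```

The greedy selection does not check for cycles or weight references by evolutionary distance, both of which a real reference-assisted assembler would need to handle.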

Additional Info

Field Value
Article Host Type repository
Article Is Open Access true
Article Version Type publishedVersion
Citation Report https://scite.ai/reports/10.1101/gr.236273.118
DOI 10.1101/gr.236273.118
Date Last Updated 2019-08-01T10:27:48.898799
Evidence oa repository (via pmcid lookup)
Funder code(s) Wellcome Trust (WT098051, WT202878/B/16/Z, WT108749/Z/15/Z); National Human Genome Research Institute (U41HG007234); European Molecular Biology Laboratory; National Institutes of Health (1U01HL137183, 5U41HG007234, 3U54HG007990); W. M. Keck Foundation (DT06172015); European Community's Seventh Framework Programme (244356, FP7/2010-2014); European Union's Seventh Framework Programme (HEALTH-F4-2010-241504, FP7/2007–2013)
Journal Is Open Access false
Open Access Status green
PDF URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211643/pdf/1720.pdf
Publisher URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211643