Abstract
We describe a set of computational problems motivated by certain analysis tasks in genome resequencing. These are assembly problems for which multiple distinct sequences must be assembled, but where the relative positions of reads to be assembled are already known. This information is obtained from a common reference genome and is characteristic of resequencing experiments. The simplest variant of the problem aims at determining a minimum set of superstrings such that each sequenced read matches at least one superstring. We give an algorithm with time complexity O(N), where N is the sum of the lengths of reads, substantially improving on previous algorithms for solving the same problem. We also examine the problem of finding the smallest number of reads to remove such that the remaining reads are consistent with k superstrings. By exploiting a surprising relationship with the minimum cost flow problem, we show that this problem can be solved in polynomial time when nested reads are excluded. If nested reads are permitted, this problem of removing the minimum number of reads becomes NP-hard. We show that permitting mismatches between reads and their nearest superstrings generally renders these problems NP-hard.
Index Terms
- Multiple Sequence Assembly from Reads Alignable to a Common Reference Genome
Recommendations
Genome Sequence Assembly: Algorithms and Issues
Ultimately, genome sequencing seeks to provide an organism's complete DNA sequence. Automation of DNA sequencing allowed scientists to decode entire genomes and gave birth to genomics, the analytic and comparative study of genomes. Although genomes can ...
Aggressive assembly of pyrosequencing reads with mates
Motivation: DNA sequence reads from Sanger and pyrosequencing platforms differ in cost, accuracy, typical coverage, average read length and the variety of available paired-end protocols. Both read types can complement one another in a ‘hybrid’ ...
De Novo Assembly of Lucina pectinata Genome using Ion Torrent Reads
PEARC '17: Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and ImpactLucina pectinata is a bivalve that lives in sulfide-rich environments and houses intracellular sulfide oxidizing endosymbiont. This organism is an ideal model to understand adaptive mechanisms and chemoautotrophic endosymbiosis in organisms living in ...
Comments