Keywords: genome sequencing; information theory; DNA sequence assembly; statistical methods
Abstract: In this thesis we study the algorithmic problem of \emph{de novo} DNA sequence assembly, focusing on the challenge posed by genomic repeats. We develop two new assembly tools and initiate the study of the information-theoretic limits of shotgun sequencing for realistic genomes. Our first algorithm, Telescoper, is designed for the assembly of telomeres. Our second algorithm, Piper, takes a statistical approach to resolving the ambiguity caused by repeats. In the final portion of the thesis, we investigate the information-theoretic limits of DNA sequencing, focusing on the effect of repeats.