Keywords: genome; statistical methods; sequencing technology; information theory
Abstract: In the last decade, sequencing technology has progressed rapidly, leading to much faster and cheaper production of short-read data. The challenge of assembling the reads into an accurate reconstruction of the sequenced genome, however, has increased. This is because the assembly problem becomes more difficult when the reads are shorter, especially for the genomes of most higher organisms, which contain complicated repeat structures. In this thesis we study the algorithmic problem of de novo DNA sequence assembly, focusing on the challenge of dealing with genomic repeats. We develop two new assembly tools and initiate the study of the information-theoretic limits of shotgun sequencing for realistic genomes.