-
1771. Statistical Methods for Genome Sequencing
[Information Transmission, Software and Information Technology Services] [2015-04-09]
In this thesis we study the algorithmic problem of de novo DNA sequence assembly, focusing on the challenge of dealing with genomic repeats. We develop two new assembly tools and initiate the study of information-theoretic limits of shotgun sequencing for realistic genomes. Our first novel algorithm for DNA assembly, Telescoper, is designed for assembly of telomeres. Our second, Piper, takes a statistical approach to resolving ambiguity caused by repeats. In the final portion of the thesis, we investigate the information-theoretic limits of DNA sequencing, focusing on the effect of repeats.
Keywords: genome sequencing; information theory; DNA sequence assembly; statistical methods
-
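1772. A Comparison of Error Metrics for Learning Model Parameters in Bayesian Knowledge Tracing
[Information Transmission, Software and Information Technology Services] [2015-04-09]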
We compare several metrics, including log likelihood (LL), root mean squared error (RMSE), and area under the receiver operating characteristic curve (AUC), to evaluate which metric is most suited for this purpose. LL is commonly used as an error metric in Expectation Maximization (EM) to perform parameter estimation. RMSE and AUC have been suggested but have not been explored in depth. In order to examine the effectiveness of using each metric, we measure the correlations between the values calculated by each and the distances from the corresponding points to the ground truth. Additionally, we examine how each metric compares to the others. Our findings show that RMSE is significantly better than LL and AUC.
Keywords: error metrics; knowledge tracing model; Bayesian; log likelihood; ROC curve; root mean squared error
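The three metrics named in this abstract can be illustrated with a minimal sketch. It assumes binary student responses (1 = correct, 0 = incorrect) and model-predicted probabilities of a correct response; the function names and the sample data are hypothetical and are not taken from the paper.

```python
import math

def log_likelihood(y_true, p_pred, eps=1e-12):
    """Sum of Bernoulli log likelihoods; higher (closer to 0) is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def rmse(y_true, p_pred):
    """Root mean squared error between outcomes and predictions; lower is better."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, p_pred)) / n)

def auc(y_true, p_pred):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical observations and predictions, for illustration only.
responses = [1, 0, 1, 1, 0]
predictions = [0.9, 0.2, 0.6, 0.4, 0.5]
print(log_likelihood(responses, predictions))
print(rmse(responses, predictions))
print(auc(responses, predictions))
```

Note that AUC depends only on the ranking of the predictions, whereas LL and RMSE also depend on their actual values, which is one reason the metrics can disagree about which parameter set is best.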
-
1773. Enabling Portable Building Applications through Automated Metadata Transformation
[Information Transmission, Software and Information Technology Services] [2015-04-08]
We present a synthesis technique that learns how to transform a building’s primitive sensor metadata to a common namespace by using a small number of examples from an expert, such as the building manager. Once the transformation rules are learned for one building, they can be applied across buildings with a similar metadata structure.
Keywords: building automation; metadata; sensor networks
-
1774. Open Instruction Sets: The Case of RISC-V
[Information Transmission, Software and Information Technology Services] [2015-04-08]
We conclude that the industry would benefit from viable freely open ISAs just as it has benefited from free open source software. For example, it would enable a real free open market of processor designs, which patents on ISA quirks prevent.
Keywords: instruction sets; RISC; RISC-V; processors
-
1775. An Adaptive Stream Processing Algorithm with Dynamic Batch Sizing
[Information Transmission, Software and Information Technology Services] [2015-04-08]
In this paper, we explore the effect of the size of batches on the performance of streaming workloads. The throughput and end-to-end latency of the system can have complicated relationships with batch sizes, data ingestion rates, variations in available resources, workload characteristics, etc. We propose a simple yet robust control algorithm that automatically adapts batch sizes as the situation necessitates. We show through extensive experiments that this algorithm is powerful enough to ensure system stability and low end-to-end latency for a wide class of workloads, despite large variations in data rates and operating conditions.
Keywords: dynamic batch sizing; adaptive; stream processing
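The abstract does not spell out the control algorithm, so the following is only an illustrative stand-in, not the authors' method. It sketches one simple way a batch interval could be adapted: the system is treated as stable when a batch is processed in less time than the batch interval, and the interval is nudged toward the value at which processing time would be a fixed fraction of it. The function name, the `target_ratio` and `smoothing` parameters, and the linear cost model in the usage loop are all assumptions.

```python
def next_batch_interval(interval, processing_time, target_ratio=0.8,
                        smoothing=0.5, min_interval=0.1, max_interval=30.0):
    """Return the next batch interval in seconds (hypothetical heuristic).

    interval        -- batch interval used for the last batch
    processing_time -- measured time to process that batch
    target_ratio    -- desired processing_time / interval (below 1 for stability)
    smoothing       -- weight given to the newly proposed interval
    """
    if interval <= 0 or processing_time < 0:
        raise ValueError("interval must be positive and processing_time non-negative")
    if not 0 < target_ratio <= 1 or not 0 < smoothing <= 1:
        raise ValueError("target_ratio and smoothing must be in (0, 1]")
    # Interval at which the observed processing time would hit the target ratio.
    proposed = processing_time / target_ratio
    # Blend with the current interval to damp oscillations.
    blended = (1.0 - smoothing) * interval + smoothing * proposed
    return min(max(blended, min_interval), max_interval)

# Usage with an assumed workload: fixed overhead plus a cost linear in batch size.
def simulated_processing_time(interval):
    return 0.5 + 0.5 * interval

interval = 10.0
for _ in range(200):
    interval = next_batch_interval(interval, simulated_processing_time(interval))
print(interval)
```

Under this assumed cost model the loop settles on an interval at which each batch finishes before the next one arrives, while keeping the interval (and hence latency) from growing more than needed.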
-
1776. Data-Dependent Timing Analysis and Timing Repeatability Analysis Using GameTime
[Information Transmission, Software and Information Technology Services] [2015-04-08]
We present a technique for automatically learning a model of the data dependencies and encoding this model into the code under analysis for processing by GameTime. Using these extensions, we show that GameTime more accurately predicts the timing for a variety of benchmarks. Unfortunately, the complexity of modern architectures and platforms has made it very difficult to obtain accurate and efficient timing estimates. To deal with this, there have been recent proposals to re-architect platforms to make execution time of instructions more repeatable. There is, however, no systematic formalization of what timing repeatability means. In this thesis, we also propose formal models of timing repeatability. We give an algorithmic approach to evaluate parameters of these formal models. Using GameTime along with the data-dependent extensions discussed in this thesis, we objectively evaluate the timing repeatability of a representative sample of platforms with respect to a program of interest.
Keywords: GameTime; data dependence; timing analysis; timing repeatability
-
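1777. Technical Requirements for Hybrid Cosimulation Standards
[Information Transmission, Software and Information Technology Services] [2015-04-08]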
This paper defines a suite of requirements for future hybrid cosimulation standards, and specifically provides guidance for development of a hybrid cosimulation version of the Functional Mockup Interface (FMI) standard. The paper defines a suite of test components, giving a mathematical model of an ideal behavior, plus a discussion of practical implementation considerations.
Keywords: cosimulation; FMI; simulation models
-
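1778. When Is Big Data Big Enough? Implications of GPS-Based Surveys for Travel Demand Analysis
[Information Transmission, Software and Information Technology Services] [2015-04-08]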
We use simulated datasets to compare performance across different sample sizes, inference accuracies and estimation methods. Findings from the simulated datasets are corroborated with real data collected from individuals living in the San Francisco Bay Area, United States. Results indicate that the benefits of using GPS-based surveys will vary significantly, depending upon the sample size of the data, the accuracy of the inference algorithm and the desired complexity of the travel demand model specification. In many cases, gains in the volume of data that can potentially be retrieved using GPS devices may be offset by the loss in quality caused by inaccuracies in inference.
Keywords: big data; travel demand; GPS; simulated datasets
-
1779. A Program Synthesis Algorithm for Hierarchical Specifications
[Information Transmission, Software and Information Technology Services] [2015-04-08]
We present grammar-modular (GM) synthesis, an algorithm for synthesis from tree-structured relational specifications. GM synthesis makes synthesis applicable to previously intractable relational specifications by decomposing them into smaller subproblems, which can be tackled in isolation by off-the-shelf synthesis procedures. The program fragments thus generated are subsequently composed to form a program satisfying the overall specification. We also generalize our technique to tree languages of relational specifications. Here, we synthesize a single program capable of satisfying any (tree-shaped) relation belonging to the language; the synthesized program is syntax-directed by the structure of the relation.
Keywords: relational specifications; tree structure; program synthesis
-
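1780. Contention Bounds for Combinations of Computation Graphs and Network Topologies
[Information Transmission, Software and Information Technology Services] [2015-04-08]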
We obtain a novel contention lower bound that is a function of the network and the computation graph parameters. To this end, we compare the communication bandwidth needs of subsets of processors and the available network capacity (as opposed to per-processor analysis in most previous studies).
Keywords: network topology; computation graph; network links; communication
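The idea of comparing the bandwidth needs of a subset of processors with the network capacity available to that subset can be sketched as follows. This is a simplified illustration, not the bound derived in the paper: it assumes the amount of data that must cross the boundary of each subset (`words_crossing`) is already known from the computation graph, and it models the network as undirected links with a capacity in words per time unit. All names are hypothetical.

```python
def cut_capacity(links, subset):
    """Total capacity of links with exactly one endpoint inside the subset."""
    members = set(subset)
    return sum(cap for (u, v, cap) in links if (u in members) != (v in members))

def contention_lower_bound(links, subset, words_crossing):
    """Time needed to move words_crossing words across the subset's boundary."""
    capacity = cut_capacity(links, subset)
    if capacity == 0:
        if words_crossing == 0:
            return 0.0
        raise ValueError("subset is disconnected but must communicate")
    return words_crossing / capacity

def best_bound(links, demands):
    """Tightest (largest) bound over all examined subsets."""
    return max(contention_lower_bound(links, s, w) for s, w in demands)

# Assumed example: a ring of four processors with unit-capacity links.
ring = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
# Each entry pairs a processor subset with the words that must cross its boundary.
demands = [({0, 1}, 10.0), ({0}, 4.0)]
print(best_bound(ring, demands))
```

Examining subsets rather than single processors matters because a group of processors can share a narrow cut to the rest of the network even when each processor has ample bandwidth of its own.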