-
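43271. A Million Cancer Genome Warehouse
[Information Transmission, Software and Information Technology Services; Manufacturing of Computers, Communication and Other Electronic Equipment; Pharmaceutical Manufacturing] [2013-11-18]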
This whitepaper shows that it is now technically possible to reliably store and analyze 1 million genomes and related clinical and pathological data, which would match the demand for 2014. Moreover, thanks to advances in cloud computing, it is surprisingly affordable: multiple estimates agree on a technology cost of about $25 a year per genome. While the focus is on technology, to be thorough, this whitepaper touches on high-level policy issues as well as low-level details about statistics and the price of computer memory to cover the scope of the issues that a million cancer genome warehouse raises.
Keywords: genome sequencing; genome; biomedicine; memory coverage; cloud computing; software; hardware
-
43272. Syntactic Agreement in Bilingual Corpora
[Information Transmission, Software and Information Technology Services] [2013-11-18]
The task of automatic machine translation (MT) is the focus of a huge variety of active research efforts, both because of the intrinsic utility of this difficult task, and the theoretical and linguistic insights that arise from modeling relationships between natural languages. This thesis investigates the problem of finding syntactic parse trees of target and/or source sentences that are more appropriate for use in a syntactic MT system.
Keywords: automatic machine translation (MT); modeling; monolingual parser; natural language; syntactic parse tree
-
43273. Shark: SQL and Rich Analytics at Scale
[Information Transmission, Software and Information Technology Services] [2013-11-18]
Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100× faster than Apache Hive, and machine learning programs up to 100× faster than Hadoop. Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a MapReduce-like execution engine, and the fine-grained fault tolerance properties that such engines provide. It extends such an engine in several ways, including column-oriented in-memory storage and dynamic mid-query replanning, to effectively execute SQL. The result is a system that matches the speedups reported for MPP analytic databases over MapReduce, while offering fault tolerance properties and complex analytics capabilities that they lack.
Keywords: Shark; data analysis system; memory abstraction; large-cluster analytics; fault tolerance properties
-
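43274. Incentives, Computation, and Networks: Limitations and Possibilities of Algorithmic Mechanism Design
[Information Transmission, Software and Information Technology Services] [2013-11-18]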
In the past decade, a theory of manipulation-robust algorithms has been emerging to address the challenges that frequently occur in strategic environments such as the internet. The theory, known as algorithmic mechanism design, builds on the foundations of classical mechanism design from microeconomics and is based on the idea of incentive compatible protocols. Such protocols achieve system-wide objectives through careful design that ensures it is in every agent's best interest to comply with the protocol. As it turns out, however, implementing incentive compatible protocols as advocated in classical mechanism design theory often necessitates solving intractable problems. To address this, algorithmic mechanism design focuses on designing computationally feasible incentive compatible approximation algorithms.
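Keywords: theory of manipulation-robust algorithms; strategic internet environments; algorithmic mechanism design; incentive compatible protocols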
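As a minimal illustration of what "incentive compatible" means in the abstract above, the sketch below uses a sealed-bid second-price (Vickrey) auction. This is a textbook mechanism chosen for illustration and is not taken from the thesis. The function names and the sample valuations are hypothetical. The code checks by brute force, over a small range of candidate bids, that one bidder cannot gain by deviating from its true value.

```python
def second_price_auction(bids):
    """Return (winner_index, price) for a sealed-bid second-price auction.

    The highest bidder wins and pays the highest bid among the others.
    Ties are broken in favour of the lowest index.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price


def utility(agent, true_value, bids):
    """Utility of `agent`: value minus price if it wins, else zero."""
    winner, price = second_price_auction(bids)
    return true_value - price if winner == agent else 0


def truthful_is_best_response(agent, true_value, other_bids, candidate_bids):
    """Check that no candidate bid beats bidding the true value."""
    def bids_with(b):
        # Insert the agent's bid at its own position among the rivals' bids.
        return other_bids[:agent] + [b] + other_bids[agent:]

    truthful = utility(agent, true_value, bids_with(true_value))
    return all(utility(agent, true_value, bids_with(b)) <= truthful
               for b in candidate_bids)


# Hypothetical example: agent 0 values the item at 7; rivals bid 5 and 3.
others = [5, 3]
print(second_price_auction([7] + others))
print(truthful_is_best_response(0, 7, others, range(0, 21)))
```

The check is exhaustive only over the integer bids supplied, so it illustrates the property rather than proving it.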
-
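43275. LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version
[Information Transmission, Software and Information Technology Services] [2013-11-18]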
We present the LU decomposition with panel rank revealing pivoting (LU PRRP), an LU factorization algorithm based on strong rank revealing QR panel factorization. Our extensive numerical experiments show that the new factorization scheme is as numerically stable as Gaussian elimination with partial pivoting (GEPP) in practice, but it is more resistant to pathological cases and easily solves the Wilkinson matrix and the Foster matrix. The LU PRRP factorization does only O(n^2 b) additional floating point operations compared to GEPP. We also present CALU PRRP, a communication avoiding version of LU PRRP that minimizes communication. CALU PRRP is based on tournament pivoting, with the selection of the pivots at each step of the tournament being performed via strong rank revealing QR factorization.
Keywords: LU decomposition; QR factorization; numerical experiments; GEPP
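To show why the Wilkinson matrix counts as a pathological case, the sketch below runs the GEPP baseline on it and measures element growth. It does not implement LU PRRP or CALU PRRP. The helper names are hypothetical, and the growth measure used here, max|U| / max|A|, is a simplified form of the usual growth factor.

```python
def wilkinson_matrix(n):
    """n-by-n matrix: 1 on the diagonal and in the last column, -1 below the diagonal."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            a[i][j] = -1.0
        a[i][i] = 1.0
        a[i][n - 1] = 1.0
    return a


def gepp_upper_factor(a):
    """Gaussian elimination with partial pivoting; returns the upper factor U."""
    n = len(a)
    u = [row[:] for row in a]          # work on a copy
    for k in range(n - 1):
        # Partial pivoting: pick the largest-magnitude entry in column k
        # (the first such row wins on ties).
        p = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[p][k] == 0.0:
            continue                   # nothing to eliminate in this column
        u[k], u[p] = u[p], u[k]
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]      # multiplier, |m| <= 1 by pivoting
            for j in range(k, n):
                u[i][j] -= m * u[k][j]
    return u


def growth_factor(a):
    """max|U| / max|A|, a simplified measure of element growth."""
    u = gepp_upper_factor(a)
    max_a = max(abs(x) for row in a for x in row)
    max_u = max(abs(x) for row in u for x in row)
    return max_u / max_a


for n in (4, 8, 16):
    print(n, growth_factor(wilkinson_matrix(n)))
```

On this matrix the last column doubles at every elimination step, so the measured growth is 2^(n-1).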
-
43276. Static Model Analysis with Lattice-based Ontologies
[Information Transmission, Software and Information Technology Services] [2013-11-18]
This thesis demonstrates a correct, scalable and automated method to infer semantic concepts using lattice-based ontologies, given relatively few manual annotations. Semantic concepts and their relationships are formalized as a lattice, and relationships within and between program elements are expressed as a set of constraints. Our inference engine automatically infers concepts wherever they are not explicitly specified. Our approach is general, in that our framework is agnostic to the semantic meaning of the ontologies that it uses. Where practical use-cases and principled theory exist, we provide for the expression of infinite ontologies and ontology compositions. We also show how these features can be used to express value-parametrized concepts and structured data types. In order to help find the source of errors, we also present a novel approach to debugging by showing simplified error paths. These are minimal subsets of the constraints that fail to type-check, and are much more useful than previous approaches in finding the cause of program bugs. We also present examples of how this analysis tool can be used to express analyses of abstract interpretation; physical dimensions and units; constant propagation; and checks of the monotonicity of expressions.
Keywords: ontology; semantic concepts; program elements
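The idea of inferring concepts from a few annotations plus constraints over a lattice can be sketched as a least-fixed-point computation. The abstract does not describe the thesis's actual solver, so everything below is an assumption made for illustration: the flat lattice, the constraint form concept(dst) >= concept(src), and all element and concept names are hypothetical.

```python
BOTTOM = "Unknown"    # no information yet
TOP = "Conflict"      # incompatible concepts met


def join(a, b):
    """Least upper bound in a flat lattice: BOTTOM < each concept < TOP."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def infer(elements, annotations, constraints):
    """Find the least solution of the constraints concept(dst) >= concept(src).

    elements:    iterable of program element names
    annotations: dict mapping some elements to manually given concepts
    constraints: list of (src, dst) pairs
    """
    concept = {e: annotations.get(e, BOTTOM) for e in elements}
    changed = True
    while changed:                      # iterate until nothing changes
        changed = False
        for src, dst in constraints:
            new = join(concept[dst], concept[src])
            if new != concept[dst]:
                concept[dst] = new
                changed = True
    return concept


# Hypothetical model: only two of five elements are annotated.
elements = ["clock", "delay", "timer", "sensor", "mix"]
annotations = {"clock": "Time", "sensor": "Position"}
constraints = [("clock", "delay"), ("delay", "timer"),
               ("timer", "mix"), ("sensor", "mix")]
print(infer(elements, annotations, constraints))
```

In this toy model the unannotated elements `delay` and `timer` inherit `Time`, while `mix` receives two incompatible concepts and is raised to the top element, which is the kind of result an error report would then need to explain.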
-
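43277. Midas: Custom Capacitive Touch Sensors for Prototyping Interactive Objects
[Information Transmission, Software and Information Technology Services; Manufacturing of Computers, Communication and Other Electronic Equipment] [2013-11-18]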
An increasing number of consumer products include user interfaces that rely on touch input. While digital fabrication techniques such as 3D printing make it easier to prototype the shape of custom devices, adding interactivity to such prototypes remains a challenge for most designers. We introduce Midas, a software and hardware toolkit to support the design, fabrication, and programming of flexible capacitive touch sensors for interactive objects. With Midas, designers first define the desired shape, layout, and type of touch sensitive areas in a sensor editor interface. From this high-level specification, Midas automatically generates layout files with appropriate sensor pads and routed connections. These files are then used to fabricate sensors using digital fabrication processes, e.g. vinyl cutters and circuit board printers. Using step-by-step assembly instructions generated by Midas, designers connect these sensors to our microcontroller setup, which detects touch events. Once the prototype is assembled, designers can define interactivity for their sensors: Midas supports both record-and-replay actions for controlling existing local applications and WebSocket-based event output for controlling novel or remote applications. In a first-use study with three participants, users successfully prototyped media players. We also demonstrate how Midas can be used to create a number of touch-sensitive interfaces.
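Keywords: user interface; touch input; digital fabrication techniques; 3D printing; Midas; software; hardware toolkit; capacitive touch sensors; interactive objects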
-
43278. Re-optimizing Data-Parallel Computing
[Information Transmission, Software and Information Technology Services] [2013-11-18]
Performant execution of data-parallel jobs needs good execution plans. Certain properties of the code, the data, and the interaction between them are crucial to generate these plans. Yet, these properties are difficult to estimate due to the highly distributed nature of these frameworks, the freedom that allows users to specify arbitrary code as operations on the data, and since jobs in modern clusters have evolved beyond single map and reduce phases to logical graphs of operations. Using fixed a priori estimates of these properties to choose execution plans, as modern systems do, leads to poor performance in several instances. We present RoPE, a first step towards re-optimizing data-parallel jobs. RoPE collects certain code and data properties by piggybacking on job execution. It adapts execution plans by feeding these properties to a query optimizer. We show how this improves the future invocations of the same (and similar) jobs and characterize the scenarios of benefit. Experiments on Bing's production clusters show up to 2× improvement across response time for production jobs at the 75th percentile while using 1.5× fewer resources.
Keywords: optimizing data-parallel computing; query optimizer; efficient execution
-
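43279. Semidefinite Programming Relaxation for Non-Line-of-Sight Localization
[Information Transmission, Software and Information Technology Services] [2013-11-18]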
We consider the problem of estimating the locations of a set of points in a k-dimensional Euclidean space given a subset of the pairwise distance measurements between the points. We define a notion of non-contractibility and show that the relaxation gives the exact point locations when the underlying graph is non-contractible. The performance of the algorithm is evaluated on an experimental data set obtained from a network of 44 nodes in an indoor environment and is shown to be robust to non-line-of-sight errors.
Keywords: non-line-of-sight localization; semidefinite programming; matrix decomposition
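The abstract above does not spell out the relaxation it refers to. One standard route to a semidefinite relaxation of distance-based localization, assumed here for illustration and not confirmed by the source, rewrites each squared distance in terms of the Gram matrix G of the points, where it becomes linear in the entries of G. The relaxation then replaces the rank-k requirement on G with positive semidefiniteness. The sketch below only verifies that linear identity on a hypothetical point set; it does not solve a semidefinite program.

```python
def gram_matrix(points):
    """Gram matrix G with G[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def squared_distance(p, q):
    """Squared Euclidean distance computed directly from coordinates."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def squared_distance_from_gram(g, i, j):
    """||x_i - x_j||^2 written as a linear function of the entries of G."""
    return g[i][i] + g[j][j] - 2 * g[i][j]


# Hypothetical configuration of four points in the plane (k = 2).
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
g = gram_matrix(points)
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        print(i, j, squared_distance(points[i], points[j]),
              squared_distance_from_gram(g, i, j))
```

Each printed row shows the same value twice, once from the coordinates and once from G alone.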
-
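43280. Reducing Attack Surfaces for Intra-Application Communication in Android
[Information Transmission, Software and Information Technology Services] [2013-11-17]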
The complexity of Android's message-passing system has led to numerous vulnerabilities in third-party applications. Many of these vulnerabilities are a result of developers confusing inter-application and intra-application communication mechanisms. Consequently, we propose modifications to the Android platform to detect and protect inter-application messages that should have been intra-application messages.
Keywords: Android platform; applications; security vulnerabilities; information security; message passing