-
Productive High-Performance Parallel Programming with Auto-Tuned Domain-Specific Embedded Languages
As the complexity of machines and architectures has increased, performance tuning has become more challenging, leading to the failure of general compilers to generate the best possible optimized code. Expert performance programmers can often hand-write code that outperforms compiler-optimized low-level code by an order of magnitude. At the same time, the complexity of programs has also increased, with modern programs built on a variety of abstraction layers to manage complexity, yet these layers hinder efforts at optimization. In fact, it is common to lose one or two additional orders of magnitude in performance when going from a low-level language such as Fortran or C to a high-level language like Python, Ruby, or Matlab. General-purpose compilers are limited by the inability of program analysis to determine programmer intent, as well as the lack of detailed performance models that can reliably determine the best executable code for a given computation and architecture. The latter problem can be mitigated through auto-tuning, which generates many code variants for a particular problem and empirically determines which performs best on a given architecture. This thesis addresses the problem of how to write programs at a high level while obtaining the performance of code written by performance experts at the low level. To do so, we build domain-specific embedded languages that generate low-level parallel code from a high-level language, and then use auto-tuning to determine the best-performing low-level code. Such DSELs avoid analysis by restricting the domain while ensuring programmers specify high-level intent, and by performing empirical auto-tuning instead of modeling machine parameters. As a result, programmers write in high-level languages with portions of their code using DSELs, yet obtain performance equivalent to the best hand-optimized low-level code, across many architectures.
We present a methodology for building such auto-tuned DSELs, as well as a software infrastructure and example DSELs using the infrastructure, including a DSEL for structured grid computations and two DSELs for graph algorithms. The structured grid DSEL obtains over 80% of peak performance for a variety of benchmark kernels across different architectures, while the graph algorithm DSELs mitigate all performance loss due to using a high-level language. Overall, the methodology, infrastructure, and example DSELs point to a promising new direction for obtaining high performance while programming in a high-level language.
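The core of empirical auto-tuning, as described above, is to generate several code variants and time each one on the target machine rather than predict performance from a model. The following is a minimal Python sketch of that idea; the function names (`tune`, `timeit_once`) and the two toy variants are illustrative assumptions, not the thesis's actual infrastructure, which generates and compiles low-level parallel variants.

```python
import time

def timeit_once(fn, workload):
    """Time a single invocation of a variant on the given workload."""
    start = time.perf_counter()
    fn(workload)
    return time.perf_counter() - start

def tune(variants, workload, trials=3):
    """Empirically select the fastest variant for this machine and input.

    variants: dict mapping a variant name to a callable implementing
    the same kernel. Returns (best_name, best_time). Taking the minimum
    over several trials reduces timing noise.
    """
    best_name, best_time = None, float("inf")
    for name, fn in variants.items():
        t = min(timeit_once(fn, workload) for _ in range(trials))
        if t < best_time:
            best_name, best_time = name, t
    return best_name, best_time

# Two hypothetical variants of the same reduction kernel.
def variant_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total

def variant_builtin(xs):
    return sum(xs)

best, elapsed = tune({"loop": variant_loop, "builtin": variant_builtin},
                     list(range(100_000)))
```

In a real DSEL the variants would be generated (e.g., different blocking factors or parallelization strategies) and compiled to native code; the selection logic, however, is exactly this empirical comparison.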
-
TV Whitespace in the Wireless Microphone Bands, Channel 37, and the Soon-to-Be-Created Guard Bands
This report is intended as a response to some of the questions posed by the FCC regarding the upcoming TV-band incentive auction, given in their NPRM, as they relate to the television whitespaces. In particular, we argue (1) that channel 37 should be made available for whitespace use; (2) that the channels reserved for wireless microphones should be reserved on an as-used basis only; and (3) that the guard bands which will be created via the incentive auction must be considered as database-registration-requiring whitespace if unlicensed devices are authorized to use them. These three proposals have two common themes: (1) they each work toward the goal of making otherwise-wasted spectrum available as whitespace; and (2) in each case, the key concept is that the involved parties can (and in some cases must) register their devices and use geolocation of some sort. We will sketch each of our proposals and show how together they can make whitespace available for up to 10 million more Americans with minimal overhead while ensuring that licensed users receive the quality of service that they expect. As a result, essentially no one would be left without whitespace access.
-
How Did We Get Into This Mess? Isolating Fault-Inducing Inputs to SDN Control Software
Software bugs are inevitable in software-defined networking (SDN) control planes, and troubleshooting is a tedious, time-consuming task. In this paper we discuss how one might improve SDN network troubleshooting by presenting a technique, retrospective causal inference, for automatically identifying a minimal sequence of inputs responsible for triggering a given bug in the control software. Retrospective causal inference works by iteratively pruning inputs from the history of the execution, and coping with divergent histories by reasoning about the functional equivalence of events. We apply retrospective causal inference to three open source SDN control platforms---Floodlight, POX, and NOX---and illustrate how our technique found minimal causal sequences for the bugs we encountered.
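The pruning loop at the heart of the technique can be sketched in a few lines of Python. This is a deliberate simplification, assuming a replayable trace and a boolean bug oracle: the real system also replays events against the controller and reasons about functional equivalence when the pruned history diverges, which this sketch omits. The event names in the toy trace are hypothetical.

```python
def minimize(inputs, triggers_bug):
    """Greedily prune inputs from an execution history.

    Repeatedly drops any single input whose removal still reproduces
    the bug, restarting the scan after each successful removal, until
    no input can be removed. Returns a minimal causal sequence in the
    sense that removing any one remaining input masks the bug.
    """
    seq = list(inputs)
    changed = True
    while changed:
        changed = False
        for i in range(len(seq)):
            candidate = seq[:i] + seq[i + 1:]
            if triggers_bug(candidate):   # bug still reproduces: keep the pruned history
                seq = candidate
                changed = True
                break
    return seq

# Toy oracle: the "bug" fires whenever a link failure and a host
# migration both appear in the replayed history.
def bug(seq):
    return "link_down" in seq and "host_migrate" in seq

trace = ["pkt_in", "link_down", "flow_mod", "host_migrate", "pkt_in"]
minimal = minimize(trace, bug)
```

Here the three unrelated events are pruned away, leaving only the two inputs that jointly trigger the bug.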
-
Distributed-Memory Breadth-First Search Revisited: Enabling Bottom-Up Search
Breadth-first search (BFS) is a fundamental graph primitive frequently used as a building block for many complex graph algorithms. In the worst case, the complexity of BFS is linear in the number of edges and vertices, and the conventional top-down approach always takes as much time as the worst case. A recently discovered bottom-up approach manages to cut down the complexity all the way to the number of vertices in the best case, which is typically at least an order of magnitude less than the number of edges. The bottom-up approach is not always advantageous, so it is combined with the top-down approach to make the direction-optimizing algorithm, which adaptively switches from top-down to bottom-up as the frontier expands. We present a scalable distributed-memory parallelization of this challenging algorithm and show up to an order of magnitude speedup compared to an earlier purely top-down code. Our approach also uses a 2D decomposition of the graph that has previously been shown to be superior to a 1D decomposition. Using the default parameters of the Graph500 benchmark, our new algorithm achieves a performance rate of over 240 billion edges per second on 115 thousand cores of a Cray XE6, which makes it over 7× faster than a conventional top-down algorithm using the same set of optimizations and data distribution.
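The direction switch described above can be illustrated with a small sequential Python sketch. The fixed threshold `alpha` below is a stand-in assumption for the benchmark's tuned heuristic (which compares edge counts of the frontier against the remainder of the graph), and the example graph is hypothetical; the paper's contribution is the distributed-memory, 2D-decomposed version of this algorithm, not this serial form.

```python
def direction_optimizing_bfs(adj, source, alpha=0.2):
    """Direction-optimizing BFS on an undirected graph.

    adj: dict mapping each vertex to a set of neighbors.
    Uses top-down expansion while the frontier is small, and switches
    to bottom-up (each unvisited vertex searches for a parent in the
    frontier) once the frontier covers more than alpha of the vertices.
    Returns a parent map encoding the BFS tree.
    """
    n = len(adj)
    parent = {source: source}
    frontier = {source}
    while frontier:
        nxt = set()
        if len(frontier) <= alpha * n:
            # Top-down: expand outward from frontier vertices.
            for u in frontier:
                for v in adj[u]:
                    if v not in parent:
                        parent[v] = u
                        nxt.add(v)
        else:
            # Bottom-up: each unvisited vertex scans its neighbors and
            # stops at the first one found in the frontier.
            for v in adj:
                if v not in parent:
                    for u in adj[v]:
                        if u in frontier:
                            parent[v] = u
                            nxt.add(v)
                            break
        frontier = nxt
    return parent

# Small binary-tree-shaped graph: 0-{1,2}, 1-{3,4}, 2-{5,6}.
adj = {0: {1, 2}, 1: {0, 3, 4}, 2: {0, 5, 6},
       3: {1}, 4: {1}, 5: {2}, 6: {2}}
parents = direction_optimizing_bfs(adj, 0)
```

The bottom-up step is what delivers the best-case savings: a vertex stops scanning as soon as it finds any parent, instead of every frontier vertex touching all of its edges.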
-
Cloud Robotics and Automation: A Survey of Related Work
What if robots and automation systems were not limited by onboard computation, memory, or programming? This is now practical with wireless networking and rapidly expanding Internet resources. In 2010, James Kuffner at Google introduced the term "Cloud Robotics" to describe a new approach to robotics that takes advantage of the Internet as a resource for massively parallel computation and real-time sharing of vast data resources. The Google autonomous driving project exemplifies this approach: the system indexes maps and images that are collected and updated by satellite, Streetview, and crowdsourcing from the network to facilitate accurate localization. Another example is Kiva Systems' new approach to warehouse automation and logistics, using large numbers of mobile platforms to move pallets and a local network to coordinate platforms and update tracking data. These are just two new projects that build on resources from the Cloud. Steve Cousins of Willow Garage aptly summarized the idea: "No robot is an island." Cloud Robotics recognizes the wide availability of networking, and incorporates elements of open-source, open-access, and crowdsourcing to greatly extend earlier concepts of "Online Robots" and "Networked Robots". Cloud Robotics has potential to improve robot performance in at least five ways: 1) Big Data: indexing a global library of images, maps, and object data; 2) Cloud Computing: parallel grid computing on demand for statistical analysis, learning, and motion planning; 3) Open-Source / Open-Access: humans sharing code, data, algorithms, and hardware designs; 4) Collective Robot Learning: robots sharing trajectories, control policies, and outcomes; and 5) Crowdsourcing and call centers: offline and on-demand human guidance for evaluation, learning, and error recovery. This article surveys related work as of Fall 2012.
-
Randomness and Complexity of Algorithms
This dissertation explores the multifaceted interplay between efficient computation and probability distributions. We organize the aspects of this interplay according to whether the randomness occurs primarily at the level of the problem or the level of the algorithm, and orthogonally according to whether the output is random or the input is random.