-
43391. Graph Algorithms in the Internet Age
[Information transmission, software and information technology services] [2013-11-20]
This dissertation addresses a series of graph problems inspired by the computational issues we face with the Internet, a massive distributed network of autonomous agents. There are several levels to this problem. From a systems perspective, what can we do to facilitate computation over massive graphs? From a modeling perspective, what do natural graphs look like and what features are useful? From a game-theoretic perspective, the graphs often represent individuals or systems with their own goals and agendas. Can we understand how these systems compete and when these competitions are fair or can be manipulated? We address each of these questions in turn. For the first, we consider the problem of streaming graph partitioning and show it is feasible. For the second, we study the joint degree distribution of a graph and show it is combinatorially easy to work with. Finally, we address questions about tournament design and manipulation.
Keywords: Internet; graph algorithms; distributed network graphs
-
43392. Long-Range Dependence Models in Information Theory
[Information transmission, software and information technology services] [2013-11-20]
Long range dependence refers to stochastic processes for which correlations persist at much longer time scales than in traditional models. For such processes the central limit theorem does not in general hold, and the smoothing effect of the law of large numbers takes more time to settle in. Such phenomena have been observed in many different fields, including financial time series, DNA sequences, network traffic and variable bit-rate video. The bursty nature and persistent correlation structure of long range dependent processes make them difficult to control and predict in practice, and difficult to analyze in theory. In this thesis we look at the origins of long range dependence through the use of Markov models.
Keywords: long-range dependence; stochastic processes; variable bit rate
-
43393. Autosupport Analysis at NetApp
[Information transmission, software and information technology services] [2013-11-20]
This project leverages the Autosupport data to gain insights into the production environment and the QA environment, and into how the two relate to each other. Using the K-Means algorithm and a direct matching method, we have identified eight common customer configuration groups, the top customer configurations not tested by any QA machines, and the top QA machines not testing any customer configurations.
Keywords: NetApp; Autosupport analysis; QA environment; big data
-
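43394. Optimal Search Algorithms for Structured Problems in Natural Language Processing
[Information transmission, software and information technology services] [2013-11-20]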
We will discuss both known and novel algorithms that can find the best path without considering all hyperedges in the hypergraph, and hence can speed up search without sacrificing search quality. We will provide simplified proofs of correctness for these algorithms. We also propose two novel algorithms that permit extraction of the k-best paths instead of the single best. We compare these approaches both against exhaustive search, and against approximate search techniques which speed up search by sacrificing optimality guarantees.
Keywords: search algorithms; natural language processing; structure; search speed
-
43395. Language and Framework Support for Reviewably-Secure Software Systems
[Information transmission, software and information technology services] [2013-11-20]
My thesis is that languages and frameworks can and should be designed to make it easier for programmers to write reviewably secure systems. A system is reviewably secure if its security is easy for an experienced programmer to verify, given access to the source code. A security reviewer should be able, with a reasonable amount of effort, to gain confidence that such a system meets its stated security goals. This dissertation includes work on language subsetting and web application framework design. It presents Joe-E, a subset of the Java programming language designed to enforce object-capability security, simplifying the task of verifying a variety of security properties by enabling sound, local reasoning. Joe-E also enforces determinism-by-default, which permits functionally-pure methods to be identified by their signature. Functional purity is a useful property that can greatly simplify the task of correctly implementing and reasoning about application code.
Keywords: software systems; information security; framework support; source code; language subsetting
-
43396. HTTPS Vulnerability to Fine-Grained Traffic Analysis
[Information transmission, software and information technology services] [2013-11-20]
In this thesis, we apply the pattern recognition and data processing strengths of machine learning to accomplish traffic analysis objectives. Traffic analysis relies on the use of observable features of encrypted traffic in order to infer plaintext contents. We apply a clustering technique to HTTPS encrypted traffic on websites covering medical, legal and financial topics and achieve accuracy rates ranging from 64% to 99% when identifying traffic within each website. The total number of URLs considered on each page ranged from 176 to 366. We present our results along with a justification of the machine learning techniques employed and an evaluation which explores the impact on accuracy of variations in the amount of training data, the number of clustering algorithm invocations, and the convergence threshold. Our technique represents a significant improvement over previous techniques, which have achieved similar accuracy, albeit with the aid of supporting assumptions that simplify traffic analysis. We examine these assumptions more closely and present results suggesting that two of them, browser cache configuration and selection of webpages for evaluation, can have considerable impact on analysis. Additionally, we propose a set of minimum evaluation standards for improved quality in traffic analysis evaluations.
Keywords: traffic analysis; machine learning; HTTPS; vulnerability; pattern recognition; data processing
-
43397. Recognition Using Regions
[Information transmission, software and information technology services] [2013-11-20]
The success of modeling object viewpoints motivates us to tackle the generic variation problem through component models, where each component characterizes not only a particular viewpoint of objects, but also a particular subcategory or pose. Furthermore, our approach allows the transfer of finer-grained semantic information from the components, such as keypoint locations and segmentation masks.
Keywords: recognition using regions; object modeling; semantic information
-
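43398. Active Testing: Predicting and Confirming Concurrency Bugs in Concurrent and Distributed-Memory Parallel Systems
[Information transmission, software and information technology services] [2013-11-20]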
We explain in detail the design decisions and optimizations that were necessary to scale Active Testing to thousands of cores. We present extensions to UPC-Thrille that support hybrid memory models as well. We evaluate the effectiveness of Active Testing by running our tools on several Java and UPC benchmarks, showing that it can predict and confirm real concurrency bugs with low overhead. We demonstrate the scalability of Active Testing by running benchmarks with UPC-Thrille on large clusters with thousands of cores.
Keywords: large clusters; active testing; parallel systems; memory; concurrency; distributed
-
43399. Using Mobile Technology and Social Networking in Crowdsourced Citizen Science
[Information transmission, software and information technology services] [2013-11-20]
This dissertation explores the application of computer science methodologies, techniques, and technologies to citizen science. Citizen science can be broadly defined as scientific research performed in part or in whole by volunteers who are not professional scientists. Such projects are increasingly making use of mobile and Internet technologies and social networking systems to collect or categorize data, and to coordinate efforts with other participants. The dissertation focuses on observations and experiences from the design, deployment, and testing of a citizen science project, CreekWatch. CreekWatch is a collaboration between an HCI research group and a government agency.
Keywords: crowdsourcing; Internet technology; social networking
-
43400. An Optimized Video-on-Demand System: Theory, Design and Implementation
[Information transmission, software and information technology services] [2013-11-20]
We show that by storing only a fraction of the entire catalog everywhere, the system is able to fully support user demand at large scale. Second, we develop a Markov approximation technique to solve the problem of topology selection under a node degree bound using a simple distributed algorithm. We prove that our algorithm achieves a close-to-optimal solution, which we verify using extensive real-world trace simulations. On the system side, we show extensive results to test the algorithm's scalability and robustness to changes in user dynamics and demand patterns. We show that our solution achieves high utilization of the cache nodes' storage and bandwidth resources, and automatically learns and caches videos according to the demand patterns. We observe that there exists a complex interplay between disk space, network bandwidth and the node degree bound. We also present guidelines for important practical design choices, including caching update intervals, demand prediction and provisioning. We also demonstrate the feasibility and efficiency of our design choices by building and experimenting with a prototype system at Berkeley.
Keywords: topology selection; algorithms; user dynamics and demand; video-on-demand system