-
An Optimized Distributed Video-on-Demand Streaming System: Theory and Design
In this report, we present a general framework for the distributed Video-on-Demand content distribution problem. We formulate it as an optimization problem whose solution yields a highly distributed implementation that is scalable and resilient to changes in demand. Our solution takes into account several per-node resource constraints, including disk space, network link bandwidth, and node I/O degree bound.
-
The Art of Digital Publishing: A Foundation of Combined Standards to Support the Future of Publishing
Scientific content increasingly relies on the presentation and authoring of complex multimedia diagrams and figures, sometimes interactive, to convey information in a non-textual way. Wikis and user-generated hyperlinked content have been very successful for text; this is what we aim to do for mathematical diagrams. Many professors in higher education who write textbooks know TeX; however, they often do not know how to program the Web. The future of building interactive user interfaces should lie not in the hands of programmers, but in the hands of the experts of a given field. The goal of this project is to supply math, physics, and engineering professors with a platform for expressing mathematical concepts to students and providing immersive learning environments.
-
Machine Understanding Techniques for Live Drum Performances
This dissertation covers machine listening techniques for the automated real-time analysis of live drum performances. Onset detection, drum detection, beat tracking, and drum pattern analysis are combined into a system that provides rhythmic information useful in performance analysis, synchronization, and retrieval. The techniques are designed with real-time use in mind but can easily be adapted for offline batch use for large-scale rhythm analysis.
-
Randomized Algorithms for Scalable Machine Learning
Many existing procedures in machine learning and statistics are computationally intractable in the setting of large-scale data. As a result, the advent of rapidly increasing dataset sizes, which should be a boon yielding improved statistical performance, instead severely blunts the usefulness of a variety of existing inferential methods. In this work, we use randomness to ameliorate this lack of scalability by reducing complex, computationally difficult inferential problems to larger sets of significantly smaller and more tractable subproblems. This approach allows us to devise algorithms which are both more efficient and more amenable to use of parallel and distributed computation. We propose novel randomized algorithms for two broad classes of problems that arise in machine learning and statistics: estimator quality assessment and semidefinite programming.
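As a rough illustration of the idea of trading one large problem for many small ones in estimator quality assessment, the Python sketch below estimates the standard error of a sample mean by bootstrapping several small random subsets and averaging the per-subset results. It is not taken from the dissertation: the function name, its parameters, and the choice of the mean as the estimator are all assumptions made for this example.

```python
import random
import statistics


def subsampled_bootstrap_stderr(data, n_subsets, subset_size, n_resamples, seed=0):
    """Estimate the standard error of the mean from small random subsets.

    Hypothetical helper for illustration only. Each subset is an independent
    subproblem, so the outer loop could run in parallel.
    """
    rng = random.Random(seed)
    n = len(data)
    per_subset_stderr = []
    for _ in range(n_subsets):
        # Draw a small subset without replacement.
        subset = rng.sample(data, subset_size)
        estimates = []
        for _ in range(n_resamples):
            # Resample n points from the subset so the estimate reflects the
            # full sample size. A weighted representation would avoid
            # materializing all n points; this sketch keeps it simple.
            resample = rng.choices(subset, k=n)
            estimates.append(statistics.fmean(resample))
        per_subset_stderr.append(statistics.stdev(estimates))
    # Average the quality assessments across subsets.
    return statistics.fmean(per_subset_stderr)


data = [float(i) for i in range(1000)]
stderr = subsampled_bootstrap_stderr(
    data, n_subsets=5, subset_size=100, n_resamples=50
)
print(f"estimated standard error of the mean: {stderr:.2f}")
```

Each subset only ever touches `subset_size` distinct data points, which is what would make the subproblems cheap to distribute across machines.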
-
A High-Accuracy, Low-Latency, Scalable Microphone Array System for Conversation Analysis
Understanding and facilitating real-life social interaction is a high-impact goal for ubiquitous computing research. Microphone arrays offer the unique capability to provide continuous, calm capture of verbal interaction in large physical spaces, such as homes and especially open-plan offices. Most microphone array work has focused on arrays of custom sensors in small spaces, and a few recent works have tested small arrays of commodity sensors in single rooms. This thesis describes the first working scalable and cost-effective array infrastructure that offers high-precision localization of conversational speech, and hence enables ongoing studies of verbal interactions in large semi-structured spaces.
-
Big Data in the Cloud
Big data is an inherent feature of the cloud and provides unprecedented opportunities to use both traditional, structured database information and business analytics with social networking, sensor network data, and far less structured multimedia. Big data applications require a data-centric compute architecture, and many solutions include cloud-based APIs to interface with advanced columnar searches, machine learning algorithms, and advanced analytics such as computer vision, video analytics, and visualization tools. This article examines the use of the R language and similar tools for big data analysis and methods to scale big data services in the cloud. It provides an in-depth look at digital photo management as a simple big data service that employs key elements of search, analytics, and machine learning applied to unstructured data.
-
Convex Approaches to Text Summarization
This dissertation presents techniques for the summarization and exploration of text documents. Many approaches taken towards analysis of news media can be analogized to well-defined, well-studied problems from statistical machine learning. The problem of feature selection, for classification and dimensionality reduction tasks, is formulated to assist with these media analysis tasks. Taking advantage of ℓ1 regularization, convex programs can be used to solve these feature selection problems efficiently. There is a demonstrated potential to conduct media analysis at a scale commensurate with the growing volume of data available to news consumers.
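To make the role of ℓ1 regularization in feature selection concrete, here is a minimal Python sketch of lasso regression solved by cyclic coordinate descent. This is a standard textbook solver chosen for illustration, not necessarily the method used in the dissertation; the function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the l1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0       # correlation of feature j with the partial residual
            norm_sq = 0.0   # squared norm of feature j
            for i in range(n):
                # Prediction that leaves feature j out.
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm_sq += X[i][j] ** 2
            if norm_sq == 0.0:
                w[j] = 0.0
                continue
            # Closed-form minimizer over coordinate j.
            w[j] = soft_threshold(rho / n, lam) / (norm_sq / n)
    return w


# Toy data (assumed for illustration): the target depends only on feature 0.
X = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0], [3.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
y = [2.0, 4.0, 6.0, 8.0]
weights = lasso_coordinate_descent(X, y, lam=0.1)
print(weights)
```

On this toy data the penalty drives the weights of the two irrelevant features to exactly zero while slightly shrinking the weight of the relevant one, which is the feature-selection effect that ℓ1 regularization provides.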
-
Security Analytics: Risk Analysis of an Organisation's Incident Management Processes
This document is an example of the type of report an organisation would receive at the end of an HP Security Analytics engagement. The focus is on the analysis of the security risks and performance of the organisation's Security Incident Management Processes and the related Security Operation Centre (SOC) activities. HP Labs carried out the underlying R&D work in collaboration with HP Enterprise Security Services (HP ESS); the work involved analysis of processes, probabilistic modeling, simulation and "what-if" analysis for some of HP's key customers. The outcome was a set of case studies from which we have been able to create this more general anonymised report illustrating the richness of the risk assessment and "what-if" analysis that has been carried out. The lifecycle management of security is critical for organisations to protect their key assets, ensure a correct security posture and deal with emerging risks and threats. It involves various steps, usually carried out on an ongoing, regular basis, including: risk assessment; policy definition; deployment of controls within the IT infrastructure; monitoring and governance. In this context, Security Information & Events Management (SIEM) solutions play a key role. Even the best information security practices and investments in security controls cannot guarantee that intrusions – accidental and criminal activities – and/or other malicious acts will not happen. Controls can fail, be bypassed or become inadequate over time; new threats emerge. Managing such incidents requires detective and corrective controls to minimise adverse impacts, gather evidence, and learn from previous situations in order to improve over time. These incident management processes are usually run in the context of a SOC and/or as part of specialised Computer Security Incident Response Teams (CSIRTs), built on top of SOCs.
Even with SIEM solutions in place, a potential major risk for the organisation arises due to delays introduced in assessing and handling known incidents: this may postpone the successful resolution of critical security incidents (e.g. devices exposed on the Internet, exploitation of privileged accounts, deployed malware, etc.) and allow for further exploitation. Another related risk can be introduced by sudden and/or progressive changes of the threat landscape, due to changing economic and social scenarios, new business activities or process failings within the existing IT services. This might create unexpected volumes of new events and alerts to be processed by the security team and, as such, introduce additional delays. Hence, it is important for an organisation to understand the risk exposure due to its Incident Management processes, explore potential future scenarios (e.g. changes in available resources or threat landscapes, or adoption of Cloud solutions) and identify suitable ways to address related issues, e.g. by introducing process changes and/or making investments in security controls.
-
Statistical Algorithms in the Study of Mammalian DNA Methylation
DNA methylation is a dynamic chemical modification that is abundant on DNA sequences and plays a central role in the regulatory mechanisms of cells. This modification can be inherited across cell divisions and generations, providing a “memory mechanism” for regulatory programs that is more flexible than that coded in the DNA sequence. In recent years, high-throughput sequencing technologies have enabled genome-wide annotation of DNA methylation. Coupled with novel computational machinery, these developments have enabled unprecedented insight into the characteristics, biological function and disease association of this phenomenon.
-
Multi-Agent Cluster Scheduling for Scalability and Flexibility
This dissertation presents a taxonomy and evaluation of three cluster scheduling architectures for scalability and flexibility using a common high-level taxonomy of cluster scheduling, a Monte Carlo simulator, and a real system implementation. We begin with the popular Monolithic State Scheduling (MSS), then consider two new architectures: Dynamically Partitioned State Scheduling (DPS) and Replicated State Scheduling (RSS). We describe and evaluate DPS, which uses pessimistic concurrency control for cluster resource sharing. We then present the design, implementation, and evaluation of Mesos, a real-world DPS cluster scheduler that allows diverse cluster computing frameworks to efficiently share resources.