-
31. FOEDUS: An OLTP Engine for a Thousand Cores and NVRAM
[Information Transmission, Software and Information Technology Services] [2015-10-01]
Server hardware is about to change drastically. As typified by emerging hardware such as UC Berkeley's Firebox project and Intel's Rack-Scale Architecture (RSA), next-generation servers will have thousands of cores, large DRAM, and huge NVRAM. We analyze the characteristics of these machines and find that no existing database is appropriate for them. Hence, we are developing FOEDUS, an open-source, from-scratch database engine whose architecture differs drastically from that of traditional databases. It extends in-memory database technologies to scale up further, and it also allows transactions to manipulate data pages efficiently in both DRAM and NVRAM.
Keywords: database engine; server; Firebox
-
32. Supervised Meta-Blocking
[Information Transmission, Software and Information Technology Services] [2015-10-01]
Entity Resolution matches mentions of the same entity. Because it is an expensive task on large data, its performance can be improved by blocking, i.e., grouping similar entities and comparing only entities in the same group. Blocking improves the run-time of Entity Resolution, but it still involves unnecessary comparisons that limit its performance. Meta-blocking is the process of restructuring a block collection in order to prune such comparisons. Existing unsupervised meta-blocking methods use simple pruning rules, which offer a rather coarse-grained filtering technique that can be either conservative (i.e., keeping too many unnecessary comparisons) or aggressive (i.e., pruning good comparisons). In this work, we introduce supervised meta-blocking techniques that learn classification models for distinguishing promising comparisons. For this task, we propose a small set of generic features that combine a low extraction cost with high discriminatory power. We show that supervised meta-blocking can achieve high performance with small training sets that can be created manually. We analytically compare our supervised approaches with baseline and competitor methods over 10 large-scale datasets, both real and synthetic.
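To make the pipeline above concrete, here is a minimal Python sketch of supervised meta-blocking on toy data. It assumes token blocking, uses two illustrative pair features (the number of shared blocks and the Jaccard similarity of the two profiles' block sets), and substitutes a plain perceptron for whatever classification models the paper actually trains. The profile strings, the feature choice, and the helper names (`token_blocking`, `pair_features`, `train_perceptron`, `prune`) are hypothetical. `Fraction` keeps the arithmetic exact so the toy run is reproducible.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def token_blocking(profiles):
    """Token blocking: one block per token, holding every profile that contains it."""
    blocks = defaultdict(set)
    for pid, text in profiles.items():
        for token in set(text.lower().split()):
            blocks[token].add(pid)
    # A block with a single profile yields no comparisons, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def pair_features(blocks):
    """Map each candidate comparison (a, b) to two cheap features: the number of
    shared blocks and the Jaccard similarity of the pair's block sets."""
    blocks_of = defaultdict(set)
    for token, ids in blocks.items():
        for pid in ids:
            blocks_of[pid].add(token)
    features = {}
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            if (a, b) not in features:  # a pair sharing several blocks is compared once
                common = len(blocks_of[a] & blocks_of[b])
                union = len(blocks_of[a] | blocks_of[b])
                features[(a, b)] = (common, Fraction(common, union))
    return features


def score(w, bias, x):
    """Linear score of feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def train_perceptron(samples, epochs=20):
    """Plain perceptron; samples are (feature_vector, label) with label 1 = match."""
    w, bias = [0] * len(samples[0][0]), 0
    for _ in range(epochs):
        for x, label in samples:
            err = label - (1 if score(w, bias, x) > 0 else 0)
            if err:
                w = [wi + err * xi for wi, xi in zip(w, x)]
                bias += err
    return w, bias


def prune(features, w, bias):
    """Keep only the comparisons the classifier labels as promising."""
    return {pair for pair, x in features.items() if score(w, bias, x) > 0}


profiles = {
    "p1": "john smith new york",
    "p2": "jon smith new york",
    "p3": "mary jones boston",
    "p4": "mary johnson new york",
}
features = pair_features(token_blocking(profiles))
labels = {("p1", "p2"): 1}  # small, manually labeled training set (1 = same entity)
samples = [(x, labels.get(pair, 0)) for pair, x in sorted(features.items())]
w, bias = train_perceptron(samples)
print(prune(features, w, bias))  # {('p1', 'p2')}
```

Unlike a fixed pruning threshold, the decision boundary here is learned from labeled pairs. In this toy run the training set happens to cover every candidate pair; in practice it would be a small labeled sample.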
Keywords: blocking; entity resolution; supervision
-
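33. LearningAssistant: A Novel Learning Resource Recommendation System
[Information Transmission, Software and Information Technology Services] [2015-10-01]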
Reading online content for educational, learning, training or recreational purposes has become a very popular activity. While reading, people may have difficulty understanding a passage or may wish to learn more about the topics it covers, so they naturally seek additional or supplementary resources for that particular passage. These resources should be close to the passage both in subject matter and in reading level. However, using a search engine to find such resources interrupts the reading flow. It is also an inefficient, trial-and-error process, because existing web search and recommendation systems do not support large queries, do not understand semantic topics, and do not take into account the reading level of the original document. In this paper, we present LearningAssistant, a novel system that enables online reading material to be smoothly enriched, on demand, with additional resources that can supplement or explain any passage of the original material. The system facilitates the learning process by recommending learning resources (documents, videos, etc.) for selected text passages of any length. The recommended resources are ranked by two criteria: (a) how well they match the different topics covered within the selected passage, and (b) the reading level of the original text from which the passage comes. User feedback from students who used our system for their courses in two real pilots, one at a high school and one at a university, suggests that our system is promising and effective.
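The abstract ranks resources by two criteria. One hedged way to combine them is a weighted score, sketched below in Python. The cosine topic match, the `1 / (1 + |level gap|)` closeness term, the weight `alpha = 0.7`, and the `Resource` fields are assumptions for illustration, not LearningAssistant's actual ranking function.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    topics: dict          # topic -> weight (hypothetical topic-model output)
    reading_level: float  # e.g. an estimated school grade level


def cosine(a, b):
    """Cosine similarity between two sparse topic vectors stored as dicts."""
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(passage_topics, source_level, resources, alpha=0.7):
    """Order resources by a weighted mix of (a) topic match with the passage and
    (b) closeness to the reading level of the source document."""
    def score(r):
        topic_match = cosine(passage_topics, r.topics)
        level_match = 1.0 / (1.0 + abs(r.reading_level - source_level))
        return alpha * topic_match + (1 - alpha) * level_match
    return sorted(resources, key=score, reverse=True)


resources = [
    Resource("Off-topic article", {"volcanoes": 1.0}, 9),
    Resource("University lecture notes", {"photosynthesis": 1.0}, 16),
    Resource("High-school video", {"photosynthesis": 0.8, "chlorophyll": 0.6}, 9),
]
for r in rank({"photosynthesis": 1.0, "chlorophyll": 0.5}, 9, resources):
    print(r.title)
```

The on-topic video at the reader's level ranks first, the on-topic but much harder lecture notes second, and the off-topic article last; `alpha` controls how much reading level can outweigh topical fit.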
Keywords: learning resources; recommendation system; reader
-
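34. 3D Printed Glass: Surface Finish and Bulk Properties as a Function of the Printing Process
[Information Transmission, Software and Information Technology Services] [2015-10-01]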
It is impossible to print glass directly from a melt, layer by layer. Glass is very sensitive not only to temperature gradients between layers but also to the cooling process: to reach a glassy state, the melt has to be cooled rapidly to avoid crystallization and then annealed to remove cooling-induced stress. In 3D printing of glass, the objects are therefore shaped at room temperature and then fired. The material properties of the final objects depend crucially on the frit size of the glass powder used during shaping, the chemical formula of the binder, and the firing procedure. For frit sizes below 250 μm, the pore volume appears to be constant at less than 5%. Decreasing the frit size increases the number of pores, which in turn increases opacity. The two binders, 2-hydroxyethyl cellulose and carboxymethylcellulose sodium salt, produce very different porosities: samples made with 2-hydroxyethyl cellulose have a porosity similar to frit-only samples, whereas carboxymethylcellulose sodium salt creates a glass foam. The surface finish is determined by the material the glass comes into contact with during firing.
Keywords: 3D printed glass; printing process; surface finish
-
35. Datacenter Cabling Algorithms
[Information Transmission, Software and Information Technology Services] [2015-10-01]
Datacenter topology design is a complex problem with a huge search space. Recent work on systematic exploration of the solution space points out a significant cabling problem: how can a logical topology of servers, switches, and links be mapped onto a physical space of racks and cable trays such that cable costs are minimized? In this paper, we show that this problem is NP-hard and present partitioning-based heuristics. Evaluation with different topologies demonstrates that our approach discovers better layouts than previous approaches and reduces cabling costs by up to 38%.
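As an illustration of a partitioning-based heuristic, the Python sketch below places nodes into a single row of racks by recursive bisection, refining each split with greedy pairwise swaps (a simplified Kernighan–Lin-style step) that reduce the number of links crossing the cut. The rack model, the cable-length constants in `cable_cost`, and all function names are hypothetical; this is not the paper's algorithm.

```python
from itertools import combinations


def cut_size(edges, side):
    """Number of links with exactly one endpoint in `side`."""
    return sum((u in side) != (v in side) for u, v in edges)


def refine(a, b, edges):
    """Greedy pairwise swaps between the two halves while the cut shrinks."""
    a, b = set(a), set(b)
    improved = True
    while improved:
        improved = False
        best = cut_size(edges, a)
        for x in sorted(a):
            for y in sorted(b):
                trial = (a - {x}) | {y}
                if cut_size(edges, trial) < best:
                    a, b = trial, (b - {y}) | {x}
                    improved = True
                    break
            if improved:
                break
    return a, b


def place(nodes, edges, lo, hi, capacity, placement=None):
    """Assign nodes to racks lo..hi-1 (in a row) by recursive bisection."""
    placement = {} if placement is None else placement
    racks = hi - lo
    if racks == 1 or not nodes:
        placement.update({n: lo for n in nodes})
        return placement
    inside = set(nodes)
    local = [(u, v) for u, v in edges if u in inside and v in inside]
    left_racks = racks // 2
    # Split nodes roughly in proportion to rack count, respecting rack capacity.
    size = len(nodes) * left_racks // racks
    size = max(size, len(nodes) - (racks - left_racks) * capacity)
    size = min(size, left_racks * capacity)
    a, b = refine(nodes[:size], nodes[size:], local)
    place(sorted(a), edges, lo, lo + left_racks, capacity, placement)
    place(sorted(b), edges, lo + left_racks, hi, capacity, placement)
    return placement


def cable_cost(placement, edges, in_rack=1.0, per_rack=2.0):
    """Hypothetical cable length: in_rack metres plus per_rack per rack of separation."""
    return sum(in_rack + per_rack * abs(placement[u] - placement[v]) for u, v in edges)


# Two 4-switch cliques joined by one link, placed on 2 racks of 4 slots each.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
order = [0, 4, 1, 5, 2, 6, 3, 7]
naive = {n: i // 4 for i, n in enumerate(order)}
print(cable_cost(naive, edges))                          # 31.0
print(cable_cost(place(order, edges, 0, 2, 4), edges))   # 15.0
```

On this toy topology, the refinement moves each clique into its own rack, so only the single bridge link spans racks.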
Keywords: algorithm design; data analysis; database
-
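36. Steady-State Thermal Conductivity Measurements of Dielectric Stacks for Phase-Change Memory Power Reduction
[Information Transmission, Software and Information Technology Services] [2015-09-30]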
Phase-change memory (PCM) devices require lower write power to be competitive with other memory devices. A promising way to decrease the write power required for switching is to localize heating and thus develop thermally confined devices. In this regard, it is increasingly necessary to reduce the thermal conductivity of the dielectric layer used in these device structures. In this work, we investigate the temperature-dependent thermal conductivities of alternating stacks of thin-film amorphous dielectrics, specifically SiO2/Al2O3 and SiO2/Si3N4. Experiments were performed using steady-state Joule heating and electrical thermometry, with a micro-miniature refrigerator (MMR) covering a wide temperature range (100 K to 500 K). The measurements show that the amorphous thin-film stacks exhibit effective out-of-plane room-temperature thermal conductivities of about 1.14 and 0.48 W/(m·K), respectively. Both values are lower than the bulk thermal conductivities of their constituent films. Molecular Dynamics (MD) simulations show that increased scattering at the boundaries between layers, rather than acoustic mismatch, is the source of the increased thermal resistance of these thin films. Additional Finite-Element (FE) simulations show that the primary heat-loss path for dash-type cells is through the dielectric, and that the SiO2/Al2O3 stacked dielectric films improve PCM cell heating by 42%.
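A common way to reason about such measurements (assumed here, not taken from the paper) is a one-dimensional series-resistance model for a stack of N layers with thicknesses d_i and intrinsic conductivities k_i, plus a boundary resistance R_b (in m²·K/W) at each of the N-1 internal interfaces:

```latex
\frac{d_{\mathrm{tot}}}{k_{\mathrm{eff}}} = \sum_{i=1}^{N} \frac{d_i}{k_i} + (N-1)\,R_b,
\qquad d_{\mathrm{tot}} = \sum_{i=1}^{N} d_i
```

With R_b = 0, k_eff reduces to the thickness-weighted harmonic mean of the k_i, which can never fall below the smallest k_i. So, within this model and if each layer kept its bulk conductivity, an effective value below both constituents' bulk conductivities implies R_b > 0, which is consistent with the boundary-scattering explanation from the MD simulations.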
Keywords: memory power; steady-state thermal conductivity; dielectric stack
-
37. Persistence Programming Models for Non-Volatile Memory
[Information Transmission, Software and Information Technology Services] [2015-09-30]
It is expected that DRAM will be augmented, and eventually replaced, by one of several up-and-coming memory technologies, all of which are non-volatile: they retain their contents without power. This allows primary memory to be used as a fast disk replacement. It also enables more aggressive programming models that directly leverage the persistence of primary memory. However, it is challenging to maintain the consistency of memory in such an environment. There is no consensus on the right programming model for doing so, and subtle differences can have large, and sometimes surprising, effects on the implementation and its performance. The existing literature describes several programming systems that provide selective persistence for user data structures. We describe the semantics of those systems, and thus the associated programming rules, more carefully and precisely. We expose subtle and generally ignored trade-offs between programming generality and implementation difficulty, as well as additional interesting points in the design space.
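The consistency problem can be illustrated with undo logging, one well-known technique for failure-atomic updates (not necessarily one of the systems the paper analyzes). The Python sketch below uses a toy memory model, `ToyNVM`, in which stores sit in a volatile cache until explicitly flushed, and a crash may already have written back an arbitrary subset of cached lines. Both the model and the helper names (`begin`, `apply_in_place`, `commit`, `recover`) are hypothetical simplifications of real cache-flush and fence semantics.

```python
class ToyNVM:
    """Toy persistent memory: stores are buffered in a volatile cache until flushed."""

    def __init__(self, initial):
        self.durable = dict(initial)  # contents that survive a crash
        self.cache = {}               # volatile stores not yet written back

    def store(self, addr, value):
        self.cache[addr] = value

    def load(self, addr):
        return self.cache.get(addr, self.durable.get(addr))

    def flush(self, addr):
        if addr in self.cache:
            self.durable[addr] = self.cache.pop(addr)

    def crash(self, evicted=()):
        for addr in evicted:  # hardware may already have written back some lines
            self.flush(addr)
        self.cache.clear()    # everything else is lost


def begin(nvm, updates):
    """Record old values in an undo log and make the log durable first."""
    for i, addr in enumerate(updates):
        nvm.store(("undo", i), (addr, nvm.load(addr)))
        nvm.flush(("undo", i))
    nvm.store("undo_len", len(updates))
    nvm.flush("undo_len")  # must be durable before any in-place write


def apply_in_place(nvm, updates):
    for addr, value in updates.items():
        nvm.store(addr, value)


def commit(nvm, updates):
    for addr in updates:
        nvm.flush(addr)
    nvm.store("undo_len", 0)  # truncating the log marks the update as complete
    nvm.flush("undo_len")


def recover(nvm):
    """After a crash, roll back any update whose log was not truncated."""
    for i in range(nvm.load("undo_len") or 0):
        addr, old = nvm.load(("undo", i))
        nvm.store(addr, old)
        nvm.flush(addr)
    nvm.store("undo_len", 0)
    nvm.flush("undo_len")


nvm = ToyNVM({"a": 100, "b": 0})
transfer = {"a": 50, "b": 50}
begin(nvm, transfer)
apply_in_place(nvm, transfer)
nvm.crash(evicted=["a"])                  # "a" was written back, "b" was not
print(nvm.durable["a"], nvm.durable["b"])  # 50 0 (torn update)
recover(nvm)
print(nvm.load("a"), nvm.load("b"))        # 100 0 (rolled back)
```

Because the log is durable before any in-place store, recovery can always roll a torn update back; dropping the flush in `begin` would reopen that window.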
Keywords: non-volatile memory; transactions; consistency
-
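38. Using Data Transformations for Low-Latency Time Series Analysis
[Information Transmission, Software and Information Technology Services] [2015-09-30]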
Time series analysis is commonly used when monitoring data centers, networks, weather, and even human patients. In most cases, the raw time series data is massive, from millions to billions of data points, and yet interactive analyses require low (e.g., sub-second) latency. Aperture transforms raw time series data, during ingest, into compact summarized representations that it can use to efficiently answer queries at runtime. Aperture handles a range of complex queries, from correlating hundreds of lengthy time series to predicting anomalies in the data. Aperture achieves much of its high performance by executing queries on data summaries, while providing a bound on the information lost when transforming data. By doing so, Aperture can reduce query latency as well as the data that needs to be stored and analyzed to answer a query. Our experiments on real data show that Aperture can provide one to four orders of magnitude lower query response time, while incurring only 10% ingest time overhead and less than 20% error in accuracy.
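The abstract does not specify which transforms Aperture uses. As a hypothetical example of answering queries from summaries with an error bound, the Python sketch below stores per-window count, sum, min, and max at ingest time and answers a range-sum query from them: fully covered windows are exact, and partially covered windows contribute an estimate bounded by the window's min and max. The window layout and function names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: int    # index of the window's first point
    count: int
    total: float
    lo: float
    hi: float


def summarize(values, width):
    """Ingest step: replace raw points with fixed-width window summaries."""
    windows = []
    for start in range(0, len(values), width):
        chunk = values[start:start + width]
        windows.append(Window(start, len(chunk), sum(chunk), min(chunk), max(chunk)))
    return windows


def range_sum(windows, begin, end):
    """Estimate sum(values[begin:end]) from summaries; return (estimate, lower, upper)."""
    est = lower = upper = 0.0
    for w in windows:
        overlap = min(end, w.start + w.count) - max(begin, w.start)
        if overlap <= 0:
            continue
        if overlap == w.count:  # window fully covered: the stored sum is exact
            est += w.total
            lower += w.total
            upper += w.total
        else:                   # partial window: scale the mean, bound by min/max
            est += w.total / w.count * overlap
            lower += w.lo * overlap
            upper += w.hi * overlap
    return est, lower, upper


values = [1, 2, 3, 4, 5, 6, 7, 8]
summary = summarize(values, width=4)
print(range_sum(summary, 0, 8))  # (36.0, 36.0, 36.0)
print(range_sum(summary, 2, 6))  # (18.0, 12.0, 24.0)
```

Wider windows shrink storage and query work but loosen the bound on partially covered windows, which mirrors the latency-versus-accuracy trade-off the abstract reports.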
Keywords: data transformation; time series; data analysis
-
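39. Data-Rate Enhancement of Dual Silicon Ring Resonator Carrier-Injection Modulators by PAM-4 Encoding
[Information Transmission, Software and Information Technology Services] [2015-09-30]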
Two rings are thermally tuned to near resonance and individually modulated, at different extinction ratios, by two uncorrelated NRZ-encoded PRBS data sources at 1 Gb/s to achieve a 2 Gb/s PAM-4 signal. Experimental data are shown as a proof of concept for this approach.
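The arithmetic behind the approach can be sketched in Python: two binary NRZ streams summed with unequal amplitudes yield four distinct levels, so each symbol carries 2 bits while the symbol rate stays that of one NRZ stream. The 2:1 amplitude ratio, assumed here to stand in for the rings' different extinction ratios, and the helper names are hypothetical; real optical levels would be noisier and less evenly spaced.

```python
def pam4_encode(bits_a, bits_b, amp_a=2.0, amp_b=1.0):
    """Sum two NRZ streams with unequal amplitudes into four levels (0, 1, 2, 3 here)."""
    return [amp_a * a + amp_b * b for a, b in zip(bits_a, bits_b)]


def pam4_decode(levels, amp_a=2.0, amp_b=1.0):
    """Recover both bit streams by thresholding at midpoints between adjacent levels."""
    bits_a, bits_b = [], []
    for level in levels:
        a = 1 if level >= (amp_a + amp_b) / 2 else 0  # midpoint between amp_b and amp_a
        b = 1 if level - amp_a * a >= amp_b / 2 else 0
        bits_a.append(a)
        bits_b.append(b)
    return bits_a, bits_b


print(pam4_encode([0, 0, 1, 1], [0, 1, 0, 1]))  # [0.0, 1.0, 2.0, 3.0]
print(pam4_decode([0.2, 0.9, 2.1, 2.8]))        # ([0, 0, 1, 1], [0, 1, 0, 1])

symbol_rate_gbd = 1.0  # each 1 Gb/s NRZ stream runs at 1 GBd
bits_per_symbol = 2    # log2 of 4 levels
print(symbol_rate_gbd * bits_per_symbol)        # 2.0 (Gb/s aggregate)
```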
Keywords: optical communication; optical interconnect; modulation; PAM
-
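40. Improving Multi-Node Deduplication Performance for Interleaved Data via Sticky-Auction Routing
[Information Transmission, Software and Information Technology Services] [2015-09-30]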
High-capacity, high-throughput, chunk-based inline deduplication systems for backup have been commercially successful, but scaling them out has proved challenging. In such multi-node systems, data must be routed at a granularity large enough to sustain locality at the back ends. Two routing algorithms, Min Hash and Auction, have been proposed for this purpose. We demonstrate that these algorithms perform poorly on interleaved data. Interleaved data occurs when multiple streams are multiplexed into a single high-speed stream to speed up backups. Of particular commercial importance, database backup procedures produce such interleaved data when multiple threads read database files in parallel. We present a new routing algorithm, Sticky Auction routing, which, unlike existing algorithms, handles interleaved data with little deduplication loss. It also achieves comparable or better deduplication performance on non-interleaved data, as well as good load balancing, especially in the typical case where multiple streams are used.
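For readers unfamiliar with the two baseline schemes, the Python sketch below shows simplified versions: Min Hash routing sends a superchunk to the node selected by its smallest chunk fingerprint, and auction routing sends it to the node that already stores the most of its chunks. The `stickiness` bonus for a stream's previous node is a hypothetical illustration of a "sticky" preference, not the paper's Sticky Auction rule; the fingerprint size, tie-breaking, and names are also assumptions.

```python
import hashlib


def chunk_hash(chunk: bytes) -> int:
    """64-bit fingerprint of a chunk (truncated SHA-1)."""
    return int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big")


def min_hash_route(superchunk, num_nodes):
    """Min Hash routing: the smallest chunk fingerprint picks the node."""
    return min(chunk_hash(c) for c in superchunk) % num_nodes


class AuctionRouter:
    """Simplified auction routing: each node bids the number of the superchunk's
    chunks it already stores; the highest bidder receives the whole superchunk."""

    def __init__(self, num_nodes, stickiness=0):
        self.stores = [set() for _ in range(num_nodes)]
        self.last_node = {}           # stream id -> node chosen for its previous superchunk
        self.stickiness = stickiness  # hypothetical bonus added to that previous node's bid

    def route(self, stream, superchunk):
        hashes = {chunk_hash(c) for c in superchunk}
        bids = [len(hashes & store) for store in self.stores]
        prev = self.last_node.get(stream)
        if prev is not None:
            bids[prev] += self.stickiness
        # Highest bid wins; ties go to the least-loaded node, then the lowest index.
        node = max(range(len(bids)), key=lambda n: (bids[n], -len(self.stores[n]), -n))
        self.stores[node] |= hashes   # chunks already on the node are not stored again
        self.last_node[stream] = node
        return node


router = AuctionRouter(num_nodes=3)
print(router.route("s1", [b"x", b"y"]))        # 0: no matches yet, all nodes empty
print(router.route("s2", [b"p", b"q"]))        # 1: no matches, node 0 is busier
print(router.route("s2", [b"x", b"y", b"z"]))  # 0: node 0 already holds x and y
```

Min Hash routing is stateless and depends only on chunk contents, whereas the auction consults per-node state; a stickiness bonus trades some of that content matching for keeping a stream's superchunks together.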
Keywords: routing performance; interleaved data; node