Shark: SQL and Rich Analytics at Scale

Authors: Reynold Shi Xin, Joshua Rosen, Matei Zaharia, Michael Franklin, Scott Shenker, Ion Stoica
Affiliation: University of California at Berkeley
Date processed: 2013-11-18
Source: EECS

Length: 13 pages

Keywords: Shark; data analysis system; memory abstraction; large-cluster analytics; fault tolerance
Abstract: Shark is a new data analysis system that marries query processing with complex analytics on large clusters. It leverages a novel distributed memory abstraction to provide a unified engine that can run SQL queries and sophisticated analytics functions (e.g., iterative machine learning) at scale, and efficiently recovers from failures mid-query. This allows Shark to run SQL queries up to 100× faster than Apache Hive, and machine learning programs up to 100× faster than Hadoop. Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a MapReduce-like execution engine, and the fine-grained fault tolerance properties that such engines provide. It extends such an engine in several ways, including column-oriented in-memory storage and dynamic mid-query replanning, to effectively execute SQL. The result is a system that matches the speedups reported for MPP analytic databases over MapReduce, while offering fault tolerance properties and complex analytics capabilities that they lack.
