Keywords: collective data transfers; data flows; data shuffle
Abstract: Collective data transfers among sets of processes over high-bandwidth, low-latency data center networks are an integral part of Big Data computations (e.g., the data shuffle in MapReduce). In this paper, we use a carefully architected microbenchmark that emulates a data shuffle to gather network traces and perform detailed analysis. The key result of our analysis is that having more than two competing bi-directional flows per node in a transfer reduces throughput by 10%. Consequently, even at very low cardinality (a 3- or 4-node shuffle), only 90% of the achievable throughput is reached when commodity Ethernet-based switches are employed. TCP contention among multiple flows is the cause of the throughput loss experienced by collective data transfers. Although we identify system parameter configurations that minimize the associated packet losses, we believe application-layer flow management is necessary to circumvent this network-level problem. To this end, we designed and implemented a technique, Max2Flows, that generates and orchestrates a schedule of coordinated data-exchange stages. Each stage limits the number of competing flows per node to two or fewer, thereby avoiding the negative network-level effects. Experimental results show that, when incorporated into our microbenchmark, Max2Flows operates at ~99% of peak throughput on a 1 Gigabit Ethernet network for small shuffles.
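To illustrate the kind of staging the abstract describes (this is a minimal sketch, not the paper's implementation), one well-known way to bound concurrent flows in an all-to-all shuffle is to decompose it into circular-shift stages: in stage k, node i sends only to node (i + k) mod n, so each node carries exactly one outgoing and one incoming flow per stage, i.e., two or fewer. The function name `max2flows_schedule` below is hypothetical.

```python
# Sketch of a Max2Flows-style stage schedule for an n-node all-to-all
# shuffle, assuming a circular-shift decomposition (an assumption, not
# necessarily the authors' algorithm). In stage k, node i sends its
# partition to node (i + k) mod n; every node then has exactly one
# outgoing and one incoming flow per stage -- at most two flows per node.

def max2flows_schedule(n):
    """Return a list of stages; each stage is a list of (sender, receiver) pairs."""
    stages = []
    for k in range(1, n):  # n-1 stages cover every ordered node pair exactly once
        stages.append([(i, (i + k) % n) for i in range(n)])
    return stages

if __name__ == "__main__":
    # Example: a 4-node shuffle completes in 3 coordinated stages.
    for stage_no, stage in enumerate(max2flows_schedule(4), start=1):
        print(f"stage {stage_no}: {stage}")
```

A coordinator would run the stages sequentially, starting stage k + 1 only after all transfers in stage k complete, so that no node ever exceeds the two-flow bound.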