关键词:HPC框架;数据集;网络堆栈,;服务器软件
摘 要:The proliferation of large High-Performance Computingclusters executing computation-intensive jobs on largedata sets has made cluster power proportionality very important [13]. Despite publicly available traces showing that many clusters have a low average utilization,existing power-proportionality techniques have seen low adoption, a major reason being that these techniques require modifications to the existing cluster software and network stack, and do not address the reliability concerns that may arise during the course of server powercycling.We present Hypnos, a defensive power proportionality system which is unobtrusive, extensible and gracefully handles possible server software and hardware failures which may occur during server power-cycling. We deployed Hypnos on a 57-server production cluster. From a 21-day run, we obtained a 36% energy saving in spite of multiple server and network failures.