Failure-Resistant Data Center Cooling Control Through Model-Based Thermal Zone Mapping
Keywords: data center; cooling; thermal zone
Abstract: Because cooling costs are so high, improving data center cooling efficiency has been actively pursued for years. In addition to cooling efficiency, the reliability of the cooling system is essential for guaranteed uptime. In a traditional data center cooling system designed with N+1 or higher redundancy, the computer room air conditioning (CRAC) units are either all constantly online or cycled according to a predefined schedule. Both configurations have drawbacks. When all CRAC units are online all the time, the data center is usually over-provisioned, and hence cooling efficiency is low. On the other hand, although cooling efficiency can be improved by cycling CRAC units and turning off the backups, it is difficult to schedule the cycling so that sufficient cooling provisioning is guaranteed and gross over-provisioning is avoided. In this paper, we aim to maintain data center cooling redundancy while achieving high cooling efficiency. Using model-based thermal zone mapping, we first partition the data center into thermal zones and adjust the zone overlap to achieve the desired level of cooling redundancy. We then design a distributed controller for each CRAC unit to regulate the thermal status within its zone of influence. The distributed controllers coordinate with each other to achieve the desired data center thermal status using the least cooling power. When CRAC units or their associated controllers fail, racks in the affected thermal zones remain within the control "radius" of other distributed cooling controllers through the predefined thermal zone overlap, and hence their thermal status is still properly managed by the active CRAC units and controllers. With this failure-resistant cooling control approach, cooling efficiency and robustness are achieved simultaneously. Greater flexibility in cooling system maintenance is also expected, since the distributed control system can automatically adapt to the new cooling facility configuration that results from maintenance.
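The role of zone overlap in tolerating a CRAC failure can be illustrated with a small sketch. The abstract does not say how the thermal zone map is computed or represented, so everything below is an assumption made for illustration: the rack-by-CRAC influence matrix `INFLUENCE`, the cutoff `threshold`, and the function names `build_zones`, `coverage`, and `tolerates_single_failure` are all hypothetical. The sketch assigns a rack to a CRAC's zone when the influence value reaches the threshold, then checks whether every rack stays inside at least one zone after any single CRAC is removed.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical influence values: rows are racks, columns are CRAC units.
# A larger value means the CRAC affects that rack's temperature more strongly.
INFLUENCE: List[List[float]] = [
    [0.8, 0.3, 0.0],
    [0.5, 0.6, 0.1],
    [0.1, 0.7, 0.4],
    [0.0, 0.3, 0.9],
]


def build_zones(influence: List[List[float]], threshold: float) -> Dict[int, Set[int]]:
    """Map each CRAC index to the set of racks it influences at or above threshold."""
    n_cracs = len(influence[0]) if influence else 0
    zones: Dict[int, Set[int]] = {crac: set() for crac in range(n_cracs)}
    for rack, row in enumerate(influence):
        for crac, value in enumerate(row):
            if value >= threshold:
                zones[crac].add(rack)
    return zones


def coverage(zones: Dict[int, Set[int]], n_racks: int,
             failed: FrozenSet[int] = frozenset()) -> List[int]:
    """Count, for each rack, how many non-failed CRAC zones contain it."""
    counts = [0] * n_racks
    for crac, racks in zones.items():
        if crac in failed:
            continue
        for rack in racks:
            counts[rack] += 1
    return counts


def tolerates_single_failure(zones: Dict[int, Set[int]], n_racks: int) -> bool:
    """True if every rack is still in some zone after any one CRAC fails."""
    return all(
        min(coverage(zones, n_racks, frozenset({crac}))) >= 1
        for crac in zones
    )


if __name__ == "__main__":
    for threshold in (0.3, 0.5):
        zones = build_zones(INFLUENCE, threshold)
        print(threshold, zones, tolerates_single_failure(zones, len(INFLUENCE)))
```

In this toy data, lowering the threshold widens the zones and increases their overlap, which is one simple way to picture the "zone overlap adjustment" mentioned in the abstract. A real system would also have to confirm that the surviving CRAC units have enough cooling capacity for the racks they inherit, which this coverage check does not model.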