Keywords: model training; model search; distributed systems; scalability
Abstract: Model search is a crucial component of data analytics pipelines, and the laborious process of choosing an appropriate learning algorithm and tuning its parameters remains a major obstacle to the widespread adoption of machine learning techniques. Recent efforts to automate this process have treated model training itself as a black box, which limits their effectiveness on large-scale problems. In this work, we build upon these recent efforts. By inspecting the inner workings of model training and framing model search as a bandit-like resource allocation problem, we present an integrated distributed system for model search that targets large-scale learning applications. We study the impact of our approach on a variety of datasets and demonstrate that our system, named GHOSTFACE, solves the model search problem with accuracy comparable to that of basic strategies but an order of magnitude faster. We further demonstrate that GHOSTFACE can scale to models trained on terabytes of data across hundreds of machines.