Keywords: large-scale multi-class learning; matrix factorization tree; data decomposition
Abstract: Many big data applications require accurate classification of objects into one of possibly thousands or millions of categories. These classification tasks are challenging for various reasons, including class imbalance, high testing cost, and model interpretability problems. To overcome these challenges, we propose a novel hierarchical classification method called MF-Tree (matrix factorization tree). Unlike many existing methods, our approach is designed to optimize a global objective function. The key theoretical contribution of this paper is to show that the proposed squared error loss function for matrix factorization is equivalent to the Hilbert-Schmidt Independence Criterion (HSIC). We show that the latter has an additive property, which allows us to decompose the multi-class learning problem into hierarchical binary classification tasks. To improve training efficiency, we also propose an approximate algorithm for inducing MF-Tree. We have performed extensive experiments to compare our methods against several state-of-the-art baseline algorithms. Experimental results suggest that the proposed methods are both effective and efficient when applied to real-world data sets.
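As background for the HSIC claim, the block below recalls the commonly used biased empirical HSIC estimator and one simple way additivity can arise. It is an illustrative sketch only: the symbols $X$, $Y$, $K$, $L$, $H$, $n$ and the choice of linear kernels are assumptions introduced here, not notation or results taken from the paper, and the paper's own additive decomposition may take a different form.

```latex
% Assumed notation: X is the n x d data matrix, Y is the n x c label
% indicator matrix with columns y_1, ..., y_c, K and L are n x n kernel
% matrices on inputs and labels, and H is the centering matrix.
\[
  \widehat{\mathrm{HSIC}}(X, Y)
    = \frac{1}{(n-1)^2}\,\operatorname{tr}(K H L H),
  \qquad
  H = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}.
\]
% Special case (assumption): linear kernels K = X X^T and L = Y Y^T.
% Cyclicity of the trace and symmetry of H give a squared Frobenius norm,
% which splits into one term per label column.
\[
  \operatorname{tr}\!\left(X X^{\top} H \, Y Y^{\top} H\right)
    = \left\lVert X^{\top} H Y \right\rVert_F^{2}
    = \sum_{j=1}^{c} \left\lVert X^{\top} H y_j \right\rVert_2^{2}.
\]
```

Under these assumptions the criterion is a sum of per-column terms, which gives some intuition for how a squared error objective can relate to HSIC and why a multi-class objective might be split into simpler subproblems.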