Keywords: communication patterns; matrix multiplication; recursive algorithms; numerical linear algebra; distributed computing
Abstract: Matrix multiplication is one of the most fundamental algorithmic problems in numerical linear algebra, distributed computing, scientific computing, and high-performance computing. Parallelization of matrix multiplication has been extensively studied (e.g., [21, 12, 24, 2, 51, 39, 36, 23, 45, 61]). It has been addressed using many theoretical approaches, algorithmic tools, and software engineering methods in order to optimize performance and obtain faster and more efficient parallel algorithms and implementations. To design efficient parallel algorithms, it is necessary not only to balance the computational load across processors, but also to minimize the time spent communicating between them.