强大的矩阵乘法算法和扩展Memory-Independent沟通下界
Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds
关键词:处理器;快速矩阵乘法;通信成本
摘 要:A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1=P, including all commu-nication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found. One is based on classical matrix multi-plication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to some number of processors where the inter-processor communication no longer scales.