关键词:通信;并行算法;线性代数;通信下界;微分方程
摘 要:Communication, i.e., moving data, between levels of a memory hierarchy or between parallel processors on a network, can greatly dominate the cost of computation, so algorithms that minimize communication can run much faster (and use less energy) than algorithms that do not. Motivated by this, attainable communication lower bounds were established for a variety of algorithms including matrix computations. The lower bound approach used initially for Theta(N3) matrix multiplication, and later for many other linear algebra algorithms, depended on a geometric result by Loomis and Whitney: this result bounded the volume of a 3D set (representing multiply-adds done in the inner loop of the algorithm) using the product of the areas of certain 2D projections of this set (representing the matrix entries available locally, i.e., without communication). Using a recent generalization of Loomis' and Whitney's result, we generalize this lower bound approach to a much larger class of algorithms, that may have arbitrary numbers of loops and arrays with arbitrary dimensions as long as the index expressions are a ne combinations of loop variables.