Keywords: QR factorization; tall-skinny QR factorization (TSQR); Householder vectors; Householder QR factorization; algorithmic performance cost model
Abstract: We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with the Communication-Avoiding QR (CAQR) algorithm. As a result, our final parallel QR algorithm outperforms the ScaLAPACK and Elemental implementations of Householder QR, as well as our own implementation of CAQR, on the Hopper Cray XE6 NERSC system.
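To make the two steps named in the abstract concrete, the following is a minimal serial sketch in NumPy, not the parallel implementation evaluated in the paper. It assumes a flat (one-level) TSQR reduction over row blocks, and it recovers the compact representation Q = I - Y T Y^T from the explicit TSQR factor through an unpivoted LU factorization of Q - S, where S is a diagonal sign matrix chosen during elimination. The function names `tsqr` and `reconstruct_householder`, the block count, and the sign convention are illustrative assumptions and are not taken from the source.

```python
import numpy as np


def tsqr(A, num_blocks=4):
    """Flat (one-level) TSQR sketch: local QR per row block, then QR of stacked R factors."""
    m, n = A.shape
    blocks = np.array_split(A, num_blocks, axis=0)
    # Assumption: every row block is at least as tall as it is wide.
    assert all(B.shape[0] >= n for B in blocks)
    local = [np.linalg.qr(B) for B in blocks]        # reduced QR of each block
    R_stack = np.vstack([R_i for _, R_i in local])   # (num_blocks * n) x n
    Q2, R = np.linalg.qr(R_stack)
    # Combine the local Q factors with the matching n x n slice of Q2.
    Q = np.vstack([Q_i @ Q2[i * n:(i + 1) * n]
                   for i, (Q_i, _) in enumerate(local)])
    return Q, R


def reconstruct_householder(Q):
    """Recover Y (unit lower trapezoidal), T (upper triangular) and signs s
    such that Q @ diag(s) equals the first n columns of I - Y T Y^T."""
    m, n = Q.shape
    W = Q.astype(float).copy()
    s = np.empty(n)
    for j in range(n):
        # Choose the sign so the pivot has magnitude at least 1 (no pivoting needed).
        s[j] = -1.0 if W[j, j] >= 0 else 1.0
        W[j, j] -= s[j]
        W[j + 1:, j] /= W[j, j]
        W[j + 1:, j + 1:] -= np.outer(W[j + 1:, j], W[j, j + 1:])
    Y = np.tril(W, -1)
    Y[np.arange(n), np.arange(n)] = 1.0   # unit diagonal
    U = np.triu(W[:n, :n])
    Y1 = Y[:n, :]
    # T = -U S Y1^{-T}, computed by solving Y1 T^T = (-U S)^T.
    T = np.linalg.solve(Y1, (-(U * s)).T).T
    return Y, T, s


rng = np.random.default_rng(0)
A = rng.standard_normal((40, 5))
Q, R = tsqr(A, num_blocks=4)
Y, T, s = reconstruct_householder(Q)

# Householder-form factorization: A = Q_h R_h with R_h = S R.
Q_h = np.eye(40)[:, :5] - Y @ T @ Y[:5].T
R_h = s[:, None] * R
print(np.linalg.norm(A - Q_h @ R_h))
```

In this sketch the only reduction over row blocks is the second QR of the stacked R factors; the reconstruction adds one small LU-like sweep and one triangular solve on top of TSQR. A parallel version would distribute the row blocks across processes, which this serial code does not model.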