PI: Nichols A. Romero, LCF

Objective: Our metascalable (“design once, scale on new architectures”) simulation approach achieves portable performance on current and future computing platforms. It is based on a novel divide-conquer-“recombine” (DCR) algorithmic framework that supports (1) lean divide-and-conquer density functional theory (LDC-DFT) for quantum molecular dynamics (QMD) simulations with a minimal O(N) prefactor, and (2) extended Lagrangian reactive molecular dynamics (XRMD), which eliminates the speed-limiting charge iterations in RMD simulations. Key to metascalability is the global-local separation achieved by our globally scalable/reproducible and locally fast (GSLF) solvers, which are built on (1) a new scalable and reproducible global summation method, and (2) fast shift-collapse (SC) computation of local n-tuples.
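To illustrate why a reproducible global summation matters: floating-point addition is not associative, so a sum reduced in a different order (for example, on a different number of ranks) can differ in its last bits. The following is a minimal sketch of one generic way to make a sum order-independent, by accumulating fixed-point integers exactly. It is not the summation method used in our codes; the names SCALE_BITS, to_fixed, partial_fixed_sum, and reproducible_sum are hypothetical and chosen only for this example.

```python
# Sketch only: order-independent summation via exact integer accumulation.
# SCALE_BITS and all function names are assumptions made for illustration.

SCALE_BITS = 60  # assumed fixed-point resolution: values are quantized to 2**-60


def to_fixed(x: float) -> int:
    """Quantize a float to a fixed-point integer (deterministic per element)."""
    # Scaling by a power of two is exact for moderate magnitudes;
    # round() then returns a Python int.
    return round(x * (1 << SCALE_BITS))


def partial_fixed_sum(chunk) -> int:
    """Local step: each hypothetical rank sums its own values as integers."""
    return sum(to_fixed(v) for v in chunk)


def reproducible_sum(chunks) -> float:
    """Global step: combine the integer partial sums, then convert back once."""
    # Integer addition is associative, so the grouping of values into chunks
    # and the order of the chunks cannot change the total.
    total = sum(partial_fixed_sum(c) for c in chunks)
    return total / (1 << SCALE_BITS)


if __name__ == "__main__":
    values = [1e16, 1.0, -1e16, 0.25, -0.125]
    one_rank = reproducible_sum([values])
    three_ranks = reproducible_sum([values[3:], values[:1], values[1:3]])
    print(one_rank == three_ranks)
```

The trade-off in this sketch is a fixed quantization step and arbitrary-precision integers, which are convenient in Python but would need a bounded-width representation in a production solver.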

Our codes scale beyond the petaflop/s level. Our QMD benchmark with 39.8 trillion electronic degrees of freedom and our 68-billion-atom RMD benchmark have achieved a parallel efficiency exceeding 0.98 and 51% of the theoretical peak floating-point performance on 786,432 Blue Gene/Q cores. The performance portability of our simulation algorithms has been verified on general-purpose graphics processing units (GPGPUs) and early Intel Xeon Phi hardware.

Testbed description: We will test the performance portability of our quantum molecular dynamics (QMD) and reactive molecular dynamics (RMD) simulation codes on the Intel Xeon Phi Knights Landing (KNL) and other advanced architectures.