This paper describes our experience tuning a parallel CFD code to improve its performance and flexibility on large-scale parallel computers. The code solves the incompressible Navier-Stokes equations with the novel Slightly Compressible Model on three-dimensional structured grids. High-level loop transformations and argument-based code specialization are used to optimize its uniprocessor performance. Static arrays are converted into dynamically allocated arrays to improve flexibility. The grid generator is coupled with the flow solver so that the two can exchange grid data in memory. A detailed performance evaluation is performed. The results show that our uniprocessor optimizations speed up the flow solver by factors of 1.38 to 3.93 on the Tianhe-1A supercomputer. The in-memory grid data exchange reduces the application startup time by nearly two orders of magnitude. The optimized code exhibits excellent parallel scalability on realistic test cases: on 4,096 CPU cores, it achieves a strong scaling parallel efficiency of 77.39 % and a maximum performance of 4.01 Tflops.
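Strong scaling efficiency is conventionally the measured speedup of a fixed-size problem divided by the ideal speedup relative to a baseline run on fewer cores. The abstract does not state the baseline core count or the run times, so the following is only a minimal sketch of the standard definition: the function names and the timings in the example are hypothetical and are not data from the paper.

```python
def strong_scaling_efficiency(t_base: float, p_base: int, t_p: float, p: int) -> float:
    """Strong scaling efficiency of a fixed-size problem (hypothetical helper).

    E = (t_base * p_base) / (t_p * p), i.e. the measured speedup t_base / t_p
    divided by the ideal speedup p / p_base.
    """
    if min(t_base, t_p) <= 0 or min(p_base, p) <= 0:
        raise ValueError("times and core counts must be positive")
    speedup = t_base / t_p
    ideal_speedup = p / p_base
    return speedup / ideal_speedup


def gflops_per_core(total_tflops: float, cores: int) -> float:
    """Average sustained rate per core in Gflop/s (hypothetical helper)."""
    return total_tflops * 1000.0 / cores


if __name__ == "__main__":
    # Hypothetical timings, not from the paper: 100 s on 256 cores, 8 s on 4,096 cores.
    e = strong_scaling_efficiency(100.0, 256, 8.0, 4096)
    print(f"efficiency = {e:.2%}")
    # Per-core average implied by the abstract's 4.01 Tflops on 4,096 cores.
    print(f"per core = {gflops_per_core(4.01, 4096):.3f} Gflop/s")
```

Applied to the abstract's own figures, 4.01 Tflops on 4,096 cores corresponds to an average of roughly 0.98 Gflop/s per core.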
Published: 2015-02-11
Classification: Parallel and Distributed Computing, Computational fluid dynamics, slightly compressible model, large-scale parallel computing, uniprocessor optimizations, in-memory grid exchange, scalability, efficiency, 65Y05
@article{cai1393,
     author = {Yonggang Che and Lilun Zhang and Chuanfu Xu and Yongxian Wang and Wei Liu and Zhenghua Wang},
     affiliation = {Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, Hunan},
     title = {Optimization of a Parallel CFD Code and Its Performance Evaluation on Tianhe-1A},
     journal = {Computing and Informatics},
     volume = {33},
     number = {3},
     year = {2015},
     language = {en},
     url = {http://dml.mathdoc.fr/item/cai1393}
}
Yonggang Che, Lilun Zhang, Chuanfu Xu, Yongxian Wang, Wei Liu, Zhenghua Wang (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, Hunan). Optimization of a Parallel CFD Code and Its Performance Evaluation on Tianhe-1A. Computing and Informatics, Volume 33 (2015), no. 3. http://gdmltest.u-ga.fr/item/cai1393/