Cite this article: | GAO Huanqin, CHEN Hongquan, ZHANG Jiale, JIA Xuesong. A RKDG GPU parallel algorithm and its acceleration with reordering[J]. Journal of Harbin Institute of Technology, 2023, 55(8): 32. DOI:10.11918/202208043 |
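RKDG finite element GPU algorithm and its reordering acceleration technique |
GAO Huanqin1,2, CHEN Hongquan1,2, ZHANG Jiale1,2, JIA Xuesong1,2
|
(1. College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; 2. Key Laboratory of Unsteady Aerodynamics and Flow Control (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China)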
|
|
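Abstract: |
To improve the efficiency of solving the Navier-Stokes equations in parallel, thread structures that map onto high-order finite elements and element edges are constructed together with the corresponding GPU kernels, so that the RKDG method is ported to the GPU architecture and an RKDG finite element GPU parallel algorithm is developed. The data access of the algorithm is compatible with GPU memories of differing speeds; on structured meshes in particular, the data dependence region involved is ordered and satisfies the GPU requirement for aligned, coalesced access well. On unstructured meshes, however, the unstructured data dependence region degrades memory access efficiency. A multi-layered element reordering acceleration technique suited to the high-order finite element framework is therefore proposed, which gives the mesh a layered structure to improve GPU memory access efficiency. Specifically, starting from the initial mesh topology, layer structures are created for elements or element edges, reordered layer by layer, and assembled into a data storage structure suited to aligned, coalesced GPU access. The implementation of the reordering technique is detailed with a reordering example. Numerical examples show that the algorithm attains the expected order of accuracy, that the computed results are close to existing literature or experimental results, and that the maximum GPU speedup reaches 67.47. In addition, unstructured mesh examples confirm that the algorithm can handle fairly complex geometric boundaries and that the proposed reordering technique yields further acceleration. |
Keywords: RKDG method; GPU; multi-layered reordering; unstructured mesh; Navier-Stokes equations |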
DOI:10.11918/202208043 |
CLC number: V211.3 |
Document code: A |
Funding: National Natural Science Foundation of China (9,8) |
|
A RKDG GPU parallel algorithm and its acceleration with reordering |
GAO Huanqin1,2,CHEN Hongquan1,2,ZHANG Jiale1,2,JIA Xuesong1,2
|
(1.College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; 2.Key Laboratory of Unsteady Aerodynamics and Flow Control (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China)
|
Abstract: |
To enhance the parallel efficiency of solving the Navier-Stokes equations, a graphics processing unit (GPU) parallel algorithm, ported from the Runge-Kutta discontinuous Galerkin (RKDG) method, is presented by constructing an element-based or edge-based thread hierarchy and the corresponding GPU kernels. The data storage and access of the algorithm are designed to be compatible with the various types of GPU memory, which differ in latency. On structured meshes, the ordered data dependence region already meets the requirement of coalesced memory access well, whereas the irregularity of unstructured meshes degrades memory access performance. To remedy this, a multi-layered element reordering approach suitable for high-order finite element methods is proposed to achieve further acceleration. Starting from the initial mesh, layer structures of elements or edges are constructed and reordered layer by layer to form data structures suitable for coalesced memory access. An example of mesh reordering is provided, with the implementation process detailed. Numerical results of typical flow simulations show that the algorithm attains the expected order of accuracy and that the computed results agree well with experimental data or other computed results in the existing literature, with a maximum GPU speedup of 67.47. Moreover, the algorithm shows the potential to handle more complex geometries, and the proposed reordering technique yields further acceleration. |
Key words: RKDG method; GPU; multi-layered reordering; unstructured mesh; Navier-Stokes equations |
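The layer-by-layer reordering summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact layering rule, so the code below assumes breadth-first layers grown from a seed element over the element adjacency graph; this is a stand-in, not the authors' procedure. The function names `layered_reorder` and `apply_reorder`, the seed choice, and the small 2x3 quadrilateral mesh with scrambled numbering are all hypothetical.

```python
def layered_reorder(neighbors, seed=0):
    """Group elements into breadth-first layers starting from `seed`.

    neighbors[e] lists the elements sharing a face with element e
    (a negative entry marks a boundary face and is skipped).
    Returns (order, layers) where order[new_id] = old_id.
    ASSUMPTION: BFS layering stands in for the paper's layering rule.
    """
    n = len(neighbors)
    visited = [False] * n
    layers = []
    # Try the seed first, then any element left in another component.
    for start in [seed] + [i for i in range(n) if i != seed]:
        if visited[start]:
            continue
        visited[start] = True
        frontier = [start]
        while frontier:
            layers.append(frontier)
            next_frontier = []
            for e in frontier:
                for nb in neighbors[e]:
                    if nb >= 0 and not visited[nb]:
                        visited[nb] = True
                        next_frontier.append(nb)
            frontier = next_frontier
    order = [e for layer in layers for e in layer]
    return order, layers


def apply_reorder(neighbors, order):
    """Renumber the adjacency lists so that storage follows `order`."""
    new_id = [0] * len(order)
    for new, old in enumerate(order):
        new_id[old] = new
    return [[new_id[nb] if nb >= 0 else -1 for nb in neighbors[old]]
            for old in order]


def max_index_gap(neighbors):
    """Largest |element id - neighbor id|, a rough locality measure."""
    return max(abs(e - nb) for e, nbs in enumerate(neighbors)
               for nb in nbs if nb >= 0)


# Hypothetical 2x3 quad mesh with scrambled ids:
#   row 0:  4  1  5
#   row 1:  2  0  3
mesh = [[2, 3, 1], [4, 5, 0], [0, 4], [0, 5], [1, 2], [1, 3]]
order, layers = layered_reorder(mesh, seed=4)
reordered = apply_reorder(mesh, order)
print(layers)
print(max_index_gap(mesh), max_index_gap(reordered))
```

After such a renumbering, elements in the same layer occupy consecutive storage positions, so threads assigned to consecutive elements tend to read neighbor data from nearby addresses. Whether this matches the access pattern targeted in the paper depends on details the abstract does not give.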