
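Journal of Harbin Institute of Technology
Supervised by: Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by: Harbin Institute of Technology
Editor-in-chief: LI Longqiu
International serial number: ISSN 0367-6234
Domestic serial number: CN 23-1235/T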


Cite this article: GAO Huanqin, CHEN Hongquan, ZHANG Jiale, JIA Xuesong. A RKDG GPU parallel algorithm and its acceleration with reordering[J]. Journal of Harbin Institute of Technology, 2023, 55(8): 32. DOI: 10.11918/202208043


DOI: 10.11918/202208043

CLC number: V211.3

Document code: A

Foundation item: National Natural Science Foundation of China (9,8)

A RKDG GPU parallel algorithm and its acceleration with reordering

GAO Huanqin^1,2, CHEN Hongquan^1,2, ZHANG Jiale^1,2, JIA Xuesong^1,2

(1.College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; 2.Key Laboratory of Unsteady Aerodynamics and Flow Control (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China)

Abstract:

To improve the efficiency of solving the Navier-Stokes equations in parallel, the Runge-Kutta discontinuous Galerkin (RKDG) method is ported to the graphics processing unit (GPU) architecture by constructing thread hierarchies mapped to high-order finite elements and to element edges, together with the corresponding GPU kernels, yielding an RKDG GPU parallel algorithm. The data storage and access patterns of the algorithm are designed to be compatible with the GPU's various memory types, which differ in latency. On structured meshes, the data-dependence regions of the algorithm are ordered and already satisfy the GPU's requirement for aligned, coalesced memory access; on unstructured meshes, however, the irregular data-dependence regions degrade memory-access performance. To remedy this, a multi-layered element reordering approach suited to the high-order finite element framework is proposed. Starting from the initial mesh topology, layered structures of elements or edges are constructed and reordered layer by layer, and the results are assembled into data structures suitable for coalesced memory access. An example of mesh reordering is provided, with the implementation process detailed. Numerical results of typical flow simulations show that the algorithm achieves the expected order of accuracy and that the computed results agree well with experimental data or other computed results in the existing literature, with a maximum GPU speedup of 67.47. Moreover, unstructured-mesh cases show that the algorithm can handle relatively complex geometric boundaries and that the proposed reordering technique yields further acceleration.

Key words: RKDG method; GPU; multi-layered reordering; unstructured mesh; Navier-Stokes equations
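The layer-by-layer reordering summarized in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that a "layer" is a breadth-first level of the element adjacency graph grown from caller-chosen seed elements (a Cuthill-McKee-style ordering); the paper's actual layering and in-layer sorting rules may differ. The function names `build_layers`, `reorder_elements`, `edge_order` and `max_index_gap` are illustrative and not taken from the paper. What the sketch shows is the locality argument: once elements are numbered layer by layer, elements that share an edge receive nearby indices, so consecutive GPU threads handling consecutive elements (or edges) read neighbor data from a narrow address window, which favors coalesced access.

```python
def build_layers(neighbors, seeds):
    """Group elements into layers by breadth-first distance from the seeds.

    neighbors[e] lists the elements sharing an edge with element e;
    -1 marks a boundary edge. Layer 0 holds the seeds themselves.
    (Hypothetical layering rule, assumed for illustration.)
    """
    layer_of = [-1] * len(neighbors)
    frontier = []
    for s in seeds:
        if layer_of[s] == -1:
            layer_of[s] = 0
            frontier.append(s)
    layers = []
    while frontier:
        layers.append(frontier)
        next_frontier = []
        # Children are appended in the order of their parents, so elements
        # adjacent to nearby parents also end up with nearby positions.
        for e in frontier:
            for nb in neighbors[e]:
                if nb >= 0 and layer_of[nb] == -1:
                    layer_of[nb] = len(layers)
                    next_frontier.append(nb)
        frontier = next_frontier
    if -1 in layer_of:
        raise ValueError("some elements are unreachable from the seeds")
    return layers


def reorder_elements(neighbors, seeds):
    """Number elements layer by layer.

    Returns (new_to_old, old_to_new, new_neighbors), where new_neighbors is
    the adjacency rewritten in the new numbering.
    """
    new_to_old = [e for layer in build_layers(neighbors, seeds) for e in layer]
    old_to_new = [0] * len(new_to_old)
    for new_id, old_id in enumerate(new_to_old):
        old_to_new[old_id] = new_id
    new_neighbors = [
        [old_to_new[nb] if nb >= 0 else -1 for nb in neighbors[old_id]]
        for old_id in new_to_old
    ]
    return new_to_old, old_to_new, new_neighbors


def edge_order(neighbors):
    """Derive an edge list of (left, right) pairs that follows the element numbering.

    Each interior edge is recorded once, from its lower-numbered element
    (assumes two elements share at most one edge); boundary edges keep -1.
    Because e increases monotonically, an edge-per-thread kernel walks the
    edges in the same layered order as the elements.
    """
    return [
        (e, nb)
        for e, nbs in enumerate(neighbors)
        for nb in nbs
        if nb < 0 or e < nb
    ]


def max_index_gap(neighbors):
    """Largest |e - nb| over neighbor pairs: a simple proxy for memory locality."""
    return max(
        (abs(e - nb) for e, nbs in enumerate(neighbors) for nb in nbs if nb >= 0),
        default=0,
    )


if __name__ == "__main__":
    # Six elements in a strip whose ids run 3-0-5-1-4-2 along the strip.
    neighbors = [[3, 5], [5, 4], [4, -1], [-1, 0], [1, 2], [0, 1]]
    new_to_old, old_to_new, new_nb = reorder_elements(neighbors, seeds=[3])
    print(new_to_old)                                       # [3, 0, 5, 1, 4, 2]
    print(max_index_gap(neighbors), max_index_gap(new_nb))  # 5 1
    print(edge_order(new_nb))
```

On this strip the largest index gap between neighbors drops from 5 to 1. On a 2D unstructured mesh the gap generally cannot reach 1, but with breadth-first layers every neighbor lies in the same or an adjacent layer, so the gap is bounded roughly by the size of two adjacent layers, which is what narrows the memory window a warp touches.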
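The statement that the algorithm reaches its expected order of accuracy is usually checked with a mesh-refinement study: for two meshes with characteristic sizes h_1 > h_2 and error norms e_1, e_2, the observed order is p = ln(e_1/e_2) / ln(h_1/h_2). The sketch below computes p for successive refinements. The error values are synthetic (generated as e = 2h^3) and are not results from the paper, and the target of k + 1 for DG polynomials of degree k on smooth solutions is the commonly expected value rather than one quoted from the article; `observed_orders` is a hypothetical helper name.

```python
import math


def observed_orders(h, err):
    """Observed order of accuracy between successive mesh refinements.

    h   : characteristic mesh sizes, coarse to fine
    err : matching error norms (e.g. an L2 error of density)
    Returns p_i = ln(err[i-1] / err[i]) / ln(h[i-1] / h[i]) for i = 1..n-1.
    """
    if len(h) != len(err) or len(h) < 2:
        raise ValueError("need at least two (h, error) pairs of equal length")
    return [
        math.log(err[i - 1] / err[i]) / math.log(h[i - 1] / h[i])
        for i in range(1, len(h))
    ]


if __name__ == "__main__":
    # Synthetic errors behaving like C * h**3, i.e. what a degree-2 (k = 2)
    # DG scheme is commonly expected to give on a smooth solution.
    h = [0.1, 0.05, 0.025]
    err = [2.0 * x**3 for x in h]
    print([round(p, 6) for p in observed_orders(h, err)])  # [3.0, 3.0]
```

For unstructured meshes, h is often taken as a representative element size (for example, the inverse square root of the element count in 2D), since the meshes in such a study are not uniformly refined.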
