Massively scalable prototype learning for heterogeneous parallel computing architecture

SU Tonghua; LI Songze; DENG Shengchun; YU Yang; BAI Wei


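Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-Chief: LI Longqiu. ISSN 0367-6234. CN 23-1235/T.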


Cite this article: SU Tonghua, LI Songze, DENG Shengchun, YU Yang, BAI Wei. Massively scalable prototype learning for heterogeneous parallel computing architecture[J]. Journal of Harbin Institute of Technology, 2016, 48(11): 53. DOI: 10.11918/j.issn.0367-6234.2016.11.009

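SU Tonghua¹, LI Songze¹, DENG Shengchun¹, YU Yang², BAI Wei³
(1. School of Software, Harbin Institute of Technology, Harbin 150001, China; 2. Dalian Branch, China Construction Eighth Engineering Division Corp. Ltd., Dalian 116021, Liaoning, China; 3. Zhejiang Branch, Nokia Solutions and Networks System Technology (Beijing) Co., Ltd., Hangzhou 310053, China)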

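Abstract:

To address the computational bottleneck of current prototype learning algorithms in large-scale, large-category machine learning and pattern recognition, a scalable prototype learning framework based on a heterogeneous parallel computing architecture of GPUs and CPUs is proposed. First, by decomposing and regrouping the algorithm's computing tasks, the intensive computational load is moved to the GPU, while the CPU handles only a small amount of flow control. Second, the framework adaptively decides, according to the task type, whether to use a tiling strategy or a parallel reduction strategy. The framework is validated on a large-scale handwritten Chinese character database. When the model is trained in mini-batch mode on a consumer-grade GTX680 graphics card, a speedup of up to 194 times is obtained; upgrading to a GTX980 card raises the speedup to 638 times. Even in stochastic gradient descent mode, which is harder to accelerate, the algorithm achieves a speedup of at least 30 times. The framework is highly scalable while preserving recognition accuracy, and it effectively resolves the computational bottleneck of existing prototype learning.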

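Keywords: prototype learning; learning vector quantization; handwritten Chinese character recognition; parallel reduction; heterogeneous parallel computing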
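The parallel reduction strategy named in the abstract can be illustrated with the nearest-prototype search that prototype learning relies on. The following is a minimal sketch in plain Python, not the authors' GPU implementation: the function names `squared_distance`, `reduce_argmin`, and `nearest_prototype` are hypothetical, and the pairwise passes only imitate sequentially what GPU threads would do concurrently.

```python
def squared_distance(x, p):
    """Squared Euclidean distance between a sample and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, p))


def reduce_argmin(values):
    """Tree-style (pairwise) reduction returning (min_value, index).

    Each pass roughly halves the number of active candidates, so n values
    are reduced in about log2(n) passes. On a GPU, the comparisons inside
    one pass would run concurrently; here they run in a simple loop.
    """
    if not values:
        raise ValueError("values must be non-empty")
    # Pair every value with its index so the winner's position survives.
    active = [(v, i) for i, v in enumerate(values)]
    while len(active) > 1:
        half = (len(active) + 1) // 2
        merged = []
        for k in range(half):
            partner = k + half
            if partner < len(active):
                # Tuples compare by value first, then by index,
                # so ties resolve to the lowest index.
                merged.append(min(active[k], active[partner]))
            else:
                # Odd element out: carried to the next pass unchanged.
                merged.append(active[k])
        active = merged
    return active[0]


def nearest_prototype(x, prototypes):
    """Index of the prototype closest to sample x."""
    distances = [squared_distance(x, p) for p in prototypes]
    _, index = reduce_argmin(distances)
    return index
```

For example, `nearest_prototype([0, 0], [[3, 4], [1, 1], [0, 2]])` evaluates the three distances 25, 2, and 4 and selects index 1. With thousands of character classes, as in handwritten Chinese character recognition, this search over all prototypes is the kind of step where a logarithmic-depth reduction is a natural fit.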

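DOI: 10.11918/j.issn.0367-6234.2016.11.009

Classification number: TP181

Document code: A

Funding: National Natural Science Foundation of China (61203260); Key Program of the Natural Science Foundation of Heilongjiang Province (ZD2015017); Harbin Institute of Technology Scientific Research Innovation Foundation (HIT.NSRIF.2015083)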

Massively scalable prototype learning for heterogeneous parallel computing architecture

SU Tonghua¹, LI Songze¹, DENG Shengchun¹, YU Yang², BAI Wei³

(1. School of Software, Harbin Institute of Technology, Harbin 150001, China; 2. Dalian Branch, China Construction Eighth Engineering Division Corp. Ltd., Dalian 116021, Liaoning, China; 3. Nokia Solutions and Networks, Hangzhou 310053, China)

Abstract:

Current prototype learning algorithms impose a heavy computational burden in large-category machine learning and pattern recognition. To remove this bottleneck, a principled, scalable prototype learning method is proposed, based on a heterogeneous parallel computing architecture of GPUs and CPUs. By splitting and rearranging the computing tasks, the method transfers the intensive workload from the CPU to the GPU, so that the CPU only needs to manage a small amount of control flow. The method also adaptively chooses between tiling and reduction strategies depending on the workload. Evaluations on a large Chinese character database show that a speedup of up to 194X is achieved in mini-batch mode on a consumer-level GTX 680 card. With a newer GTX 980 card, the speedup rises to 638X. Even in the more difficult SGD case, a speedup of more than 30-fold is observed. The proposed framework is highly scalable while preserving recognition accuracy, and it can effectively resolve the computational bottleneck in prototype learning.

Key words: prototype learning; learning vector quantization; Chinese character recognition; parallel reduction; heterogeneous parallel computing
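The tiling strategy mentioned in the abstract can be illustrated on the distance computation between a mini-batch of samples and the set of prototypes. The sketch below is an assumption-laden illustration in plain Python rather than the paper's GPU kernel: the function name `tiled_distance_matrix` and the `tile` parameter are hypothetical, and the "staging" of each tile only imitates what shared memory would do on a GPU.

```python
def squared_distance(x, p):
    """Squared Euclidean distance between a sample and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, p))


def tiled_distance_matrix(samples, prototypes, tile=2):
    """Compute all sample-to-prototype squared distances tile by tile.

    The output matrix is split into blocks of at most tile x tile entries.
    On a GPU, each block would typically be assigned to one thread block
    that loads its slice of samples and prototypes once and reuses it for
    every entry in the block. Here the slices are ordinary Python lists.
    """
    if tile < 1:
        raise ValueError("tile must be at least 1")
    n, m = len(samples), len(prototypes)
    out = [[0.0] * m for _ in range(n)]
    for row0 in range(0, n, tile):
        # Slice of the mini-batch reused across all prototype tiles.
        sample_tile = samples[row0:row0 + tile]
        for col0 in range(0, m, tile):
            proto_tile = prototypes[col0:col0 + tile]
            for i, x in enumerate(sample_tile):
                for j, p in enumerate(proto_tile):
                    out[row0 + i][col0 + j] = squared_distance(x, p)
    return out
```

The result does not depend on the tile size; only the order of the work changes. In mini-batch mode the matrix has one row per sample in the batch, which gives a tiled kernel many independent entries to fill, whereas in SGD mode there is a single row per step, which is one plausible reason the abstract describes SGD as harder to accelerate.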
