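Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-Chief: LI Longqiu. ISSN 0367-6234. CN 23-1235/T.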


Cite this article: LIU Xiaofang, LIU Ce, LIU Lumi, CHENG Dansong. Overlapped Local Gaussian Process Regression[J]. Journal of Harbin Institute of Technology, 2019, 51(11): 22. DOI: 10.11918/j.issn.0367-6234.201904056


DOI: 10.11918/j.issn.0367-6234.201904056

CLC number: TP391.4

Document code: A

Funding: National Natural Science Foundation of China (51677042)

Overlapped Local Gaussian Process Regression

LIU Xiaofang², LIU Ce¹, LIU Lumi³, CHENG Dansong¹

(1.School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; 2.School of Electrical Engineering & Automation, Harbin Institute of Technology, Harbin 150001, China; 3.Beijing Institute of Control Engineering, Beijing 100000, China)

Abstract:

Gaussian processes (GPs) are distributions over functions and are commonly used for regression in the machine learning community. For n training samples, the time complexity of training and prediction is O(n³) and O(n²), respectively, and this computational cost hinders their application to large datasets. Inspired by the divide-and-conquer strategy, this paper proposes a simple but efficient approximate model called the Overlapped Local Gaussian Process (OLGP). The method assumes that, given the values of its nearest neighbors, a random variable is conditionally independent of more distant ones. The training samples are divided recursively to build a ternary tree. Sibling nodes share samples, and the samples in these intersections play the role of inducing points that model the dependency between neighboring regions. Each leaf node is associated with a local GP regression model built from the samples it contains. Under this assumption, the evidence (marginal likelihood) and predictive distribution of each parent node are approximated by combining the results of its child nodes, which reduces the computational cost. This way of combining also guarantees that the fitted function is continuous. A theoretical analysis shows that, for n training samples, the time complexity of both training and prediction reduces to O(n^t), where t depends on the proportion of the intersection at each level and is usually between 1 and 2. Experiments on several public real-world datasets demonstrate the effectiveness of the method in terms of the trade-off between speed and accuracy.
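
The partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several things in it are assumptions made only for illustration: the inputs are one-dimensional, the kernel is a squared-exponential with fixed hyperparameters, and the names `rbf_kernel`, `fit_local_gp`, `overlapped_ternary_leaves`, `fit_leaves`, `predict`, `leaf_size` and `overlap` are hypothetical. In particular, the sketch predicts from a single leaf and does not reproduce the paper's rule for combining the evidence and predictive distributions of child nodes, so it does not carry the continuity guarantee stated above.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def fit_local_gp(x, y, noise=1e-2):
    """Exact GP fit on one leaf: a Cholesky factorization, cubic in len(x)."""
    K = rbf_kernel(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, obtained from two triangular systems.
    return np.linalg.solve(L.T, np.linalg.solve(L, y))


def overlapped_ternary_leaves(idx, leaf_size=60, overlap=0.25):
    """Recursively split a sorted index array into three overlapping children.

    `overlap` (assumed to lie in (0, 0.5]) is the fraction of one third of
    the node that each child shares with its neighbouring sibling.
    """
    n = len(idx)
    if n <= leaf_size or n < 6:  # small nodes become leaves
        return [idx]
    third = n // 3
    pad = max(1, int(overlap * third))
    bounds = [(0, third + pad),
              (third - pad, 2 * third + pad),
              (2 * third - pad, n)]
    leaves = []
    for lo, hi in bounds:
        leaves.extend(overlapped_ternary_leaves(idx[lo:hi], leaf_size, overlap))
    return leaves


def fit_leaves(x, y, leaf_size=60, overlap=0.25):
    """Fit one local GP per leaf; returns a list of (leaf inputs, alpha)."""
    order = np.argsort(x)  # sort so that every leaf is a contiguous interval
    models = []
    for leaf in overlapped_ternary_leaves(order, leaf_size, overlap):
        models.append((x[leaf], fit_local_gp(x[leaf], y[leaf])))
    return models


def predict(models, x_test):
    """Posterior mean from the leaf in which each test point lies deepest.

    Simplification: one leaf answers each query; nothing is combined.
    """
    lo = np.array([xl.min() for xl, _ in models])
    hi = np.array([xl.max() for xl, _ in models])
    out = np.empty(len(x_test))
    for i, xt in enumerate(x_test):
        k = int(np.argmax(np.minimum(xt - lo, hi - xt)))
        xl, alpha = models[k]
        out[i] = (rbf_kernel(np.array([xt]), xl) @ alpha)[0]
    return out
```

Because each leaf holds at most `leaf_size` samples, every Cholesky factorization in this sketch has a bounded cost, and the total cost is driven by the number of leaves, which grows with the overlap between siblings. A call such as `predict(fit_leaves(x, y), x_new)` with 1-D NumPy arrays is enough to try it.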

Key words: pattern recognition; Gaussian process regression; Bayesian learning; large-scale machine learning
