Supervised by: Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by: Harbin Institute of Technology
Editor-in-Chief: Li Longqiu
ISSN 0367-6234    CN 23-1235/T

Citation: RAO Ning, XU Hua, SONG Bailin. Q-learning intelligent jamming decision algorithm based on efficient upper confidence bound variance[J]. Journal of Harbin Institute of Technology, 2022, 54(5): 162. DOI: 10.11918/202010082
DOI: 10.11918/202010082
CLC number: TN975
Document code: A
Q-learning intelligent jamming decision algorithm based on efficient upper confidence bound variance
RAO Ning, XU Hua, SONG Bailin
(Information and Navigation College, Air Force Engineering University, Xi’an 710077, China)
Abstract:
To further improve the convergence speed of intelligent jamming decision-making algorithms based on value-function reinforcement learning and to enhance the effectiveness of battlefield decision-making, an improved Q-learning intelligent communication jamming decision algorithm was designed that incorporates the idea of the efficient upper confidence bound variance. Built on the Q-learning framework, the proposed algorithm uses the value variance of effective jamming actions to construct confidence intervals and eliminates low-confidence jamming actions from the jamming action space, which reduces the jammer's unnecessary exploration cost in an unknown environment and speeds up its search over the jamming action space; the values of all jamming actions are updated synchronously, thereby accelerating the learning of the optimal jamming strategy. The jamming decision-making scenario was modeled as a Markov decision process for simulation experiments. The results show that when the communicating party changed its communication channel using a jamming-avoidance strategy unknown to the jammer, the proposed algorithm converged faster, achieved a higher jamming success rate, and obtained a greater total jamming reward than existing reinforcement-learning-based jamming decision-making algorithms, without any prior information about the communicating party. In addition, the algorithm is applicable to “many-to-many” cooperative countermeasure environments, where the action elimination method reduces the dimensionality of the joint jamming action space; under the same experimental conditions, its jamming success rate was more than 50% higher than that of the traditional Q-learning decision algorithm.
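To make the mechanism described above concrete, the Python sketch below pairs ordinary tabular Q-learning with a variance-based upper-confidence-bound test that prunes unpromising jamming actions before each selection. It is only an illustration under assumed details: the toy channel-hopping environment, the constants, and the UCB-V-style confidence radius are not taken from the paper, and the paper's exact EUCBV bound and its synchronous update of all jamming-action values are not reproduced here.

# Minimal illustrative sketch (not the authors' code): tabular Q-learning with a
# variance-based upper-confidence-bound elimination of unpromising jamming actions.
# The toy environment, constants, and UCB-V-style bound are assumptions for
# illustration only.
import math
import random
from collections import defaultdict

N_CHANNELS = 8                      # hypothetical number of communication channels
ALPHA, GAMMA, EPSILON, C = 0.1, 0.9, 0.1, 1.0

class HoppingCommEnv:
    # Toy MDP: the communicating party hops to the next channel whenever it is
    # jammed; the observable state is its current channel.
    def __init__(self):
        self.channel = random.randrange(N_CHANNELS)
    def step(self, jam_channel):
        reward = 1.0 if jam_channel == self.channel else 0.0
        if reward:                                   # avoidance: hop after a hit
            self.channel = (self.channel + 1) % N_CHANNELS
        return self.channel, reward

Q      = defaultdict(lambda: [0.0] * N_CHANNELS)     # action values per state
count  = defaultdict(lambda: [0] * N_CHANNELS)       # visit counts per state-action
mean_r = defaultdict(lambda: [0.0] * N_CHANNELS)     # running reward mean
var_r  = defaultdict(lambda: [0.0] * N_CHANNELS)     # running reward variance

def candidate_actions(state, t):
    # Variance-aware confidence radius (UCB-V-style); actions whose upper bound
    # falls below the best lower bound are eliminated from the candidate set.
    if any(count[state][a] == 0 for a in range(N_CHANNELS)):
        return list(range(N_CHANNELS))               # try every action at least once
    radius = [math.sqrt(2 * var_r[state][a] * math.log(t + 1) / count[state][a])
              + 3 * C * math.log(t + 1) / count[state][a] for a in range(N_CHANNELS)]
    best_lcb = max(Q[state][a] - radius[a] for a in range(N_CHANNELS))
    keep = [a for a in range(N_CHANNELS) if Q[state][a] + radius[a] >= best_lcb]
    return keep or list(range(N_CHANNELS))

env = HoppingCommEnv()
state = env.channel
for t in range(5000):
    actions = candidate_actions(state, t)
    if random.random() < EPSILON:                    # epsilon-greedy over survivors
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[state][a])
    next_state, r = env.step(action)
    # Welford-style update of the running reward mean and variance.
    count[state][action] += 1
    n = count[state][action]
    delta = r - mean_r[state][action]
    mean_r[state][action] += delta / n
    var_r[state][action] += (delta * (r - mean_r[state][action]) - var_r[state][action]) / n
    # Standard Q-learning update for the action actually taken.
    Q[state][action] += ALPHA * (r + GAMMA * max(Q[next_state]) - Q[state][action])
    state = next_state

In this sketch the elimination step only restricts which actions the policy may select at each step; the Q-learning update itself is the standard one.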
Key words: jamming decision-making; reinforcement learning; efficient upper confidence bound variance; Q-learning; jamming action elimination; Markov decision process
