Q-learning intelligent jamming decision algorithm based on efficient upper confidence bound variance

doi:10.11918/202010082

Volume 54, Issue 5, 2022, pp. 162-170

Affiliation: Information and Navigation College, Air Force Engineering University, Xi’an 710077, China
CLC Number: TN975

Abstract:

To improve the convergence speed and effectiveness of value-function-based reinforcement learning algorithms for intelligent jamming decision-making, an improved Q-learning algorithm for intelligent communication jamming decisions was designed that integrates the efficient upper confidence bound variance. Built on the Q-learning framework, the proposed algorithm uses the value variance of effective jamming actions to set a confidence interval. It eliminates low-confidence jamming actions from the jamming action space, which reduces unnecessary exploration cost in an unknown environment and speeds up the search of the jamming action space, and it updates the values of all actions synchronously, thereby accelerating the learning of the optimal strategy. The jamming decision-making scenario was modeled as a Markov decision process for simulation. Results show that when the communicating party adopted an interference avoidance strategy and changed its communication channel in response to the jammer, the proposed algorithm, with no prior information, achieved faster convergence, a higher jamming success rate, and greater total jamming reward than existing reinforcement-learning-based decision-making algorithms. The algorithm can also be applied to “many-to-many” cooperative countermeasure environments. With the action elimination method used to reduce the dimension of the joint jamming action, the jamming success rate of the proposed algorithm was 50% higher than that of the traditional Q-learning decision algorithm under the same conditions.
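The abstract does not give the update equations, so the following Python sketch is only an illustration of the general idea: Q-learning in which a variance-aware confidence interval is used to eliminate jamming actions. Everything in it is an assumption made for illustration rather than the authors' method: the toy environment (a link that hops by a fixed offset each slot), the names `run_jammer` and `confidence_radius`, the empirical-Bernstein-style radius, the use of immediate-reward statistics for the interval, and all parameter values. The synchronous update of all action values mentioned in the abstract is not modeled.

```python
import math
import random


def confidence_radius(n, variance, log_term):
    """Variance-aware radius for rewards in [0, 1] (assumed Bernstein-style form)."""
    if n == 0:
        return math.inf  # untried actions are never eliminated
    return math.sqrt(2.0 * variance * log_term / n) + 3.0 * log_term / n


def run_jammer(n_channels=5, hop=2, p_jam=0.9, steps=6000,
               alpha=0.2, gamma=0.5, epsilon=0.3, delta=0.05, seed=0):
    """Q-learning jammer with confidence-interval action elimination (toy model)."""
    rng = random.Random(seed)
    n = n_channels
    log_term = math.log(1.0 / delta)          # fixed confidence parameter (assumption)
    q = [[0.0] * n for _ in range(n)]         # Q[state][action]
    count = [[0] * n for _ in range(n)]       # visits per (state, action)
    mean = [[0.0] * n for _ in range(n)]      # running mean of observed reward
    m2 = [[0.0] * n for _ in range(n)]        # running sum of squared deviations
    active = [set(range(n)) for _ in range(n)]  # surviving actions per state
    state, hits = 0, []

    for _ in range(steps):
        # Epsilon-greedy choice restricted to the surviving actions.
        candidates = sorted(active[state])
        if rng.random() < epsilon:
            action = rng.choice(candidates)
        else:
            action = max(candidates, key=lambda a: q[state][a])

        # Toy environment: the link moves to (state + hop) mod n; jamming the
        # right channel succeeds with probability p_jam.
        next_state = (state + hop) % n
        hit = action == next_state and rng.random() < p_jam
        reward = 1.0 if hit else 0.0
        hits.append(hit)

        # Welford update of the reward mean and variance for (state, action).
        count[state][action] += 1
        d = reward - mean[state][action]
        mean[state][action] += d / count[state][action]
        m2[state][action] += d * (reward - mean[state][action])

        # Standard Q-learning update, maximizing over surviving actions only.
        best_next = max(q[next_state][a] for a in active[next_state])
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

        # Drop every action whose upper bound lies below the best lower bound.
        def bounds(a):
            k = count[state][a]
            var = m2[state][a] / k if k else 0.0
            r = confidence_radius(k, var, log_term)
            return mean[state][a] - r, mean[state][a] + r

        best_lower = max(bounds(a)[0] for a in active[state])
        active[state] = {a for a in active[state] if bounds(a)[1] >= best_lower}

        state = next_state

    return {"q": q, "active": active, "hits": hits}
```

Calling `run_jammer(seed=1)` returns the learned Q-table, the surviving action set for each state, and the per-slot jamming outcomes. In this toy setting, once the low-confidence actions have been eliminated, exploration can no longer land on them, which is the mechanism the abstract credits for the reduced exploration cost.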

History

Received: October 26, 2020
Online: April 25, 2022
