Journal of Harbin Institute of Technology(New Series)

Supervised by: Ministry of Industry and Information Technology of the People's Republic of China
Sponsored by: Harbin Institute of Technology
Editor-in-chief: Yu Zhou
ISSN: 1005-9113
CN: 23-1378/T


Related citation:

Mengda Xu, Luqun Li. Performance Analysis of Cross-Site Scripting Based on Natural Language Processing[J]. Journal of Harbin Institute of Technology (New Series), 2022, 29(4): 19-25. DOI: 10.11916/j.issn.1005-9113.2020083.

Performance Analysis of Cross-Site Scripting Based on Natural Language Processing

Author Name	Affiliation
Mengda Xu	The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Luqun Li	The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China

Abstract:

With the acceleration of network communication in the 5G era, the volume of data communicated in cyberspace has grown to an unprecedented level, and data transmission speeds continue to rise. As a result, securing network communication data has become an increasingly serious problem. In particular, malicious cross-site scripting that leaks user information is a serious threat. This article uses a URL attribute analysis method and YARA rules to process cross-site scripting data, and classifies the data with a long short-term memory (LSTM) model. The results show that the LSTM classification model adopted in this paper achieves a higher recall rate and F1-score than other machine learning methods, which shows that the method adopted in this paper is feasible.

Key words: cross-site scripting; network communication; web security; natural language processing

DOI: 10.11916/j.issn.1005-9113.2020083

CLC Number: TP393


Descriptions in Chinese:

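Performance Analysis of Cross-Site Scripting Based on Natural Language Processing

Mengda Xu, Luqun Li

(College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China)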

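Innovation:

Leakage of user information caused by malicious cross-site scripting is a serious problem. This paper uses URL attribute analysis and YARA rules to process cross-site scripting data.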

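Research objective:

Using machine learning and deep learning methods, malicious attack script data are collected and cross-site scripting statements are analyzed in order to classify malicious cross-site scripts and prevent XSS attacks. This has practical significance for defending against network attacks.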

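Research methods:

1. XSS scripts are analyzed and modeled from the perspective of natural language processing. Deep learning methods are used to study XSS recognition rules and classification patterns, and the classification performance is verified.

2. Analysis methods based on URL attributes and on YARA rules are proposed. According to the text features of the collected data, the malicious cross-site scripting data are preprocessed with a defined data preprocessing pipeline. The text features are derived by analyzing the strategies for detecting malicious script code injection and the strategies for bypassing detection.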
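The preprocessing idea described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the token pattern, the helper names (`query_values`, `normalize`, `tokenize`, `build_vocab`, `encode`), the `<pad>`/`<unk>` convention, and the example URL are all assumptions made for illustration. The sketch pulls parameter values out of a URL, undoes nested percent-encoding (a common way to bypass filters), splits the payload into script-like tokens, and maps the tokens to padded integer sequences of the kind an LSTM classifier typically consumes.

```python
import re
from urllib.parse import parse_qsl, unquote, urlsplit

# Hypothetical token pattern (an assumption, not the paper's rule set).
TOKEN_RE = re.compile(
    r"</?[a-z][a-z0-9]*"      # start of an HTML tag, e.g. <script or </script
    r"|[a-z_]\w*\s*="         # attribute or parameter assignment, e.g. onerror=
    r"|[a-z_]\w*\("           # function call, e.g. alert(
    r"|\"[^\"]*\"|'[^']*'"    # quoted string
    r"|\d+"                   # number
    r"|[<>()/;]",             # single punctuation character
    re.IGNORECASE,
)


def query_values(url):
    """Return the values of all query parameters in `url` (decoded once)."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return [value for _, value in pairs]


def normalize(text, max_rounds=3):
    """Undo nested percent-encoding (up to `max_rounds`) and lowercase."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text.lower()


def tokenize(text):
    """Split a normalized payload into script-like tokens."""
    return TOKEN_RE.findall(text)


def build_vocab(token_lists):
    """Assign an integer id to every token; 0 is padding, 1 is unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, truncate to `max_len`, and right-pad with 0."""
    ids = [vocab.get(token, vocab["<unk>"]) for token in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Made-up example URL with a double-encoded payload.
url = "http://example.com/search?q=%253Cscript%253Ealert(1)%253C/script%253E&page=2"
token_lists = [tokenize(normalize(value)) for value in query_values(url)]
vocab = build_vocab(token_lists)
print(token_lists[0])
print(encode(token_lists[0], vocab, max_len=10))
```

The fixed-length integer sequences would then be passed to an embedding layer and the sequence model. A real system would also need a much richer token pattern and handling for other encodings such as HTML entities and Unicode escapes.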

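Research results:

1. In experiments of about 100 epochs, the LSTM model reaches an accuracy of 98% and a recall rate of 96%. Under the same conditions, it is about 2 times faster than the MLP and CNN models.

2. The decision tree machine learning method takes less time to train than the neural network models, but the neural network models are generally more accurate than the decision tree algorithm.

3. The decision tree machine learning method takes less time to train than the neural network models, but it cannot report a loss value.

4. The LSTM method in this paper has lower accuracy, but its loss value is lower and its recall rate and F1-score are the highest. Because the recall rate and F1-score reflect a model's ability to identify malicious scripts, this shows that the method adopted in this paper works well.

5. The study of the decision tree machine learning method shows that decision trees handle large samples poorly: when the tree structure is built, a very large volume of data makes computation over the dataset impractical. Deep learning, by contrast, can be trained on large data samples and obtain more accurate results.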
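To make the metrics in these results concrete, the following Python sketch computes accuracy, precision, recall rate, and F1-score from binary labels using the standard definitions. The function name `classification_metrics` and the toy label lists are assumptions made for illustration; they are not the paper's data or code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every division so empty or one-sided inputs return 0.0.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = malicious script, 0 = benign.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_benign = [0] * 10  # a classifier that never flags anything
print(classification_metrics(y_true, always_benign))
```

In this toy case the classifier that never flags anything scores an accuracy of 0.8 but a recall rate and F1-score of 0.0, which is why recall and F1 are the more informative measures when the goal is to catch malicious scripts.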

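Conclusions:

This paper analyzes cross-site scripting through attribute analysis and YARA rule classification, and from the perspective of natural language processing. In addition, the computation of a word encoding set is added to the LSTM neural network model to classify the experimental data. The approach achieves good results and helps improve the security of network data communication to some extent.

The YARA rules written in this paper are not comprehensive. For the collected data, follow-up research needs to summarize and refine the attack forms of malicious script statements and keep optimizing the statement-processing logic, thereby improving script detection capability.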

Key words: cross-site scripting; network communication; network security; natural language processing
