Cite this article: JIN Zhigang, HE Xiaoyong, YUE Shunmin, XIONG Yalan, LUO Jia. Named entity recognition in medical domain combined with knowledge graph[J]. Journal of Harbin Institute of Technology, 2023, 55(5): 50. DOI: 10.11918/202201126
DOI: 10.11918/202201126
CLC number: TP183
Document code: A
Fund program: National Natural Science Foundation of China (71502125)
|
Named entity recognition in medical domain combined with knowledge graph |
JIN Zhigang¹, HE Xiaoyong¹, YUE Shunmin²˒³, XIONG Yalan¹, LUO Jia¹
|
(1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; 2. State Grid Tianjin Electric Power Company, Tianjin 300010, China; 3. Key Laboratory of Energy Big Data Simulation of Tianjin Enterprise, Tianjin 300010, China)
|
Abstract:
In view of the problem that general pre-trained models are poorly suited to named entity recognition (NER) tasks in the medical domain, a neural network architecture that integrates a medical-domain knowledge graph was proposed. Elastic positions and a masking matrix were used so that the pre-trained model avoids semantic confusion and semantic interference when computing self-attention. During fine-tuning, the idea of multi-task learning was adopted, and a recall-learning optimization algorithm was employed so that the pre-trained model balances general semantic expression against learning of the target task, finally yielding more efficient vector representations for label prediction. Experimental results showed that the proposed architecture outperformed mainstream pre-trained models in the medical domain and also performed relatively well in the general domain. The architecture avoids retraining a pre-trained model for a particular domain and introducing additional encoding structures, thereby reducing computational cost and model size. In addition, ablation experiments showed that the medical domain depends on the knowledge graph more heavily than the general domain does, indicating the effectiveness of integrating the knowledge graph in the medical domain. Parameter analysis proved that the recall-learning optimization algorithm can effectively control the update of model parameters, so that the model retains more general semantic information and obtains vector representations that better match the semantics. Experimental analysis also showed that the proposed method performs better on entity categories with few instances.
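The masking matrix described in the abstract restricts which token pairs may attend to each other during self-attention, so that knowledge-graph triples injected for one entity do not interfere with unrelated tokens. A minimal sketch of such masked attention is shown below; the function name, the single-head formulation, and the exact masking scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def masked_self_attention(Q, K, V, visible):
    """Scaled dot-product attention restricted by a visibility (masking) matrix.

    visible[i, j] = 1 if token i may attend to token j, else 0. Pairs marked
    invisible receive a large negative score, so their softmax weight is ~0
    and they contribute nothing to the output row (hypothetical sketch).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # raw attention scores
    scores = np.where(visible.astype(bool), scores, -1e9)  # block invisible pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V
```

With this mask, changing the value vector of a token that is invisible to position i leaves the output at position i unchanged, which is the isolation effect the abstract attributes to the masking matrix.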
Key words: bidirectional encoder representations from transformers (BERT); knowledge graph; multi-task learning; named entity recognition
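The recall-learning objective described in the abstract, which balances fitting the target NER task against retaining the pre-trained model's general semantics, is consistent with adding a penalty that pulls parameters back toward their pre-trained values. The sketch below assumes a quadratic form of that penalty; the function name and the trade-off coefficient `lam` are hypothetical, not taken from the paper.

```python
import numpy as np

def recall_loss(task_loss, params, pretrained, lam):
    """Combined objective: task loss plus a recall term (assumed form).

    The recall term is the squared distance between the current parameters
    and the pre-trained ones; lam trades off task fitting against recall of
    general semantic knowledge. Larger lam keeps the model closer to its
    pre-trained state (hypothetical sketch of the recall-learning idea).
    """
    recall = sum(np.sum((p - p0) ** 2) for p, p0 in zip(params, pretrained))
    return task_loss + lam * recall
```

Under this formulation, setting `lam = 0` recovers plain fine-tuning, while increasing `lam` constrains parameter updates, matching the abstract's claim that the optimizer controls how far the model drifts from its general semantic representation.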