Cite this article: JIN Zhigang, HE Xiaoyong, YUE Shunmin, XIONG Yalan, LUO Jia. Named entity recognition in medical domain combined with knowledge graph[J]. Journal of Harbin Institute of Technology, 2023, 55(5): 50. DOI: 10.11918/202201126
Named entity recognition in medical domain combined with knowledge graph
JIN Zhigang1, HE Xiaoyong1, YUE Shunmin2,3, XIONG Yalan1, LUO Jia1
(1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; 2. State Grid Tianjin Electric Power Company, Tianjin 300010, China; 3. Key Laboratory of Energy Big Data Simulation of Tianjin Enterprise, Tianjin 300010, China)
Abstract:
To address the problem that general-purpose pre-trained models are ill-suited to named entity recognition (NER) tasks in the medical domain, a neural network architecture that integrates a medical knowledge graph was proposed. The architecture uses elastic positions and a masking matrix so that the pre-trained model avoids semantic confusion and semantic interference when computing self-attention. During fine-tuning, it adopts the idea of multi-task learning and applies a recall-learning optimization algorithm so that the pre-trained model balances general semantic representation against learning of the target task, finally yielding more effective vector representations for label prediction. Experimental results show that the proposed architecture outperformed mainstream pre-trained models in the medical domain and also performed well in the general domain. Because the architecture avoids retraining a domain-specific pre-trained model and introduces no additional encoder structures, it reduces computational cost and model size. Ablation experiments show that the medical domain depends more heavily on the knowledge graph than the general domain does, which demonstrates the effectiveness of integrating a knowledge graph in the medical domain. Parameter analysis confirms that the recall-learning optimization algorithm effectively controls parameter updates, allowing the model to retain more general semantic information and to obtain vector representations that better fit the semantics. Experimental analysis also shows that the proposed method performs better on entity categories with few instances.
Key words:  bidirectional encoder representation from transformers (BERT)  knowledge graph  multi-task learning  named entity recognition
DOI:10.11918/202201126
CLC number: TP183
Document code: A
Fund project: National Natural Science Foundation of China (71502125)
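
The abstract describes splicing knowledge-graph triples into the input while using elastic (soft) positions and a masking matrix so the injected tokens do not disturb the self-attention of the original sentence. The paper's implementation is not given here; the following is a minimal illustrative sketch of the masking idea only, and all function names, shapes, and the example visibility pattern are assumptions.

```python
# Minimal sketch (not the authors' code) of masking-matrix self-attention:
# a 0/1 "visible" matrix decides which token pairs may attend to each
# other, so knowledge-graph tokens spliced into the input influence only
# the entity they are attached to. Shapes and names are illustrative.
import numpy as np

def masked_self_attention(Q, K, V, visible):
    """Q, K, V: (seq_len, d) arrays; visible: (seq_len, seq_len) 0/1 matrix
    where visible[i, j] = 1 means token j is visible to token i."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # scaled dot-product logits
    scores = np.where(visible == 1, scores, -1e9)  # hide masked-out pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row softmax
    return weights @ V

# Toy example: 3 sentence tokens followed by 2 injected triple tokens.
# Sentence tokens see each other; the triple tokens are visible only to
# the entity token (index 2) they hang off, and vice versa.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(5, 8))
visible = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],   # the anchor entity also sees the triple tokens
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
])
out = masked_self_attention(Q, K, V, visible)
print(out.shape)  # (5, 8)
```

Elastic positions would complement this by assigning the injected tokens position indices that continue from their anchor entity, leaving the position ids of the original sentence untouched.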
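The recall-learning optimization mentioned in the abstract balances general semantic knowledge against the target task during fine-tuning. As a hedged illustration only (the paper's exact algorithm is not reproduced here), one common way to realize such behavior is a quadratic penalty pulling fine-tuned weights back toward their pretrained values with an annealed mixing coefficient; the function names and schedule below are assumptions.

```python
# Illustrative sketch (an assumption, not the paper's exact algorithm) of a
# recall-style fine-tuning objective: the task loss is mixed with a
# quadratic penalty toward the pretrained weights, and the mixing
# coefficient is annealed so early steps preserve general semantics.
import math
import torch

def recall_objective(task_loss, model, pretrained, lam):
    """task_loss: scalar tensor; pretrained: dict of name -> detached
    snapshot of pretrained weights; lam in [0, 1] weights the task loss."""
    penalty = sum(((p - pretrained[n]) ** 2).sum()
                  for n, p in model.named_parameters() if n in pretrained)
    return lam * task_loss + (1.0 - lam) * penalty

def lam_schedule(step, k=0.05, t0=250):
    """Sigmoid annealing: lam ~ 0 early (recall the pretrained weights),
    lam -> 1 later (focus on the NER task)."""
    return 1.0 / (1.0 + math.exp(-k * (step - t0)))

# Usage sketch: snapshot the weights once, then mix losses each step.
# model = ...  # a pre-trained encoder with a tagging head
# pretrained = {n: p.detach().clone() for n, p in model.named_parameters()}
# loss = recall_objective(ner_loss, model, pretrained, lam_schedule(step))
# loss.backward()
```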
