A Semi-Supervised Abbreviation Disambiguation Method Based on ACNN and Bi-LSTM
Abstract To improve the disambiguation accuracy of biomedical abbreviations, a semi-supervised abbreviation disambiguation method that combines asymmetric convolutional neural networks (ACNN) and bidirectional long short-term memory (Bi-LSTM) networks is proposed. With the abbreviation as the center, morphological, part-of-speech, and semantic information is extracted from the four adjacent lexical units to its left and right and used as disambiguation features. The training corpus is expanded using the Xgboost and LightGBM algorithms, and the expanded corpus is fed into the model. ACNN and Bi-LSTM extract features, and a softmax function performs semantic classification. The MSH corpus is used to optimize the model and to test its disambiguation performance. Experimental results show that the proposed model effectively improves abbreviation disambiguation accuracy while requiring only a small amount of annotated corpus.
Authors ZHANG Chun-xiang; PANG Shu-yang; GAO Xue-yao (School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)
Source Journal of Harbin University of Science and Technology (CAS; Peking University Core Journal), 2022, No. 5, pp. 30-37 (8 pages)
Funding National Natural Science Foundation of China (61502124, 60903082); China Postdoctoral Science Foundation (2014M560249); Natural Science Foundation of Heilongjiang Province (LH2022F031).
Keywords abbreviation; Xgboost; LightGBM; disambiguation feature; asymmetric convolutional neural network; bidirectional long short-term memory
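
The abstract names two steps that are easy to picture in code: taking the context window around an abbreviation, and expanding the training corpus with two classifiers. The sketch below is illustrative only and is not the authors' implementation. It assumes that "four adjacent lexical units" means two on each side, and that an unlabeled example is added only when both classifiers agree on its sense; the abstract states neither detail. The names `context_window`, `expand_by_agreement`, `PAD`, `predict_a`, and `predict_b` are hypothetical, and the two predictors stand in for trained Xgboost and LightGBM models.

```python
from typing import Callable, List, Sequence, Tuple

PAD = "<PAD>"  # hypothetical padding token for positions outside the sentence


def context_window(tokens: Sequence[str], center: int, per_side: int = 2) -> List[str]:
    """Return the tokens adjacent to tokens[center], excluding the center itself.

    per_side=2 yields four context units in total; this split is an assumption.
    """
    if not 0 <= center < len(tokens):
        raise IndexError("center is outside the token sequence")
    # Left neighbours, padded when the window runs past the sentence start.
    left = [tokens[i] if i >= 0 else PAD
            for i in range(center - per_side, center)]
    # Right neighbours, padded when the window runs past the sentence end.
    right = [tokens[i] if i < len(tokens) else PAD
             for i in range(center + 1, center + 1 + per_side)]
    return left + right


def expand_by_agreement(
    labeled: Sequence[Tuple[List[str], str]],
    unlabeled: Sequence[List[str]],
    predict_a: Callable[[List[str]], str],
    predict_b: Callable[[List[str]], str],
) -> List[Tuple[List[str], str]]:
    """Add an unlabeled example only when both predictors assign the same sense.

    The agreement rule is an assumed selection criterion, not one taken
    from the paper.
    """
    expanded = list(labeled)
    for example in unlabeled:
        sense_a = predict_a(example)
        sense_b = predict_b(example)
        if sense_a == sense_b:
            expanded.append((example, sense_a))
    return expanded
```

In a fuller pipeline, each context unit would also carry its part-of-speech tag and a semantic code, and the expanded corpus would then be passed to the ACNN and Bi-LSTM model for training.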