Extracting Entities from Intangible Cultural Heritage Texts Based on Machine Reading Comprehension (original title: 基于机器阅读理解的非遗文本实体抽取研究) Cited by: 1
Abstract [Objective] To address the shortcomings of current research on entity extraction from Intangible Cultural Heritage (ICH) texts, this paper proposes a machine reading comprehension (MRC) approach that extracts ICH entities through question answering (QA). [Methods] The authors construct an ICH-entity-sensitive attention mechanism that captures the relationship between the context of an ICH text and the question, so that the model attends to the ICH entities relevant to the question. On this basis they build the ICH entity extraction model ICHQA. [Results] ICHQA was evaluated empirically on an annotated ICH corpus and compared with related baseline models. It achieved the best F1 score, 87.139%. To highlight the model's strengths and improve its interpretability, the authors also ran ablation experiments and visualized the model's outputs. [Limitations] The model was validated only on the ICH corpus, so its generalization has not been sufficiently tested; further work is needed on other digital humanities corpora. [Conclusions] Using machine reading comprehension for ICH entity extraction makes effective use of the semantic features of entity labels and improves extraction performance.
Authors Fan Tao (范涛); Wang Hao (王昊); Zhang Wei (张卫); Li Xiaomin (李晓敏) (School of Information Management, Nanjing University, Nanjing 210023, China; Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China)
Source Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2022, No. 12, pp. 70-79 (10 pages)
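Funding National Natural Science Foundation of China, General Program (Grant No. 72074108); Nanjing University Special Project for Young Interdisciplinary Teams in the Liberal Arts (Grant No. 010814370113); one of the research outputs of the Jiangsu Young Social Science Talents program and the Nanjing University "Zhongying Young Scholars" talent development program.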
Keywords Digital Humanities; Intangible Cultural Heritage; Named Entity Recognition; Attention Mechanism; Machine Reading Comprehension
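
The abstract describes casting entity extraction as question answering and reports an F1 score. The sketch below illustrates that general framing only: each entity type gets a natural-language question, and the answers are the character spans of entities of that type. It is not the ICHQA model (no attention mechanism or neural encoder is shown). The entity types, question wording, sample sentence, and the names `QUESTIONS`, `MRCExample`, `to_mrc_examples`, and `span_f1` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical entity types and questions; the paper's actual label set
# and question templates are not given in this record.
QUESTIONS = {
    "INHERITOR": "Which people are named as inheritors of the heritage item?",
    "PLACE": "Which places are mentioned in the text?",
}


@dataclass
class MRCExample:
    question: str
    context: str
    # Character spans (start, end) with end exclusive; empty if no entity.
    answers: list = field(default_factory=list)


def to_mrc_examples(context, entities):
    """Turn span-labeled text into one QA example per entity type.

    `entities` is a list of (start, end, label) tuples.
    """
    examples = []
    for label, question in QUESTIONS.items():
        spans = [(s, e) for s, e, l in entities if l == label]
        examples.append(MRCExample(question, context, spans))
    return examples


def span_f1(gold, pred):
    """Entity-level F1 over exact-match (start, end, label) tuples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "Kunqu opera originated in Kunshan."
    start = text.index("Kunshan")
    gold = [(start, start + len("Kunshan"), "PLACE")]
    for ex in to_mrc_examples(text, gold):
        print(ex.question, [text[s:e] for s, e in ex.answers])
    print(span_f1(gold, gold))
```

Phrasing each label as a question is what lets a reading-comprehension model see the label's meaning as text, which is the property the abstract's conclusion credits for the improvement.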