Extracting Citation Contents with Coreference Resolution (基于指代消解的引文内容抽取研究)
Abstract [Objective] This study aims to extract citation content accurately, replacing manual or simplistic citation extraction methods and thereby improving the results of citation content analysis. [Methods] We divide the citation content extraction task into three parts: citation sentences, citation context, and citation metadata. Building on coreference resolution theory, we use machine learning and a hierarchical filtering method to extract the citation context. [Results] The experimental data consist of Chinese journal articles that use the sequential numbering citation system. The results confirm that the method extracts citation sentences and parses references correctly, and identifies citation context with F1 values between 0.780 and 0.849. [Limitations] Because Chinese scientific citation corpora are scarce, the experiments rely on a small, manually annotated dataset; cross-domain capability is limited, and dependence on the text domain is unavoidable. [Conclusions] This study optimizes the steps of citation content analysis and broadens its scope, and offers a reference for researchers who use citation content analysis.
Authors Tan Ying (谭荧); Tang Yifei (唐亦非) (School of Public Administration, Hubei University, Wuhan 430062, China; School of Information Management, Central China Normal University, Wuhan 430079, China)
Source Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2021, No. 8, pp. 25-33 (9 pages)
Funding This work is one of the research outcomes of a Major Project of the National Social Science Fund of China (Grant No. 19ZDA345).
Keywords Information Extraction; Coreference Resolution; Citation Content; Citation Context