
中文文献引文情感语料库构建 (cited by: 14)

Corpus Construction for Citation Sentiment in Chinese Literature
Abstract: Content-based citation sentiment analysis overcomes a limitation of traditional frequency-based citation analysis, which treats all citations as equivalent, and it is an important research topic in citation content analysis. However, citation sentiment analysis relies on annotated datasets, and the lack of large-scale, high-quality citation sentiment corpora seriously restricts research in this field. Therefore, based on an analysis of how citation sentiment is expressed, this paper proposes an annotation scheme suited to representing citation sentiment and describes in detail the techniques and methods used to build the corpus. A relatively large citation sentiment corpus of Chinese literature was constructed using a combined human-machine annotation strategy, supported by a full-featured citation annotation system. The statistical results show that, in the fields of Chinese information processing and science and technology management, positive and negative citations account for 22% and 6% of all citations, respectively, and the kappa value for citation sentiment annotation reached 0.852. This indicates that the corpus objectively reflects the authors' sentiment orientation and can provide data support for research in related fields such as paper evaluation, citation network analysis, and sentiment analysis.
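The kappa value quoted in the abstract is a chance-corrected agreement statistic. The abstract does not say which variant was used or how many annotators were involved, so the following is only a minimal sketch assuming Cohen's kappa for two annotators. The function name `cohen_kappa`, the label names, and the toy label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label counts.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (positive / neutral / negative), for illustration only.
annotator_1 = ["pos", "neu", "neu", "neg", "neu", "pos", "neu", "neu", "neu", "neu"]
annotator_2 = ["pos", "neu", "neu", "neg", "neu", "neu", "neu", "neu", "neu", "neu"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Because neutral citations dominate, raw agreement alone would overstate consistency. Kappa discounts the agreement expected by chance from the label distribution, which is why it is the more informative figure for a skewed label set like this one.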
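Authors: Xu Linhong (徐琳宏); Ding Kun (丁堃); Chen Na (陈娜); Li Bing (李冰) (WISE Lab, Institute of Science of Science and Technology Management, Dalian University of Technology, Dalian 116024; Software Institute, Dalian University of Foreign Languages, Dalian 116044)
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2020, No. 1, pp. 25-37 (13 pages)
Funding: National Natural Science Foundation of China projects "Research on a Comprehensive Paper Evaluation Model Based on Citation Polarity and Review Mining" (61772103) and "Research on Multilingual Text Sentiment Analysis Methods for Social Media" (61806038)
Keywords: citation sentiment analysis; consistency validation; annotation scheme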