
Combining Deep Learning with Requirement Rules to Construct a "Target Data" Extraction Model for Academic Literature: Taking South China Sea Digital Resources as an Example (基于深度学习与需求规则融合的学术文献“目标数据”抽取模型构建与应用——以南海数字资源为例) Cited by: 6
Abstract [Purpose/Significance] Extracting the target data that researchers need from the content of large volumes of academic literature helps improve research efficiency and also helps improve the retrieval services of current literature databases. [Method/Process] Based on researchers' academic needs, candidate target data are first extracted from a large body of academic literature using deep learning. NER and TF-IDF are then used to extract the "5W" rules of the target data, and the candidates pass through a second layer of requirement-rule filtering: any data that satisfy the "5W" rules are identified as target data. Finally, a third layer of manual verification is applied, producing the final academic literature "target data". [Result/Conclusion] The "target data" extraction model constructed in the paper reaches an accuracy of up to 0.88 according to the Chinese abstract (the original English-language abstract reports 86.6%). Combined with "5W" rule filtering and final manual verification, it helps improve the precision of researchers' literature searches and, to some extent, assists the retrieval work of literature database providers. [Innovation/Limitation] Fusing deep learning with requirement rules moves academic literature retrieval results from the level of bibliographic information to the level of the data contained in the literature itself.
Authors PENG Yu-fang (彭玉芳); CHEN Jiang-hao (陈将浩) (School of Economics & Management, Nanjing Institute of Technology, Nanjing 211167, China; Department of Information Management, Nanjing University, Nanjing 210046, China; School of Mathematical Sciences, University of Science and Technology of China, Hefei 230026, China)
Source Information Science (《情报科学》), CSSCI, Peking University Core Journal, 2022, No. 1, pp. 141-147, 157 (8 pages in total)
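Funding Major Project of the National Social Science Fund of China, "Research on Knowledge Discovery and the Construction of Rights-Protection Evidence Chains in the Collation of South China Sea Frontier Documents" (19ZDA347); Nanjing University 2015 Graduate Innovation Project, "Interdisciplinary Research Innovation Fund" project "Research on Cultural Telegrams and Reports Concerning the South China Sea Rim in Republican-Era Archival Documents" (2015CW04); Jiangsu Province Graduate Training Innovation Project, "Research on Evidence Chains for the South China Sea Issue Based on Automatic Association Technology" (KYLX15_0025).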
Keywords deep learning; named entity recognition; bag-of-words model; TF-IDF; "5W" rule
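The second-layer filter described in the abstract (NER plus TF-IDF feeding a "5W" rule check) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define the "5W" slots, so the sketch assumes who/when/where/what/why and checks only the first four. The gazetteers `WHO` and `WHERE`, the year pattern `WHEN`, the threshold `what_threshold`, and all function names are hypothetical stand-ins; in particular, the small word lists replace a trained NER model.

```python
import math
import re
from collections import Counter

# Hypothetical gazetteers standing in for a trained NER model (assumption).
WHO = {"china", "philippines", "vietnam"}
WHERE = {"nansha", "xisha", "huangyan"}
WHEN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")  # four-digit years 1800-2099


def tokenize(text):
    """Lowercase alphanumeric tokenizer; real text would need a proper segmenter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        weights.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def five_w_slots(sentence, weights, what_threshold=0.05):
    """Fill who/where/when from the stand-in NER and 'what' from high TF-IDF terms."""
    toks = tokenize(sentence)
    return {
        "who": [t for t in toks if t in WHO],
        "where": [t for t in toks if t in WHERE],
        "when": WHEN.findall(sentence),
        "what": [
            t for t, w in weights.items()
            if w >= what_threshold
            and t not in WHO and t not in WHERE and not t.isdigit()
        ],
    }


def passes_rule(slots, required=("who", "when", "where", "what")):
    """A candidate passes only if every required slot is non-empty."""
    return all(slots[k] for k in required)


def filter_candidates(candidates):
    """Keep the candidates (e.g. output of a first-stage classifier) that pass the rule."""
    weights = tfidf(candidates)
    kept = []
    for sentence, w in zip(candidates, weights):
        slots = five_w_slots(sentence, w)
        if passes_rule(slots):
            kept.append((sentence, slots))
    return kept
```

Calling `filter_candidates` on a list of candidate sentences returns only those with a recognized actor, year, place, and at least one distinctive term. A candidate that names an actor and a year but no place is dropped, which mirrors the "must satisfy the rule" criterion in the abstract. Items that survive would then go to manual verification.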