Research on Recognition of Core Content of English Scientific Literature

Abstract: Based on the characteristics of English scientific literature, this paper proposes a key-content recognition method that combines rules with statistics. The method first tags the features of the source document, converting it into an intermediate document that is easier to process. It then applies feature recovery, clue-word matching, topic recognition, and proximal analysis to extract the main information representing the text from the intermediate document, producing a target document. The method can effectively help researchers read large volumes of English scientific literature and improve their reading efficiency.
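The abstract names the processing stages but not their internals. The following is a minimal Python sketch of how such a rule-plus-statistics pipeline could be wired together; it is not the authors' implementation. The clue phrases, the stop words, the feature pattern (figure/equation labels and bracketed citations), frequency-based topic words, and the "adjacent to a clue sentence" rule used for proximal analysis are all illustrative assumptions, as are the hypothetical function names `tag_features`, `recover_features`, `has_clue`, `topic_words`, `score_sentences`, and `extract_core`.

```python
import re
from collections import Counter

# Hypothetical clue phrases; the paper's real clue-word list is not reproduced here.
CLUE_PHRASES = ("this paper", "we propose", "results show", "in conclusion", "we find")

# Illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "was", "we",
              "this", "that", "for", "on", "with", "by", "it", "as", "be", "our", "paper"}

# Assumed "features": figure/equation labels and numeric citations. Masking them
# keeps the period in "Fig. 2" from being mistaken for a sentence boundary.
FEATURE_PATTERN = re.compile(r"(?:Fig|Eq|Ref)\.\s*\d+|\[\d+\]")


def tag_features(text):
    """Feature tagging: mask features with placeholders, then split the masked
    text into sentence records (the 'intermediate document')."""
    features = []

    def _store(match):
        features.append(match.group(0))
        return f"<F{len(features) - 1}>"

    masked = FEATURE_PATTERN.sub(_store, text)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", masked) if s.strip()]
    records = [
        {"pos": i, "text": s, "tokens": re.findall(r"[a-z]+", s.lower())}
        for i, s in enumerate(sentences)
    ]
    return records, features


def recover_features(sentence, features):
    """Feature recovery: put the original strings back in place of placeholders."""
    return re.sub(r"<F(\d+)>", lambda m: features[int(m.group(1))], sentence)


def has_clue(record):
    """Clue-word matching: does the sentence contain any clue phrase?"""
    text = record["text"].lower()
    return any(phrase in text for phrase in CLUE_PHRASES)


def topic_words(records, k=3):
    """Topic recognition, approximated here by the k most frequent content words."""
    counts = Counter(t for r in records for t in r["tokens"]
                     if t not in STOP_WORDS and len(t) > 2)
    return {word for word, _ in counts.most_common(k)}


def score_sentences(records, topics, clue_weight=2.0, proximity_bonus=0.5):
    """Score = clue weight + number of topic-word hits, plus a bonus for sentences
    directly adjacent to a clue sentence (a simple stand-in for proximal analysis)."""
    clue_positions = {r["pos"] for r in records if has_clue(r)}
    scores = []
    for r in records:
        score = clue_weight * has_clue(r) + sum(t in topics for t in r["tokens"])
        if any(abs(r["pos"] - c) == 1 for c in clue_positions):
            score += proximity_bonus
        scores.append(score)
    return scores


def extract_core(text, n=3):
    """Build the 'target document': the n best sentences, kept in original order."""
    records, features = tag_features(text)
    scores = score_sentences(records, topic_words(records))
    best = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)[:n]
    return [recover_features(records[i]["text"], features) for i in sorted(best)]


if __name__ == "__main__":
    sample = ("Term extraction is widely studied [1]. "
              "This paper proposes a hybrid term extraction method. "
              "The method combines rules with statistics, as shown in Fig. 2. "
              "Weather was pleasant during the experiments. "
              "In conclusion, term extraction accuracy improves.")
    for sentence in extract_core(sample, n=4):
        print(sentence)
```

On the five-sentence sample, `extract_core(sample, n=4)` drops only the off-topic weather sentence, and `[1]` and `Fig. 2` reappear intact in the output. Masking features before sentence splitting is what the intermediate document buys in this sketch: without it, the period in `Fig. 2` would split one sentence into two.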

Authors: Zhu Qingsong, Leng Fuhai, Wang Lin, Han Tao

Affiliations: National Science Library, Chinese Academy of Sciences; Graduate University of the Chinese Academy of Sciences

Source: Information Studies: Theory & Application (《情报理论与实践》; indexed in CSSCI and the Peking University Core Journals list), 2012, No. 9, pp. 112-116 (5 pages)

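Funding: This paper is an outcome of the National Natural Science Foundation of China project "Research on Theories and Methods for Analyzing the Evolution of Science and Technology Innovation" (Grant No. 70873123) and the Chinese Academy of Sciences Library and Information New Capability Project "Research on Analysis Methods and Tools for 'Future Science and Technology Competitiveness'".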

Keywords: feature recognition; clue word matching; topic recognition; proximal analysis

CLC number: G351 [Culture and Science / Information Science]

References (10)

[1] PDF to Word Converter [EB/OL]. [2011-10-11]. http://www.soliddocuments.com.
[2] ABBYY FineReader 11 [EB/OL]. [2011-10-11]. http://www.abbyy.cn.
[3] WANG Lixue. Research on a dynamic DT method based on text structure parsing and its implementation [D]. Beijing: Chinese Academy of Sciences, 2010. (in Chinese)
[4] LIU Jianhua, ZHANG Zhixiong, XU Jian, XU Yandong. Automatic term recognition: an important technique for text mining of scientific literature [J]. New Technology of Library and Information Service, 2008(8): 12-17. (in Chinese)
[5] FRANTZI K, ANANIADOU S, MIMA H. Automatic recognition of multi-word terms [J]. International Journal of Digital Libraries, 2000, 3(2): 117-132.
[6] KOSTOFF R N, EBERHART H J, TOOTHMAN D R. Hypersonic and supersonic flow roadmaps using bibliometrics and database tomography [J]. Journal of the American Society for Information Science, 1999, 50(5): 427-447.
[7] KOSTOFF R N, EBERHART H J, TOOTHMAN D R. Database tomography for technical intelligence: comparative roadmaps of the research impact assessment literature and the Journal of the American Chemical Society [J]. Scientometrics, 1997, 40(1): 103-148.
[8] NaCTeM. Termine Web service [EB/OL]. [2011-10-12]. http://www.nactem.ac.uk/software/termine/webservice.
[9] LIU Xiaoyong. Research on implicit associated knowledge discovery based on semantic relation mining [D]. Beijing: Chinese Academy of Sciences, 2011. (in Chinese)
[10] MEADOR M A, FILES B, LI Jing, et al. Draft nanotechnology roadmap technology area 10 [EB/OL]. [2011-11-06]. http://www.nasa.gov/pdf/501325main_TA10-Nanotech-DRAFT-Nov2010-A.pdf.
