
Research on Recognition of Core Content of English Scientific Literature
Abstract: Based on the features of English scientific literature, this paper proposes a method that combines rules with statistics to recognize key content. The method first identifies the features of the source document and converts it into an intermediate document that is easier to process. Then, through feature recovery, clue word matching, topic recognition and proximal analysis, it extracts the main information representing the text from the intermediate document and generates the target document. The method can effectively help researchers read large volumes of English scientific literature and improve their reading efficiency.
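The abstract names the extraction steps without giving their rules. The sketch below illustrates only two of them, clue word matching and proximal analysis, under assumptions that are ours and not the authors': the clue-word list, the 8-token window, the +2 proximity bonus, and all function names are hypothetical; topic terms are assumed to be supplied by an earlier topic-recognition step; feature identification and feature recovery are not modeled.

```python
import re
from typing import List

# Hypothetical clue phrases that often introduce key statements.
# The paper's actual clue-word list is not given in the abstract.
CLUE_WORDS = ["we propose", "this paper", "results show", "in conclusion", "we present"]


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; hyphenated words stay whole, punctuation is dropped."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def find_positions(tokens: List[str], phrase: str) -> List[int]:
    """Start indices where the (possibly multi-word) phrase occurs in tokens."""
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def score_sentence(sentence: str, topic_terms: List[str], window: int = 8) -> int:
    """Clue hits + topic-term hits, plus a bonus when a clue and a topic term are close."""
    tokens = tokenize(sentence)
    clue_pos = [p for c in CLUE_WORDS for p in find_positions(tokens, c)]
    topic_pos = [p for t in topic_terms for p in find_positions(tokens, t)]
    score = len(clue_pos) + len(topic_pos)
    # Proximal analysis (assumed form): a clue word within `window` tokens of a topic term.
    if any(abs(c - t) <= window for c in clue_pos for t in topic_pos):
        score += 2
    return score


def extract_key_sentences(sentences: List[str], topic_terms: List[str],
                          top_k: int = 2, window: int = 8) -> List[str]:
    """Select the top_k positively scored sentences, returned in document order."""
    scored = [(score_sentence(s, topic_terms, window), i, s) for i, s in enumerate(sentences)]
    # Best score first; ties broken by earlier position.
    ranked = sorted((x for x in scored if x[0] > 0), key=lambda x: (-x[0], x[1]))[:top_k]
    # Restore original order so the target document reads naturally.
    return [s for _, _, s in sorted(ranked, key=lambda x: x[1])]


if __name__ == "__main__":
    doc = [
        "Scientific literature grows rapidly every year.",
        "In this paper we propose a method for term extraction from English articles.",
        "The weather was pleasant during the conference.",
        "Results show that term extraction improves reading efficiency.",
    ]
    for line in extract_key_sentences(doc, ["term extraction"]):
        print(line)
```

With the sample input, the second and fourth sentences are kept because each contains a clue phrase close to the topic term. Scoring sentences additively keeps the rule part (clue phrases) and the statistical part (topic-term hits) separable, which loosely mirrors the rules-plus-statistics framing of the abstract, though the paper's own weighting is unknown.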
Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI and Peking University core journal, 2012, No. 9, pp. 112-116 (5 pages).
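Funding: National Natural Science Foundation of China project "Research on Theories and Methods for Analyzing the Evolution of Scientific and Technological Innovation" (Grant No. 70873123); an outcome of the Chinese Academy of Sciences library and information new-capability project "Research on Analytical Methods and Tools for 'Future S&T Competitiveness'".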
Keywords: feature recognition; clue word matching; topic recognition; proximal analysis