Research on the S & T Literature Automatic Indexing Based on Support Vector Machine and Core Feature Word
Cited by: 5
Abstract: Scientific and technical (S&T) literature usually states a research purpose, methods, results, and conclusions. Indexing S&T documents with these information types is particularly important, because it helps researchers quickly find content that matches their research needs in a very large body of literature. Based on an analysis of how S&T literature is written, the paper proposes a strategy for extracting core feature words from words, English tokens (proper nouns and abbreviations), and numbers. It then recasts S&T literature indexing as a sentence classification problem and, using the proposed core feature words, applies a support vector machine classifier to perform sentence-level semantic indexing. Experiments on 1,168 medical papers on diabetes show that the proposed method can effectively learn and index sentences in S&T literature, and can thereby automatically index the key information points of S&T documents.
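Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI, Peking University Core Journal, 2014, No. 7, pp. 129-134 (6 pages)
Funding: One of the outcomes of the National Social Science Fund of China project "Research on Detecting Idea Plagiarism ('意抄') in Academic Literature" (No. 12CTQ032) and the Shandong Provincial Natural Science Foundation project "Research on Parallel Processing and Semantic Classification of Large-Scale Academic Literature" (No. ZR2011GL025)
Keywords: automatic indexing; support vector machine; feature extraction (feature selection); S&T literature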
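The pipeline summarized in the abstract (extract core feature words, then classify each sentence with a support vector machine) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tokenizer, the `<NUM>` and `<ABBR>` placeholder features, the four labels, the toy English sentences, the one-vs-rest setup, and the hyperparameters. The paper worked on 1,168 diabetes papers and its actual feature design and SVM configuration are not given in this record. The classifier below is a plain linear SVM trained by subgradient descent on the regularized hinge loss, written with the standard library only.

```python
import re
from collections import Counter

# Illustrative tokenizer: alphanumeric tokens starting with a letter, or plain numbers.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*|\d+(?:\.\d+)?")

def core_features(sentence):
    """Bag of features: lowercased words plus <NUM>/<ABBR> placeholders (assumed scheme)."""
    feats = Counter()
    for tok in TOKEN_RE.findall(sentence):
        if tok[0].isdigit():
            feats["<NUM>"] += 1                  # any number
        elif sum(c.isupper() for c in tok) >= 2:
            feats["<ABBR>"] += 1                 # abbreviation-like token, e.g. BMI, T2DM
        else:
            feats[tok.lower()] += 1              # ordinary word
    return feats

def score(model, x):
    """Linear decision value w.x + b for a sparse feature Counter x."""
    w, b = model
    return sum(w.get(f, 0.0) * v for f, v in x.items()) + b

def train_binary_svm(samples, epochs=200, lam=0.01, lr=0.1):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.
    samples: list of (feature Counter, label) with label in {+1, -1}."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * score((w, b), x)
            for f in w:                          # gradient step on the L2 penalty
                w[f] *= 1.0 - lr * lam
            if margin < 1.0:                     # hinge loss is active for this sample
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + lr * y * v
                b += lr * y
    return w, b

def train_one_vs_rest(labeled_sentences):
    """Train one binary SVM per label (one-vs-rest is an assumption of this sketch)."""
    data = [(core_features(s), lab) for s, lab in labeled_sentences]
    labels = sorted({lab for _, lab in data})
    return {lab: train_binary_svm([(x, 1 if l == lab else -1) for x, l in data])
            for lab in labels}

def predict(models, sentence):
    """Return the label whose binary SVM gives the highest decision value."""
    x = core_features(sentence)
    return max(models, key=lambda lab: score(models[lab], x))

# Toy training sentences, invented for this sketch.
TRAIN = [
    ("This study aims to evaluate insulin therapy in T2DM patients", "purpose"),
    ("The objective was to investigate risk factors for diabetes", "purpose"),
    ("We enrolled 120 patients and measured BMI and fasting glucose", "method"),
    ("Patients were randomly divided into 2 groups and treated for 12 weeks", "method"),
    ("HbA1c decreased by 1.2 in the treatment group", "result"),
    ("The incidence was 35.6 in group A and 20.1 in group B", "result"),
    ("Insulin therapy is effective and safe for T2DM", "conclusion"),
    ("These findings suggest that early intervention is beneficial", "conclusion"),
]

if __name__ == "__main__":
    models = train_one_vs_rest(TRAIN)
    for s in ["We measured glucose in 50 patients",
              "The aim of this study was to evaluate metformin"]:
        print(predict(models, s), "<-", s)
```

Collapsing numbers and abbreviations into placeholder features is only one possible reading of "core feature words"; keeping the surface form of each abbreviation as its own feature would be an equally reasonable variant. With eight toy sentences the predictions are not meaningful as an evaluation; the sketch only shows how sentence-level indexing reduces to feature extraction plus classification.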
