A Metadata-Based Method for Technical Literature Categorization (cited by: 3)
Abstract: With the development of information technology and the Internet, the number of scientific papers stored in digital form has increased sharply, and classifying them effectively has become an urgent need. Because a scientific paper is a semi-structured document, this paper proposes classifying papers using only the limited metadata they contain. Experiments show that using only a paper's metadata, such as its title, keywords, and abstract, yields classification accuracy close to that of traditional methods based on the full text. For categories whose papers consist largely of formulae and symbols, metadata-based classification gives better results than full-text classification. Because the metadata of a paper is far smaller than its full text, the method also greatly reduces classification time.
Source: Journal of Shandong Normal University (Natural Science) (《山东师范大学学报(自然科学版)》), CAS, 2008, No. 3, pp. 41-43 (3 pages)
Keywords: technical literature; text categorization; metadata; accuracy
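
The idea in the abstract, classifying a paper from its title, keywords, and abstract instead of its full text, can be illustrated with a minimal sketch. The abstract does not say which classifier the authors used, so the multinomial Naive Bayes model below is an assumption chosen only for brevity; the field names (`title`, `keywords`, `abstract`), the class `MetadataNaiveBayes`, and the toy training records are likewise hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def metadata_tokens(paper):
    """Tokenize only the metadata fields: title, keywords, and abstract."""
    text = " ".join([paper["title"], " ".join(paper["keywords"]), paper["abstract"]])
    return re.findall(r"[a-z]+", text.lower())


class MetadataNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an illustrative choice,
    not necessarily the classifier used in the paper)."""

    def fit(self, papers, labels):
        self.class_counts = Counter(labels)      # number of papers per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for paper, label in zip(papers, labels):
            tokens = metadata_tokens(paper)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, paper):
        tokens = metadata_tokens(paper)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior plus smoothed log likelihood of each metadata token
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training records (invented for illustration only).
train = [
    ({"title": "Support vector machines for text categorization",
      "keywords": ["text categorization", "classifier"],
      "abstract": "We train a classifier on document features."}, "cs"),
    ({"title": "Protein folding in yeast cells",
      "keywords": ["protein", "cell"],
      "abstract": "We study protein structure in living cells."}, "bio"),
]
model = MetadataNaiveBayes().fit([p for p, _ in train], [c for _, c in train])

query = {"title": "A classifier for document categorization",
         "keywords": ["classifier"],
         "abstract": "Features of text documents."}
print(model.predict(query))
```

Because only a few short fields are tokenized per paper, training and prediction touch far fewer tokens than a full-text classifier would, which is the source of the time saving the abstract describes.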