
Research on Automatic Text Classification Methods Based on Concept Hierarchies

Original title: 基于概念层次的英文文本自动分类研究 (cited by 3)
Abstract: This paper designs and implements an automatic classification and retrieval system for English documents, with the emphasis on improving classification accuracy. In automatic text classification systems, document content is generally stored as a vector in an N-dimensional feature space, so the feature extraction method and its accuracy strongly affect the correctness of the classification result. Conventional methods are based on word forms and do not examine word meaning; they ignore the diversity and indeterminacy of the word forms that express a single meaning, as well as the relations between word senses, in particular hypernym-hyponym relations. The proposed method builds on the Vector Space Model (VSM), uses concepts as features, and additionally takes the hypernyms of each word sense into account, so that training can distill more general information from the words and thereby improve classification precision.
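The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny `WORD_TO_CONCEPT` and `HYPERNYM` tables are hypothetical stand-ins for WordNet, and the `decay` weighting of hypernyms and the `max_depth` limit are assumptions made for illustration, since the abstract does not say how hypernyms are weighted.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet (assumed data, not from the paper).
# Different word forms with the same meaning map to one concept.
WORD_TO_CONCEPT = {
    "car": "car", "automobile": "car", "truck": "truck",
    "dog": "dog", "puppy": "dog", "cat": "cat",
}
# Each concept points to its direct hypernym (more general concept).
HYPERNYM = {
    "car": "vehicle", "truck": "vehicle",
    "dog": "animal", "cat": "animal",
}

def concept_vector(tokens, decay=0.5, max_depth=2):
    """Build a sparse concept-based feature vector for one document.

    Every known token contributes weight 1.0 to its concept and a
    decayed weight to each hypernym up to max_depth levels above it.
    The decay scheme is an illustrative assumption.
    """
    vec = Counter()
    for tok in tokens:
        concept = WORD_TO_CONCEPT.get(tok.lower())
        if concept is None:
            continue  # unknown words are skipped in this sketch
        weight = 1.0
        vec[concept] += weight
        for _ in range(max_depth):
            concept = HYPERNYM.get(concept)
            if concept is None:
                break
            weight *= decay
            vec[concept] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors (standard VSM measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    car = concept_vector(["car"])
    print(cosine(car, concept_vector(["automobile"])))  # synonyms share a concept
    print(cosine(car, concept_vector(["truck"])))       # related via hypernym "vehicle"
    print(cosine(car, concept_vector(["dog"])))         # unrelated concepts
```

In this toy setting a purely word-form based vector would give "car" and "automobile", or "car" and "truck", a similarity of zero, whereas the concept vector treats the first pair as identical and gives the second pair a small positive similarity through the shared hypernym.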
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2004, No. 11, pp. 75-77 (3 pages)
Keywords: automatic text classification; concept hierarchy; VSM; WordNet

