Semantic Classification Model of Documents Based on Concept Vector Space (基于概念向量空间的文档语义分类模型研究)

Cited by: 3
Abstract: To address the problems in traditional automatic document classification methods and in current semantic classification methods, this paper proposes a new semantic classification model of documents based on concept vector space. The model uses a character-matching algorithm to match the mutually independent terms in a document's original high-dimensional word vector space against the attribute sets that describe ontology concepts. Each matched term is then mapped to the ontology concept corresponding to its attribute set, which yields a low-dimensional, semantically rich concept vector space for the document. The widely used "20Newsgroups" dataset serves as the experimental dataset for validating the model. Experimental results show that, compared with traditional document classification based on the word vector space, the proposed method greatly reduces the dimensionality of the vector space and improves classification performance.
Author: Li Hairong (李海蓉)
Source: Library and Information Service (《图书情报工作》), CSSCI, PKU Core; 2011, No. 24, pp. 106-111, 26 (7 pages in total)
Keywords: concept vector space; automatic classification of documents; semantic classification of documents; model
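
The term-to-concept mapping described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy ontology, the function names, the use of exact case-insensitive string matching, and the choice to drop unmatched terms are all assumptions made here for illustration.

```python
from collections import Counter

# Hypothetical toy ontology: concept -> attribute set (terms describing it).
ONTOLOGY = {
    "Vehicle": {"car", "truck", "engine", "wheel"},
    "Computer": {"cpu", "disk", "keyboard", "software"},
}


def build_term_index(ontology):
    """Invert the ontology so each attribute term points to its concepts."""
    index = {}
    for concept, attributes in ontology.items():
        for attr in attributes:
            index.setdefault(attr.lower(), set()).add(concept)
    return index


def concept_vector(tokens, ontology):
    """Count concept occurrences for a tokenized document.

    Assumption: a token matches an attribute by exact, case-insensitive
    string comparison, and tokens with no match are dropped.
    """
    index = build_term_index(ontology)
    counts = Counter()
    for token in tokens:
        for concept in index.get(token.lower(), ()):
            counts[concept] += 1
    # A fixed concept order gives every document the same vector layout.
    return [counts[c] for c in sorted(ontology)]


doc = "The car engine failed so the truck towed it".split()
vector = concept_vector(doc, ONTOLOGY)
print(sorted(ONTOLOGY), vector)
```

In this toy case the nine-token document collapses to a two-dimensional vector over the concepts `Computer` and `Vehicle`, which is the kind of dimensionality reduction the abstract reports; a real ontology and tokenizer would replace the hard-coded dictionary and `split()`.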