
Method of Chinese Text Categorization Based on the Word Vector Space Model (cited 14 times)

Abstract: Most text categorization methods are based on the vector space model (VSM). Document vectors built under this model have high dimensionality, which makes it difficult to improve classifier efficiency. To address this shortcoming, this paper proposes a text categorization method based on the word vector space model. The main idea is to represent the feature words of a text as space vectors, obtain a word-class support matrix through training, and then compute the similarity between a text to be classified and each category from the text's words and the word-class support matrix. Experiments show that the method achieves relatively high classification precision and efficiency.
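The abstract does not give the paper's formulas, so the following Python sketch is only one plausible reading of the pipeline it describes. It assumes texts are already segmented into feature words (the English tokens in the usage example stand in for segmented Chinese words). It defines the support of word w for class c as the share of w's training occurrences that fall in class c (a hypothetical choice) and treats each word's row of that matrix as its vector. It then scores a new text against each class by summing the rows of its words, weighted by term frequency and normalized by text length. The names `train_support_matrix`, `classify` and `predict` are illustrative and do not come from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: support[word][class] -> degree to which `word` supports `class`.
Support = Dict[str, Dict[str, float]]


def train_support_matrix(docs: List[Tuple[List[str], str]]) -> Tuple[Support, List[str]]:
    """Build a word-class support matrix from pre-segmented, labelled texts.

    Assumed definition (not the paper's formula):
    support[w][c] = occurrences of w in class c / occurrences of w in all classes.
    """
    classes = sorted({label for _, label in docs})
    counts: Dict[str, Counter] = defaultdict(Counter)
    for words, label in docs:
        for w in words:
            counts[w][label] += 1
    support: Support = {}
    for w, per_class in counts.items():
        total = sum(per_class.values())
        # Counter returns 0 for classes in which w never occurred
        support[w] = {c: per_class[c] / total for c in classes}
    return support, classes


def classify(words: List[str], support: Support, classes: List[str]) -> Dict[str, float]:
    """Score a text against every class using the matrix rows of its words."""
    tf = Counter(words)
    scores = {c: 0.0 for c in classes}
    for w, n in tf.items():
        row = support.get(w)
        if row is None:  # word never seen in training: contributes nothing
            continue
        for c in classes:
            scores[c] += n * row[c]
    length = sum(tf.values())
    if length:  # normalize by text length so long and short texts are comparable
        scores = {c: s / length for c, s in scores.items()}
    return scores


def predict(words: List[str], support: Support, classes: List[str]) -> str:
    """Return the highest-scoring class (ties go to the first class in sorted order)."""
    scores = classify(words, support, classes)
    return max(classes, key=lambda c: scores[c])


if __name__ == "__main__":
    train = [
        ("stock market win".split(), "finance"),
        ("stock fund bank".split(), "finance"),
        ("match goal win".split(), "sports"),
        ("team coach match".split(), "sports"),
    ]
    support, classes = train_support_matrix(train)
    print(classify(["team", "win"], support, classes))  # {'finance': 0.25, 'sports': 0.75}
    print(predict(["stock", "win", "unknown"], support, classes))  # finance
```

Classifying a text only requires one matrix-row lookup per distinct word. Its cost therefore grows with the number of words in the text times the number of classes, not with the size of the whole vocabulary. This matches the spirit of the efficiency argument in the abstract. The authors' actual weighting and similarity measure may differ.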
Source: Journal of Hefei University of Technology (Natural Science), 2007, No. 10, pp. 1261-1264 (4 pages). Journal indexing: CAS, CSCD, Peking University Core Journals.
Funding: Natural Science Foundation of Anhui Province (project 050420207)
Keywords: text categorization; vector space model; K-nearest neighbor; word vector space model