Method of text vector construction based on concept cluster (基于概念簇的文本向量构建方法)
Cited by: 2
Abstract To improve how closely a text vector approximates the concepts expressed in a text, words that share the same syntactic and semantic features are clustered to extract concept clusters, and a space transformation maps the text vector from the term space to the concept-cluster space to represent the text. Experiments compared text classification based on TF-IDF, IG, TF-IDF-IG, and LSA against the same methods combined with concept clusters. The results show that constructing text vectors from concept clusters improves the accuracy with which the vectors approximate text concepts and also increases the separability between different types of text.
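Source Journal on Communications (《通信学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2010, Issue S1, pp. 44-47 (4 pages)
Funding National 242 Program Fund (2005C48); Beijing Institute of Technology Basic Research Fund (20060142014); Beijing Institute of Technology Graduate Science and Technology Innovation Fund (GC200802)
Keywords Chinese information processing; text vector; concept cluster; text classification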
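
The transformation described in the abstract, from term space to concept-cluster space, can be illustrated with a minimal sketch. The abstract does not specify how the clusters are extracted or what form the transformation takes, so everything below is an assumption: the clusters are assigned by hand, the transformation is a simple 0/1 membership matrix, and the names `concept_cluster_matrix` and `to_concept_space` are hypothetical.

```python
import numpy as np


def concept_cluster_matrix(term_to_cluster, n_clusters):
    """Build an (n_terms x n_clusters) 0/1 membership matrix.

    Assumption: each term belongs to exactly one concept cluster.
    """
    n_terms = len(term_to_cluster)
    membership = np.zeros((n_terms, n_clusters))
    membership[np.arange(n_terms), term_to_cluster] = 1.0
    return membership


def to_concept_space(doc_term, membership):
    """Map document vectors from term space to concept-cluster space."""
    return doc_term @ membership


# Hypothetical vocabulary and hand-assigned clusters (not from the paper).
vocab = ["car", "automobile", "bank", "finance"]
term_to_cluster = np.array([0, 0, 1, 1])

# Hypothetical term weights (for example TF-IDF), one row per document.
doc_term = np.array([
    [1.0, 0.0, 0.0, 0.0],  # mentions only "car"
    [0.0, 2.0, 0.0, 0.0],  # mentions only "automobile"
    [0.0, 0.0, 1.0, 3.0],  # mentions "bank" and "finance"
])

membership = concept_cluster_matrix(term_to_cluster, n_clusters=2)
doc_concept = to_concept_space(doc_term, membership)
print(doc_concept)
```

In this toy example the first two documents share no terms, so they are orthogonal in term space, yet they fall on the same axis in concept-cluster space. That is one way to read the abstract's claim that the transformed vectors approximate text concepts more closely.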