Abstract
To bring text vectors closer to the concepts a text expresses, words that share the same syntactic and semantic features are clustered to extract concept clusters, and a spatial transformation maps the text vector from the term space to the concept-cluster space to represent the text. Experiments compared text classification performance based on TF-IDF, IG, TF-IDF-IG, and LSA, and on each of these combined with concept clusters. The results show that constructing text vectors from concept clusters improves how accurately the vectors approximate the concepts of a text and also increases the discrimination between texts of different types.
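The transformation from term space to concept-cluster space can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the exact form of the transformation, so the version below assumes a hard assignment of each term to one cluster and simply sums term weights per cluster. The function name `to_concept_space`, the cluster labels, and the example weights are all hypothetical.

```python
from collections import defaultdict


def to_concept_space(doc_vector, term_to_cluster):
    """Project a sparse term-weight vector onto concept clusters.

    doc_vector: dict mapping term -> weight (for example a TF-IDF score).
    term_to_cluster: dict mapping term -> cluster id (assumed clustering output).
    Terms without a cluster are dropped; weights in the same cluster are summed.
    """
    concept_vector = defaultdict(float)
    for term, weight in doc_vector.items():
        cluster = term_to_cluster.get(term)
        if cluster is not None:
            concept_vector[cluster] += weight
    return dict(concept_vector)


# Hypothetical clustering: terms with similar meaning share a cluster id.
term_to_cluster = {
    "car": "vehicle",
    "automobile": "vehicle",
    "bank": "finance",
    "loan": "finance",
}

# Two documents that share no terms at all.
doc_a = {"car": 0.5, "loan": 0.25}
doc_b = {"automobile": 0.5, "bank": 0.25}

concept_a = to_concept_space(doc_a, term_to_cluster)
concept_b = to_concept_space(doc_b, term_to_cluster)
```

In this toy case the two documents have no term in common, yet they map to the same vector in concept-cluster space, which is the kind of effect the concept-cluster representation is meant to capture.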
Source
Journal on Communications (《通信学报》)
EI
CSCD
PKU Core (北大核心)
2010, Issue S1, pp. 44-47 (4 pages)
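Funding
National 242 Program Fund project (2005C48)
Beijing Institute of Technology Basic Research Fund project (20060142014)
Beijing Institute of Technology Graduate Science and Technology Innovation Fund project (GC200802)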
Keywords
Chinese information processing
text vector
concept cluster
text classification