
A Text Feature Selection Method Based on TongYiCi CiLin (cited by: 5)
Abstract: Feature selection is one of the important problems in text categorization, machine learning, and pattern recognition. In particular, with the rapid development of networks and cloud computing, methods for analyzing massive data have become vitally important. Feature selection can reduce the dimensionality of high-dimensional data while preserving data integrity, and at the same time improve classification accuracy. Previously proposed feature selection methods based on TongYiCi CiLin (a Chinese synonym thesaurus) effectively avoid conceptual redundancy among the extracted features, but they do not consider that the subset formed by the individually highest-weighted features may not be the optimal subset. To solve this problem, this article combines synonyms with a genetic algorithm and proposes a new text feature selection method based on TongYiCi CiLin. The method first filters and merges synonymous feature words, which lowers the dimensionality of the feature vector and avoids the influence of synonyms. It then uses an improved genetic algorithm to select feature vectors with good fitness values. Experimental results show that, compared with previously proposed methods, this method markedly reduces the dimensionality of the feature vector while maintaining feature selection accuracy, and improves the efficiency of feature selection.
Source: Journal of Xiamen University (Natural Science), indexed in CAS, CSCD, and the Peking University Core Journals list, 2012, No. 2, pp. 200-203 (4 pages)
Funding: National Natural Science Foundation of China (50604012)
Keywords: feature selection; TongYiCi CiLin; genetic algorithm; text categorization
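
The two stages described in the abstract (merging synonymous feature words, then searching for a good feature subset with a genetic algorithm) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy synonym table `SYNONYM_GROUP`, the helper names `merge_synonyms` and `select_features`, the weights, and the fitness function `toy_fitness` are hypothetical, and the search is a plain genetic algorithm with elitism, not the improved variant the article proposes, whose details the abstract does not give.

```python
import random

# Hypothetical toy synonym table: word -> synonym-group code.
# A real system would look these codes up in the TongYiCi CiLin thesaurus.
SYNONYM_GROUP = {
    "电脑": "G1", "计算机": "G1",                # both mean "computer"
    "高兴": "G2", "快乐": "G2", "开心": "G2",    # all mean "happy"
}

def merge_synonyms(weights):
    """Collapse words that share a synonym group into one feature, summing weights."""
    merged = {}
    for word, w in weights.items():
        key = SYNONYM_GROUP.get(word, word)  # words without a group stay as-is
        merged[key] = merged.get(key, 0.0) + w
    return merged

def select_features(features, fitness, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Plain genetic algorithm over bit masks (needs at least 2 features).

    Returns the best-scoring subset seen during the search.
    """
    rng = random.Random(seed)
    n = len(features)

    def score(mask):
        return fitness([f for f, bit in zip(features, mask) if bit])

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)        # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=score)
    return [f for f, bit in zip(features, best) if bit]

# Illustrative term weights (made up, not taken from the article).
weights = {"电脑": 0.4, "计算机": 0.3, "高兴": 0.2, "快乐": 0.1,
           "开心": 0.1, "足球": 0.5, "的": 0.01}
merged = merge_synonyms(weights)   # 7 words collapse to 4 features
features = sorted(merged)

def toy_fitness(subset):
    # Hypothetical fitness: total weight minus a small cost per kept feature.
    return sum(merged[f] for f in subset) - 0.05 * len(subset)

selected = select_features(features, toy_fitness)
print(len(weights), "->", len(merged), "features after merging")
print("selected subset:", selected)
```

Scoring whole subsets, rather than ranking features one by one, is what lets the search address the abstract's point that the individually highest-weighted features need not form the best subset. In practice the fitness function would be tied to classification performance rather than to a weight sum.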
