
Feature Selection Algorithm Based on Association Features Enhancement
Abstract: Feature selection is a common preprocessing step in text classification: reducing the dimensionality of the document feature space can improve classification performance. Most feature selection algorithms, however, ignore the co-occurrence relationships between feature words. This paper proposes a method that uses association features to enhance classification performance, and designs a fast algorithm for removing redundant features and selecting relevant ones in the high-dimensional vector space produced by feature expansion, so as to meet practical demands on both the classification performance of the enhanced features and execution efficiency. Experiments use Naive Bayes as the classifier and compare the method with other feature selection algorithms in terms of dimensionality reduction, classification performance, and execution efficiency.
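The abstract does not spell out how association features are built, so the following is only a minimal sketch of the general idea of feature expansion, not the paper's algorithm. It assumes that an association feature is an unordered pair of words that co-occur in at least `min_support` documents; the function name `expand_with_pairs`, the `min_support` threshold, and the `a&b` naming of pair features are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def expand_with_pairs(docs, min_support=2):
    """Append co-occurring word-pair features to each tokenized document.

    docs: list of token lists.
    min_support: hypothetical threshold; a pair is kept only if both words
    appear together in at least this many documents.
    """
    # Count, per document, every unordered pair of distinct words.
    pair_counts = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))
        pair_counts.update(combinations(vocab, 2))

    # Keep only pairs that co-occur often enough.
    frequent = {pair for pair, count in pair_counts.items() if count >= min_support}

    # Extend each document with the frequent pairs it contains.
    expanded = []
    for tokens in docs:
        vocab = sorted(set(tokens))
        pairs = [f"{a}&{b}" for a, b in combinations(vocab, 2) if (a, b) in frequent]
        expanded.append(list(tokens) + pairs)
    return expanded


docs = [
    ["feature", "selection", "text"],
    ["feature", "selection", "bayes"],
    ["text", "bayes"],
]
expanded = expand_with_pairs(docs, min_support=2)
```

Enumerating all word pairs grows quadratically with the vocabulary of each document, which illustrates why the expanded space becomes high-dimensional and why a fast redundancy-removal step, as the abstract describes, is needed afterwards.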
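Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2007, No. 16, pp. 150-152 (3 pages)
Funding: National Natural Science Foundation of China (604003009); Chongqing Natural Science Foundation (2005BB2224)
Keywords: text classification; feature selection; association feature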

