A Feature Selection Method Based on the Extended Mutual Information Algorithm (cited by: 2)
Abstract: In text categorization systems, feature selection is an effective way to reduce the dimensionality of the feature vector. After analyzing several commonly used evaluation functions for feature selection, and building on the characteristics of the mutual information algorithm, this paper proposes a feature selection method based on extended mutual information (EMI). Experimental results show that the method is simple and feasible, and that it helps improve the effectiveness of the selected feature subset.
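Affiliation: Hebei North University
Source: 《微计算机信息》 (Control & Automation), 2010, No. 24, pp. 223-224, 219 (3 pages in total)
Funding: Project of the Science and Technology Bureau of Zhangjiakou, Hebei Province (0921045B)
Keywords: text categorization; feature selection; evaluation function; mutual information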

