
Study on Feature Selection of Chinese Text Based on Rough Set (cited by: 3)
Abstract: Traditional feature selection relies on threshold filtering, which often discards useful information. To address this problem, a rough-set-based algorithm for text feature selection is proposed. Starting from the core, the algorithm uses attribute significance and attribute dependency as heuristic information to select features, which reduces the dimensionality of the document feature vector. Experimental results show that the algorithm is easy to implement, effectively reduces the number of features, and improves classification efficiency.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2010, No. 3, pp. 4-5, 74 (3 pages)
Funding: National Natural Science Foundation of China (60573179)
Keywords: Rough set; Feature selection; Attribute significance; Attribute dependency
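
The abstract names the ingredients of the method (the core, attribute significance, attribute dependency) but not their exact formulas, so the following is only a minimal sketch under assumed textbook rough-set definitions, not the authors' implementation. It assumes the dependency of the class label on a feature set is the fraction of documents in the positive region, that a feature belongs to the core if removing it lowers that dependency, and that significance is the dependency gain from adding a feature. The function names and the toy terms `t1`, `t2`, `t3` are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Assumed definition: share of rows whose equivalence class has one label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def core(rows, labels, all_attrs):
    """Features whose removal lowers the dependency of the full feature set."""
    full = dependency(rows, labels, all_attrs)
    return [
        a for a in all_attrs
        if dependency(rows, labels, [b for b in all_attrs if b != a]) < full
    ]


def reduce_features(rows, labels, all_attrs):
    """Start from the core, then greedily add the most significant feature."""
    target = dependency(rows, labels, all_attrs)
    selected = core(rows, labels, all_attrs)
    while dependency(rows, labels, selected) < target:
        base = dependency(rows, labels, selected)
        remaining = [a for a in all_attrs if a not in selected]
        # Significance (assumed): dependency gained by adding the feature.
        best = max(
            remaining,
            key=lambda a: dependency(rows, labels, selected + [a]) - base,
        )
        selected.append(best)
    return selected


# Toy decision table: each row is a document, each value a term indicator.
rows = [
    {"t1": 1, "t2": 0, "t3": 0},
    {"t1": 0, "t2": 0, "t3": 0},
    {"t1": 1, "t2": 1, "t3": 1},
    {"t1": 0, "t2": 1, "t3": 1},
]
labels = ["A", "B", "B", "B"]
attrs = ["t1", "t2", "t3"]

core_attrs = core(rows, labels, attrs)
selected = reduce_features(rows, labels, attrs)
print(core_attrs)  # ['t1']
print(selected)    # ['t1', 't2']
```

In this toy table `t1` is the only core feature, because dropping it makes the first two documents indistinguishable despite their different labels. Adding `t2` then restores full dependency, so `t3` is discarded without any frequency threshold being applied.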

