A Feature Selection Method Based on Text Classification (cited by: 2)
Abstract: Text classification usually represents text features with the vector space model (VSM). Selecting the feature words that best express the topic of a text, and thereby reducing the dimensionality of the feature space and the time and space complexity, is an important problem. To address it, this paper proposes using sectional set fuzzy C-means (S2FCM) clustering for between-class feature dimensionality reduction. Guided by the maximum membership principle, the method preserves the fuzzy clustering effect while speeding up convergence, and it can improve the accuracy of feature selection. The algorithm also uses improved calculations of the membership degree and the cluster centers, and it determines the initial cluster centers with a non-random method. Experiments show that text classification using the feature terms selected by this method achieves fairly good classification results.
Authors: 白似雪, 陆萍
Source: Journal of Nanchang University (Engineering & Technology) (《南昌大学学报(工科版)》), CAS, 2008, No. 1, pp. 87-90 (4 pages)
Funding: Jiangxi Provincial Department of Education planned project (2006[36])
Keywords: sectional set; feature words; VSM; fuzzy clustering
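
The abstract names the technique but does not give its formulas, so the following is only a minimal sketch of how a cut-set step might sit inside standard fuzzy C-means. Everything specific here is an assumption rather than the authors' algorithm: the function name `fcm_with_cut`, the threshold parameter `cut`, the rule that a membership row is hardened to 0/1 once its largest value reaches `cut` (one reading of the maximum membership principle), and the toy 2-D points standing in for feature-word vectors. The paper's improved membership and center formulas and its non-random initialization are not reproduced; initial centers are simply passed in by the caller.

```python
import math


def fcm_with_cut(points, centers, m=2.0, cut=0.8, iters=50, eps=1e-9):
    """Standard fuzzy C-means plus a hypothetical cut-set hardening step.

    points:  list of feature vectors (lists of floats)
    centers: initial cluster centers, supplied by the caller (non-random)
    m:       fuzzifier, must be > 1
    cut:     assumed threshold; a row whose largest membership reaches it
             is replaced by a hard 0/1 assignment
    """
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(iters):
        memberships = []
        for x in points:
            # Distances to every center, floored to avoid division by zero.
            d = [max(math.dist(x, c), eps) for c in centers]
            # Standard FCM membership: u_k = 1 / sum_j (d_k / d_j)^(2/(m-1)).
            row = [
                1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in d)
                for dk in d
            ]
            best = max(range(len(row)), key=row.__getitem__)
            if row[best] >= cut:
                # Assumed cut-set step: harden confident assignments.
                row = [1.0 if k == best else 0.0 for k in range(len(row))]
            memberships.append(row)

        # Standard FCM center update, weighted by membership^m.
        new_centers = []
        for k, old in enumerate(centers):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # keep a center that attracts nothing
                continue
            new_centers.append([
                sum(w * x[j] for w, x in zip(weights, points)) / total
                for j in range(len(old))
            ])

        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < 1e-6:
            break
    return memberships, centers


# Toy data: two well-separated groups of 2-D points.
toy_points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
memberships, final_centers = fcm_with_cut(toy_points, [[0.0, 0.0], [5.0, 5.0]])
```

Under this reading, hardening confident rows removes their small cross-cluster weights from the center update, which is one plausible way a cut set could speed convergence while ambiguous points keep their fuzzy memberships.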
