Feature selection method of scientific literatures based on optimized K-medoids algorithm (cited by: 1)
Abstract Based on the structural characteristics of scientific literature, this paper builds a four-layer mining model and combines it with the K-medoids algorithm to propose a feature selection method. The method first divides a scientific document into four layers according to its structure, then extracts feature words layer by layer from the first three layers using K-medoids clustering, and finally uses the Apriori algorithm to find the maximal frequent itemsets of the fourth layer, which serve as that layer's feature word set. Because the accuracy of K-medoids depends heavily on the initial centers, the paper also optimizes the choice of initial centers to improve feature selection: it first uses information entropy to weight the clustering objects and correct the distance function, and then uses the weighting function values to select the optimal initial cluster centers. Experimental results show that the four-layer mining model combined with the optimized K-medoids algorithm achieves relatively high accuracy in scientific literature classification.
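The abstract does not give the formulas behind the optimized K-medoids step, so the following is only a minimal sketch of the general idea: entropy-derived weights correct the distance function, and the initial medoids are chosen deterministically rather than at random. Everything specific here is an assumption, not the paper's method: the weights come from the standard entropy weight method applied per feature (for example, per term column of a non-negative TF-IDF matrix), the seeding rule (most central point first, then farthest-first) is a stand-in for the paper's unspecified weighting-function criterion, and all function names are hypothetical.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method (assumed): columns with lower entropy get more weight.
    Expects a non-negative matrix with at least two rows."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    col_sum = X.sum(axis=0)
    P = X / np.where(col_sum > 0, col_sum, 1.0)   # column-wise proportions
    logP = np.log(np.where(P > 0, P, 1.0))        # log(1) = 0, so zero entries add nothing
    E = -(P * logP).sum(axis=0) / np.log(n)       # normalized entropy in [0, 1]
    d = np.where(col_sum > 0, 1.0 - E, 0.0)       # all-zero columns get no weight
    total = d.sum()
    return d / total if total > 0 else np.full(m, 1.0 / m)

def weighted_distances(X, w):
    """Pairwise weighted Euclidean distance matrix."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=2))

def initial_medoids(D, k):
    """Assumed seeding rule: the most central point first, then farthest-first."""
    centers = [int(D.sum(axis=1).argmin())]
    while len(centers) < k:
        min_d = D[:, centers].min(axis=1)          # distance to nearest chosen center
        centers.append(int(min_d.argmax()))
    return centers

def k_medoids(X, k, max_iter=100):
    """Alternating K-medoids on the entropy-weighted distances."""
    X = np.asarray(X, dtype=float)
    D = weighted_distances(X, entropy_weights(X))
    medoids = initial_medoids(D, k)
    for _ in range(max_iter):
        labels = D[:, medoids].argmin(axis=1)      # assign each point to nearest medoid
        new_medoids = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:                  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[within.argmin()]))
        if new_medoids == medoids:                 # converged
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels
```

Calling `k_medoids(X, k)` on a document-by-term matrix returns the indices of the medoid rows and a cluster label for each row. In a feature selection setting, the medoids (or the terms that dominate them) would be the candidates kept as features; how the paper maps clusters to feature words is not stated in the abstract.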
Authors Li Junzhou (李俊州), Wu Ying (武莹)
Source Journal of Central China Normal University: Natural Sciences (《华中师范大学学报(自然科学版)》), 2015, No. 4, pp. 541-545 (5 pages). Indexed in: CAS, PKU Core (北大核心).
Keywords text classification; feature selection; K-medoids algorithm
