Feature selection method of scientific literatures based on optimized K-medoids algorithm

Cited by: 1

Abstract: Based on the structural characteristics of scientific literature, the paper builds a four-layer mining model and combines it with the K-medoids algorithm to propose a feature selection method. The method first divides a scientific document into four layers according to its structure, then extracts feature words from the first three layers, layer by layer, through K-medoids clustering, and finally uses the Apriori algorithm to find the maximal frequent itemsets of the fourth layer, which serve as that layer's feature word set. Because the accuracy of K-medoids depends strongly on the initial centers, the paper also optimizes the choice of initial centers to improve feature selection: it first uses information entropy to weight the clustering objects and correct the distance function, and then uses the value of the weighting function to select the optimal initial cluster centers. Experimental results show that the four-layer mining model combined with the optimized K-medoids algorithm achieves relatively high accuracy in classifying scientific literature.
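The abstract gives neither the weighting formula nor the rule for choosing initial centers, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes the standard entropy weight method applied to feature columns of non-negative term frequencies, a weighted Euclidean distance, and a simple initialization (most central object first, then farthest-first). All function names and the sample data are hypothetical.

```python
import math


def entropy_weights(rows):
    """Entropy weight method (assumed): low-entropy columns get larger weights."""
    n, m = len(rows), len(rows[0])
    diversity = []
    for j in range(m):
        col = [row[j] for row in rows]
        total = sum(col)
        if total == 0:
            diversity.append(0.0)  # an all-zero column carries no information
            continue
        h = -sum((v / total) * math.log(v / total) for v in col if v > 0)
        h /= math.log(n)  # normalise entropy to [0, 1]
        diversity.append(max(0.0, 1.0 - h))
    s = sum(diversity)
    return [d / s for d in diversity] if s > 0 else [1.0 / m] * m


def weighted_distance(a, b, w):
    """Euclidean distance with one weight per feature."""
    return math.sqrt(sum(wj * (x - y) ** 2 for wj, x, y in zip(w, a, b)))


def initial_medoids(dist, k):
    """Assumed rule: most central object first, then farthest-first."""
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        rest = (i for i in range(n) if i not in medoids)
        medoids.append(max(rest, key=lambda i: min(dist[i][c] for c in medoids)))
    return medoids


def assign(dist, medoids):
    """Label each object with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(rows, k, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    w = entropy_weights(rows)
    n = len(rows)
    dist = [[weighted_distance(rows[i], rows[j], w) for j in range(n)]
            for i in range(n)]
    medoids = initial_medoids(dist, k)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                updated.append(medoids[c])
                continue
            updated.append(min(members,
                               key=lambda i: sum(dist[i][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


if __name__ == "__main__":
    # Hypothetical term-frequency vectors: two topical groups plus one
    # column that is constant across documents.
    docs = [[5, 4, 0, 0, 2], [4, 5, 0, 1, 2], [5, 5, 1, 0, 2],
            [0, 0, 5, 4, 2], [1, 0, 4, 5, 2], [0, 1, 5, 5, 2]]
    print(entropy_weights(docs))
    print(k_medoids(docs, k=2))
```

In this sketch a column whose values are identical across all documents has maximal entropy and receives a weight close to zero, so it does not influence the distance. In the four-layer scheme described above, a routine like `k_medoids` would be applied to each of the first three layers in turn, while the fourth layer would be handled separately by Apriori.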

Authors: LI Junzhou (李俊州), WU Ying (武莹)

Affiliations: School of Art and Design, Kaifeng University; Software Vocational and Technical College, Kaifeng University

Source: Journal of Central China Normal University: Natural Sciences (《华中师范大学学报（自然科学版）》), indexed in CAS and the PKU Core journal list, 2015, No. 4, pp. 541-545 (5 pages)

Keywords: text classification; feature selection; K-medoids algorithm

Classification number: TP392 [Automation and Computer Technology: Computer Application Technology]

