Abstract
To further improve the precision and recall of journal paper title classification, this paper proposes a CNKI title classification algorithm based on improved KPCA and SVM. Using the CNKI database, the titles of 13,401 papers published in 2017 by 31 computer science (TP) journals listed in 《中文核心期刊要目总览》 (A Guide to the Core Journals of China, 2014 edition) are selected as the experimental corpus. The improved KPCA algorithm performs dimensionality reduction and feature extraction on the data, and the extracted feature data is used as the SVM input for training and classification. Experimental results show that the method improves the classification of journal paper titles over previous classification algorithms.
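The pipeline described in the abstract (vectorize titles, reduce dimensionality with KPCA, classify with SVM) can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not say how KPCA was improved, so the sketch substitutes the standard `KernelPCA` from scikit-learn. The TF-IDF vectorization step, the toy titles, the labels, the helper name `build_classifier`, and all parameter values are assumptions made for the example.

```python
from sklearn.decomposition import KernelPCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy titles and labels, invented for illustration only;
# they are not drawn from the paper's 13,401-title corpus.
titles = [
    "Image segmentation with convolutional neural networks",
    "Object detection using deep neural networks",
    "Face recognition based on deep learning features",
    "Routing protocol for wireless sensor networks",
    "Energy-efficient clustering in wireless sensor networks",
    "Congestion control for wireless mesh networks",
]
labels = ["vision", "vision", "vision", "network", "network", "network"]


def build_classifier(n_components=3):
    """Chain TF-IDF features, standard RBF-kernel PCA, and an RBF-kernel SVM.

    Standard KPCA stands in for the paper's improved variant (assumption).
    """
    return make_pipeline(
        TfidfVectorizer(),                                   # titles -> sparse term weights
        KernelPCA(n_components=n_components, kernel="rbf"),  # nonlinear dimensionality reduction
        SVC(kernel="rbf"),                                   # classifier trained on reduced features
    )


model = build_classifier()
model.fit(titles, labels)
predictions = model.predict(["Pedestrian detection with neural networks"])
print(predictions)
```

For Chinese titles, the vectorizer would also need word segmentation or character n-grams (for example `TfidfVectorizer(analyzer="char", ngram_range=(1, 2))`), since the default tokenizer splits on whitespace and punctuation.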
Author
NIE Lisheng (聂黎生), School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China
Source
Modern Electronics Technique (《现代电子技术》)
Peking University Core Journal (北大核心)
2019, Issue 16, pp. 108-111 (4 pages)
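Funding
National Natural Science Foundation of China (21776119)
Natural Science Research Project of Jiangsu Higher Education Institutions (16KJB510009)
Jiangsu Normal University Research Fund (15XLB01)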
Keywords
title classification
kernel principal component analysis
data dimensionality reduction
feature extraction
data mining
pattern recognition