Abstract
Kernel k-nearest-neighbor (kernel KNN) classification has attracted considerable attention in applications such as bioinformatics and protein structure prediction. The kernel function plays an important role in the classification performance of kernel KNN: when the kernel function and its parameters are chosen well, higher classification accuracy can be achieved. To generate a suitable kernel function automatically and improve classification accuracy, this paper proposes GEPKNN, a kernel KNN classification algorithm based on gene expression programming (GEP). The basic idea is to use GEP to search for a kernel function, together with its parameters, that suits the training data, and to evaluate the fitness of each individual in the current population during evolution by k-fold cross-validation. The algorithm overcomes the subjectivity and uncertainty of the kernel KNN algorithm, automatically produces a suitable kernel function, and improves classification accuracy.
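The fitness evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the usual kernel-induced distance d²(x, y) = K(x, x) − 2K(x, y) + K(y, y) for the nearest-neighbor search and uses k-fold cross-validated accuracy as the fitness of a candidate kernel. The function names (`kernel_distance_sq`, `kernel_knn_predict`, `cv_fitness`), the two example kernels, and the fold assignment are hypothetical choices for illustration, not taken from the paper. The GEP evolution step itself is not implemented here; a fixed list of candidate kernels stands in for the evolving population.

```python
import math
from collections import Counter


def linear_kernel(x, y):
    """Plain dot product; one possible candidate kernel (illustrative)."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(gamma):
    """Build a Gaussian (RBF) kernel with width parameter gamma (illustrative)."""
    def k(x, y):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def kernel_distance_sq(kernel, x, y):
    """Squared distance in the feature space induced by the kernel."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


def kernel_knn_predict(kernel, train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x in feature space."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: kernel_distance_sq(kernel, train_X[i], x),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def cv_fitness(kernel, X, y, folds=5, k=3):
    """Fitness of a candidate kernel: k-fold cross-validated accuracy.

    Assumption: sample i is assigned to fold i % folds.
    """
    n = len(X)
    correct = 0
    for f in range(folds):
        train_idx = [i for i in range(n) if i % folds != f]
        test_idx = [i for i in range(n) if i % folds == f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            if kernel_knn_predict(kernel, train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / n


def best_kernel(candidates, X, y, folds=5, k=3):
    """Stand-in for the GEP search: pick the fittest of a fixed candidate list."""
    return max(candidates, key=lambda kern: cv_fitness(kern, X, y, folds, k))
```

In a full GEP-based search, each individual would encode a kernel expression and `cv_fitness` would be called once per individual per generation; here `best_kernel` only compares a hand-written list such as `[linear_kernel, rbf_kernel(0.5)]`.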
Source
Computer Technology and Development (《计算机技术与发展》)
2009, Issue 8, pp. 19-22 (4 pages)
Funding
Guiyang Municipal Science and Technology Key Project (2006 No. 16-6)
Keywords
data mining
evolutionary computation
gene expression programming
kernel KNN classifier