Abstract
Cluster analysis is a widely used data mining technique that automatically discovers classification patterns hidden in a data set. Course-selection data from students under a credit system can be viewed as data with categorical attributes or as Boolean data. Research has shown that traditional clustering algorithms that use distance as the measure are not well suited to such data. Based on an analysis of the characteristics of the data set, this paper proposes a new clustering algorithm that measures the similarity between two data points by their number of common neighbors. This takes the global characteristics of the data distribution into account and gives the algorithm good clustering properties and scalability. Experiments on a prototype system developed for this purpose produced good results, which is a positive step toward automatically classifying students by major under a credit system.
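The common-neighbor similarity described in the abstract can be illustrated with a minimal sketch. The abstract does not say how neighbors are defined, so this example assumes two students are neighbors when the Jaccard similarity of their course sets reaches a threshold `theta`. The function names, the threshold value, and the sample course data are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two course sets (assumed neighbor criterion)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbor_sets(records, theta):
    """For each record, collect indices of records whose Jaccard similarity >= theta."""
    neighbors = [set() for _ in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= theta:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def common_neighbor_counts(records, theta):
    """Similarity of each pair = number of neighbors the two records share."""
    neighbors = neighbor_sets(records, theta)
    return {
        (i, j): len(neighbors[i] & neighbors[j])
        for i, j in combinations(range(len(records)), 2)
    }


# Hypothetical course selections: three science students, three art students.
students = [
    {"calculus", "algebra", "physics"},
    {"calculus", "algebra", "statistics"},
    {"calculus", "physics", "statistics"},
    {"painting", "sculpture", "art_history"},
    {"painting", "sculpture", "design"},
    {"painting", "design", "art_history"},
]

links = common_neighbor_counts(students, theta=0.4)
for pair, count in sorted(links.items()):
    print(pair, count)
```

Because the count for a pair depends on which other records lie near both of them, pairs inside the same group score above zero while cross-group pairs score zero, which is one way to read the abstract's claim about reflecting the global distribution of the data. A full clustering procedure would still need a merging strategy on top of these counts, which this sketch does not attempt.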
Source
《计算机应用与软件》
CSCD
PKU Core Journal
2007, No. 5, pp. 60-62 (3 pages)
Computer Applications and Software
Funding
Natural Science Foundation of the Anhui Provincial Department of Education (Grant No. 2005KJ051).
Keywords
Cluster analysis
Dissimilarity
Credit system
Categorical attributes