Abstract
The k-nearest neighbor (KNN) algorithm is intuitive, requires no prior statistical knowledge, and needs no explicit training phase. Multi-dimensional data, however, often have fuzzy boundaries, which makes the membership of set elements uncertain, so traditional KNN cannot classify such data effectively. This paper proposes using fuzzy measures to strengthen the quantification of uncertain feature information and builds a fuzzy-measure-based k-nearest neighbor classification algorithm (FM-KNN). First, a fuzzy measure function for Dempster-Shafer evidence theory is constructed to address problems such as the non-monotonicity of evidence theory. Next, this evidence fuzzy measure is used to quantify the uncertain information carried by multi-dimensional attributes. Finally, the classification rule for a sample is determined by its support belief. Comparative experiments show that FM-KNN classifies multi-dimensional sample data better than other KNN classification algorithms.
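The abstract does not give the FM-KNN fuzzy measure function, so the following is only a minimal sketch of the general idea it builds on: each of the k nearest neighbors contributes a piece of evidence for its class, the pieces are combined with Dempster's rule, and the query is assigned to the class with the largest combined support. The distance-based mass assignment, the parameters `alpha` and `gamma`, and the function names are all assumptions chosen for illustration, not the paper's definitions.

```python
import math
from collections import defaultdict

THETA = "THETA"  # stands for the whole frame of discernment (total ignorance)


def combine(m1, m2):
    """Dempster's rule for mass functions whose focal sets are
    single classes or THETA (an assumption that keeps the sketch short)."""
    out = defaultdict(float)
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            if a == THETA:
                out[b] += wa * wb          # THETA intersected with b is b
            elif b == THETA or a == b:
                out[a] += wa * wb          # a intersected with THETA (or a) is a
            else:
                conflict += wa * wb        # two different classes: empty set
    if conflict >= 1.0:
        raise ValueError("total conflict between the two mass functions")
    return {k: v / (1.0 - conflict) for k, v in out.items()}


def evidential_knn(train, query, k=3, alpha=0.95, gamma=1.0):
    """Classify `query` from `train`, a list of (point, label) pairs.

    alpha and gamma are hypothetical tuning parameters: a neighbor at
    distance d supports its own class with mass alpha * exp(-gamma * d**2)
    and leaves the remaining mass on THETA.
    """
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    belief = {THETA: 1.0}                  # start from total ignorance
    for point, label in neighbors:
        support = alpha * math.exp(-gamma * math.dist(point, query) ** 2)
        belief = combine(belief, {label: support, THETA: 1.0 - support})
    classes = {c: w for c, w in belief.items() if c != THETA}
    return max(classes, key=classes.get), belief


train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b")]
label, belief = evidential_knn(train, (0.2, 0.1), k=3)
print(label, belief)
```

Keeping the leftover mass on `THETA` is what lets a distant neighbor contribute almost nothing instead of casting a full vote, which is the main practical difference from majority-vote KNN.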
Source
Systems Engineering (《系统工程》)
CSSCI
CSCD
PKU Core Journal
2010, No. 3, pp. 103-107 (5 pages)
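Funding
National Natural Science Foundation of China (Grant No. 70801021)
Humanities and Social Sciences Project of the Ministry of Education (Grant No. 08JC630019)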
Keywords
Multi-dimensional data
Fuzzy measure
k-nearest neighbor
Dempster-Shafer evidence theory