Abstract
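Database classification is a preprocessing technique for multi-database storage, management, and mining. To date, little research has addressed multi-database classification that is independent of a specific application, and the existing work ignores cohesion and coupling and has high complexity. This paper proposes a multi-database classification method based on high cohesion and low coupling. The method does not depend on a specific application, avoids unstable clustering results, and reduces time complexity. Specifically, the method, named DHC, first formulates a multi-objective optimization problem and then builds an algorithm based on hierarchical clustering to search for the optimal clustering. Experiments were run on the two-dimensional similarity tables of one synthetic database and one real-world database. They show that the method produces stable clusterings, has lower time complexity than BestClassification, and generalizes well.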
Database classification is a preprocessing technique for multi-database storage, management, and mining. At present, there are few works on application-independent multi-database classification, and those that exist ignore inter-class coupling or intra-class cohesion and have high complexity. In this paper, an application-independent database classification method based on high intra-class cohesion and low inter-class coupling is proposed, which uses hierarchical clustering to avoid instability and reduce time complexity. The method, called Database Hierarchical Clustering (DHC), is formulated as a multi-criteria optimization problem over the sum of the class distances, the coupling between classes, and the number of classes, and it uses hierarchical clustering to find the best clustering. Experiments on a synthetic database and real-world databases show that the proposed method is efficient at finding clusters in a set of databases.
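As a rough illustration of the idea summarized above, the following sketch merges databases bottom-up from a pairwise similarity table and keeps the partition that best balances intra-class cohesion against inter-class coupling. It is not the DHC algorithm itself: the abstract does not give the paper's objective function, so the `score` function (mean within-class similarity minus mean between-class similarity), the average-linkage merge rule, and all function names here are assumptions chosen for illustration only.

```python
from itertools import combinations


def mean_or_zero(values):
    """Return the mean of an iterable, or 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def cohesion(sim, clusters):
    """Mean similarity over all pairs of databases that share a class."""
    return mean_or_zero(
        sim[i][j] for c in clusters for i, j in combinations(c, 2)
    )


def coupling(sim, clusters):
    """Mean similarity over all pairs of databases in different classes."""
    return mean_or_zero(
        sim[i][j]
        for a, b in combinations(clusters, 2)
        for i in a
        for j in b
    )


def score(sim, clusters):
    """Hypothetical stand-in objective: reward cohesion, penalize coupling."""
    return cohesion(sim, clusters) - coupling(sim, clusters)


def linkage(sim, a, b):
    """Average-linkage similarity between two classes (an assumption)."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def classify_databases(sim):
    """Agglomerate from singletons and return the best-scoring partition."""
    clusters = [[i] for i in range(len(sim))]
    best_score, best = score(sim, clusters), [list(c) for c in clusters]
    while len(clusters) > 1:
        # Pick the two classes with the highest average similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(sim, clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        current = score(sim, clusters)
        if current > best_score:
            best_score, best = current, [list(c) for c in clusters]
    return best_score, best


# Toy similarity table for four databases (made-up values).
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
best_score, best_partition = classify_databases(similarity)
print(best_partition)
```

Because the merge order is fixed by the similarity table, repeated runs on the same input give the same partition, which is the kind of stability the abstract attributes to a hierarchical approach.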
Source
《计算机与数字工程》
2016, Issue 7, pp. 1226-1229, 1342 (5 pages in total)
Computer & Digital Engineering
Keywords
database clustering (数据库聚类)
multicriteria optimization (多目标最优化)
multi-database mining (多数据库挖掘)
hierarchical clustering (层次聚类)