Abstract
Most existing data mining algorithms target a single data source. Mining multiple data sources is a new problem that KDD faces in networked, distributed environments, and it is an effective technique for knowledge discovery over globally distributed data. This paper presents a new method for knowledge discovery across multiple data sources: knowledge patterns discovered in other, similar data sources are shared, and hypothesis testing on sampled local data is used to judge whether those patterns also hold in the local data source, which greatly improves the efficiency of knowledge discovery. Experimental results demonstrate the effectiveness of the method. The method can be generalized into an efficient knowledge discovery method for cases where a priori hypotheses (known patterns) are available, and it can also be applied to incremental knowledge discovery.
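The abstract does not say which statistical test the paper uses, so the following is only a minimal sketch of the general idea, under stated assumptions: patterns are frequent itemsets shared by another source, validity means "local support is at least `min_support`", and the check is a one-sided z-test on a random sample of local transactions. The names `support_holds`, `verify_shared_patterns`, `min_support`, and `alpha` are hypothetical and do not come from the paper.

```python
import math
import random


def support_holds(sample, itemset, min_support, alpha=0.05):
    """One-sided z-test of H0: local support >= min_support.

    Assumes a non-empty sample and 0 < min_support < 1.
    Returns True when H0 is not rejected, i.e. the shared pattern is
    accepted as valid for the local data source.
    """
    n = len(sample)
    # Count sampled transactions that contain every item of the pattern.
    hits = sum(1 for transaction in sample if itemset <= transaction)
    p_hat = hits / n
    std_err = math.sqrt(min_support * (1 - min_support) / n)
    z = (p_hat - min_support) / std_err
    # Lower-tail p-value from the standard normal CDF.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return p_value >= alpha


def verify_shared_patterns(local_data, patterns, min_support,
                           sample_size, seed=0):
    """Keep only the shared patterns that pass the test on a local sample."""
    rng = random.Random(seed)
    sample = rng.sample(local_data, min(sample_size, len(local_data)))
    return [p for p in patterns if support_holds(sample, p, min_support)]


# Illustrative data only (not from the paper).
local_data = [{"a", "b"}] * 80 + [{"c"}] * 20
shared = [frozenset({"a", "b"}), frozenset({"c", "d"})]
accepted = verify_shared_patterns(local_data, shared,
                                  min_support=0.5, sample_size=100)
```

In this sketch each shared pattern costs one pass over a sample rather than a full mining run over the local data, which is the kind of saving the abstract attributes to sharing patterns.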
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
EI
CSCD
Peking University Core Journals
2005, No. 5, pp. 564-568 (5 pages)
Funding
Supported by the National 863 Program of China (No. 2003AA118070)
Keywords
Multi-Data Sources
Hypothesis Testing
Knowledge Sharing
Knowledge Discovery