With massive amounts of data stored in databases, mining information and knowledge from databases has become an important research issue. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for data mining and knowledge discovery techniques to better understand user behavior, improve the services provided, and increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. It concludes by listing some problems and challenges for further research.
The fight against fraud and trafficking is a fundamental mission of customs. The conditions for carrying out this mission depend both on the evolution of economic issues and on the behaviour of the actors in charge of its implementation. As part of the customs clearance process, customs administrations are now confronted with an increasing volume of goods as international trade develops. Automated risk management is therefore required to limit intrusive controls. In this article, we propose an unsupervised classification method that extracts knowledge rules from a database of customs offences in order to identify abnormal behaviour resulting from customs control. The idea is to apply the Apriori principle, based on frequent patterns, to a database of offences recorded in customs procedures in order to uncover potential association rules between a customs operation and an offence, and thereby extract knowledge governing the occurrence of fraud. This mass of often heterogeneous and complex data generates new needs that knowledge extraction methods must be able to meet. The assessment of infringements inevitably requires proper identification of the risks. The approach is an original one that uses data mining to build association rules in two steps: first, search for frequent patterns (support >= minimum support); then, from the frequent patterns, produce association rules (confidence >= minimum confidence). The simulations highlighted three main types of association rules: forecasting rules, targeting rules and neutral rules, with the lift measure introduced as a third indicator of rule relevance. The confidence threshold for the first two types of rules was set to at least 50%.
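The two-step procedure described in this abstract (frequent patterns first, then rules filtered by confidence and scored by lift) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy transactions and the thresholds are all hypothetical, and the search is a plain level-wise Apriori pass with no optimisation.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        # Keep the candidates whose relative support reaches the threshold.
        level = {}
        for c in current:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(list(level), 2)
                      if len(a | b) == len(a) + 1}
        # Prune: every k-item subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, len(c) - 1))]
    return frequent

def association_rules(frequent, min_confidence):
    """Step 2: derive rules X => Y with confidence >= min_confidence, plus their lift."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = support / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules

# Hypothetical toy data: each record pairs a customs operation with an offence.
transactions = [frozenset(t) for t in (
    {"transit", "undervaluation"},
    {"transit", "undervaluation"},
    {"transit", "misclassification"},
    {"import", "misclassification"},
    {"import", "undervaluation"},
)]

frequent = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.5)
for antecedent, consequent, support, confidence, lift in rules:
    print(sorted(antecedent), "=>", sorted(consequent),
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent, which is why lift is useful as a relevance indicator alongside support and confidence.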
This paper proposes the principle of comprehensive knowledge discovery. Unlike most current knowledge discovery methods, comprehensive knowledge discovery considers both the spatial relations and the attributes of spatial entities or objects. We introduce the theory of the spatial knowledge expression system and several concepts, including comprehensive knowledge discovery and the spatial union information table (SUIT). In theory, SUIT records all information contained in the studied objects; in practice, because of the complexity and variety of spatial relations, only the factors of interest are selected. To discover comprehensive knowledge from spatial databases, an efficient comprehensive knowledge discovery algorithm called the recycled algorithm (RAR) is suggested.
This paper presents a new efficient algorithm for mining frequent closed itemsets. It enumerates the closed set of frequent itemsets by using a novel compound frequent itemset tree that facilitates fast growth and efficient pruning of the search space. It also employs a hybrid approach that adapts search strategies, representations of projected transaction subsets, and projecting methods to the characteristics of the dataset. Efficient local pruning, global subsumption checking, and fast hashing methods are detailed in this paper. The principle that balances the overheads of search space growth and pruning is also discussed. Extensive experimental evaluations on real-world and artificial datasets showed that our algorithm outperforms CHARM by a factor of five and is one to three orders of magnitude more efficient than CLOSET and MAFIA.
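For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The sketch below only illustrates that definition by brute force on a tiny, hypothetical dataset; it is not the tree-based algorithm of the paper, and the function names are assumptions made for this example.

```python
from itertools import combinations

def support_counts(transactions, min_count):
    """Count every itemset that appears in at least min_count transactions (brute force)."""
    items = sorted({i for t in transactions for i in t})
    counts = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                counts[itemset] = count
    return counts

def closed_itemsets(counts):
    """Keep the frequent itemsets that have no proper superset with the same count."""
    # A superset with equal support is itself frequent, so checking `counts` suffices.
    return {s: c for s, c in counts.items()
            if not any(s < other and c == counts[other] for other in counts)}

# Hypothetical toy transactions.
transactions = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"})]

counts = support_counts(transactions, min_count=2)
closed = closed_itemsets(counts)
for itemset, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Here {b} and {c} are frequent but not closed, because {a, b} and {a, c} occur in exactly the same transactions; the closed sets therefore summarise all frequent itemsets and their supports with fewer entries.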
The amount of data available for decision making has increased tremendously in the age of the digital economy. Decision makers who fail to handle the data proficiently may make incorrect decisions and thereby harm their business. Thus, the task of extracting and classifying useful information efficiently and effectively from huge amounts of computational data is of special importance. In this paper, we consider that the attributes of data can be both crisp and fuzzy. By examining suitable partial data, segments with different classes are formed; a multithreaded computation is then performed to generate crisp rules (if possible); and finally, the fuzzy partition technique is employed to deal with the fuzzy attributes for classification. The rules generated in classifying the overall data can be used to gain more knowledge from the data collected.
Mining knowledge from databases has been regarded as a key research issue in database systems. Researchers in different fields have shown great interest in data mining. In this paper, data mining techniques are introduced broadly, including their definition, purpose, characteristics, principal processes and classifications. As an example, studies on mining association rules are illustrated. Finally, some data mining prototypes are presented and several research trends in data mining are discussed.
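1. Introduction. In recent years, data mining, also known as knowledge discovery in databases (KDD), has received wide attention from the international artificial intelligence and database communities. Association rules are an important research topic in KDD. The problem was proposed by R. Agrawal et al., with the aim of discovering relationships among the items in a transaction database. For example, consider the association rule: butter, milk ⇒ bread (30%, 2%). It means that customers who buy butter and milk will also buy bread; 30% and 2% are the confidence and the support of the rule, respectively.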
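The two figures attached to the butter, milk ⇒ bread rule can be related by a short worked calculation. It assumes the usual definitions, which this excerpt does not spell out: support is the fraction of transactions containing all items of the rule, and confidence is that fraction divided by the support of the antecedent.

```latex
\[
\operatorname{supp}(\{\text{butter},\text{milk}\} \Rightarrow \{\text{bread}\})
  = P(\text{butter},\text{milk},\text{bread}) = 0.02
\]
\[
\operatorname{conf}(\{\text{butter},\text{milk}\} \Rightarrow \{\text{bread}\})
  = \frac{P(\text{butter},\text{milk},\text{bread})}{P(\text{butter},\text{milk})} = 0.30
\]
\[
\Longrightarrow\quad
P(\text{butter},\text{milk}) = \frac{0.02}{0.30} \approx 0.067
\]
```

Under these assumptions, about 6.7% of all transactions contain both butter and milk, and 30% of those also contain bread.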
Funding: China's National Surveying Technical Fund (No. 20007)