To improve the performance of the multiple classifier system, a new method of feature-decision level fusion is proposed based on knowledge discovery. In the new method, the base classifiers operate on different feature spaces and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in the Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and the DST. An experiment on the classification of hyperspectral remote sensing images shows that the proposed method improves classification performance compared with plurality voting (PV).
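As a rough illustration of how mass functions from two base classifiers could be fused under the DST mentioned in the abstract above, the following sketch applies the standard Dempster rule of combination. It is not the paper's fusion framework: the rough-set step that induces masses from decision tables is omitted, and the class labels, mass values and the function name `dempster_combine` are assumptions made only for this example.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of class labels
    (a focal element) to its mass; the masses of each dict sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses is treated as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    normalizer = 1.0 - conflict
    return {focal: mass / normalizer for focal, mass in combined.items()}


# Hypothetical outputs of two base classifiers over the labels {water, forest}.
WATER = frozenset({"water"})
FOREST = frozenset({"forest"})
EITHER = WATER | FOREST

classifier_1 = {WATER: 0.6, EITHER: 0.4}
classifier_2 = {FOREST: 0.5, EITHER: 0.5}

fused = dempster_combine(classifier_1, classifier_2)
decision = max(fused, key=fused.get)  # focal element with the largest fused mass
```

In this toy case the conflicting mass is 0.3, so the remaining products are rescaled by 0.7 before a decision is taken.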
To discover fault diagnosis knowledge in the maintenance records of flexible manufacturing system (FMS) equipment, an algorithm (process) is presented, which consists of ① a preparatory phase, in which some items in the maintenance record are selected and decomposed into associated concepts and attributes, and ② a discovering and establishing process, in which possible relationships between the concepts and attributes are established and knowledge is formulated. The rich diagnosis knowledge in the maintenance records was captured by applying the method. An application of the method to the diagnosis system for FMS equipment showed that the approach is correct and effective.
This paper proposes the principle of comprehensive knowledge discovery. Unlike most current knowledge discovery methods, comprehensive knowledge discovery considers both the spatial relations and the attributes of spatial entities or objects. We introduce the theory of the spatial knowledge expression system and some concepts, including comprehensive knowledge discovery and the spatial union information table (SUIT). In theory, SUIT records all information contained in the studied objects, but in reality, because of the complexity and variety of spatial relations, only those factors of interest to us are selected. In order to find comprehensive knowledge in spatial databases, an efficient comprehensive knowledge discovery algorithm called recycled algorithm (RAR) is suggested.
A new algorithm for knowledge discovery based on statistic induction logic is proposed, and the validity of the method is verified by examples. The method is suitable for a large range of knowledge discovery applications in the study of causal relations, uncertain knowledge acquisition and principal factor analysis. The language field description of the state space makes the algorithm robust in adaptation, with more easily understandable results, which are isomotopy with natural language in the topologic space.
The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science took place at the National Science Library (NSL), Chinese Academy of Sciences (CAS) in Beijing from June 19 to June 22, 2016. The Conference was opened by NSL Director Xiangyang Huang, who placed the event within the goals of the Library and lauded the spirit of international collaboration in the area of data science and knowledge discovery. The whole event was an encouraging success, with over 370 registered participants and highly enlightening presentations. The Conference was organized by the Journal of Data and Information Science (JDIS) to bring the Journal to the attention of an international and local audience.
In the current biomedical data movement, numerous efforts have been made to convert and normalize a large amount of traditional structured and unstructured data (e.g., EHRs, reports) to semi-structured data (e.g., RDF, OWL). With the increasing amount of semi-structured data coming into the biomedical community, data integration and knowledge discovery from heterogeneous domains have become an important research problem. At the application level, detection of related concepts among medical ontologies is an important goal of life science research. It is more crucial to figure out how different concepts are related within a single ontology or across multiple ontologies by analysing predicates in different knowledge bases. However, the world today is one of information explosion, and it is extremely difficult for biomedical researchers to find existing or potential predicates to link cross-domain concepts without any support from schema pattern analysis. Therefore, there is a need for a mechanism that performs predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, more closely related topics and generates queries to discover cross-domain knowledge from each topic. In this paper, we present such a model, which performs predicate-oriented pattern analysis based on the close relationships among predicates and generates a similarity matrix. Based on this similarity matrix, we apply an innovative unsupervised learning algorithm to partition large data sets into smaller, more closely related topics and generate meaningful queries to fully discover knowledge over a set of interlinked data sources. We have implemented a prototype system named BmQGen and evaluated the proposed model with a colorectal surgical cohort from the Mayo Clinic.
Purpose: This paper explores a method of knowledge discovery by visualizing and analyzing co-occurrence relations among three or more entities in collections of journal articles. Design/methodology/approach: A variety of methods, such as model construction, system analysis and experiments, are used. The author has improved Morris' crossmapping technique and developed a technique for directly describing, visualizing and analyzing co-occurrence relations among three or more entities in collections of journal articles. Findings: The visualization tools and the knowledge discovery method can efficiently reveal the multiple co-occurrence relations among three entities in collections of journal papers. The method can reveal more, and more in-depth, information than analyzing co-occurrence relations between two entities. Therefore, it can be used for mapping a knowledge domain that is manifested in association with the entities from multi-dimensional perspectives and in an all-round way. Research limitations: The technique could only be used to analyze co-occurrence relations of less than three entities at present. Practical implications: This research has expanded the study scope of co-occurrence analysis. The research result has provided theoretical support for co-occurrence analysis. Originality/value: There has not been a systematic study on co-occurrence relations among multiple entities in collections of journal articles. This research defines multiple co-occurrence and the research scope, develops the visualization analysis tool and designs the analysis model of the knowledge discovery method.
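To make the idea of co-occurrence among three entities in the abstract above more concrete, the sketch below counts how often an (author, keyword, journal) triple appears together in a collection of article records. It is a minimal sketch only and does not reproduce Morris' crossmapping technique or the author's visualization tool; the record fields, the sample records and the function name `triple_cooccurrence` are assumptions for illustration.

```python
from collections import Counter
from itertools import product


def triple_cooccurrence(articles):
    """Count (author, keyword, journal) triples that co-occur in one article."""
    counts = Counter()
    for article in articles:
        # Use sets so a repeated author or keyword is counted once per article.
        authors = set(article["authors"])
        keywords = set(article["keywords"])
        for author, keyword in product(authors, keywords):
            counts[(author, keyword, article["journal"])] += 1
    return counts


# Hypothetical article records.
sample_articles = [
    {"authors": ["A. Author", "B. Author"],
     "keywords": ["data mining"],
     "journal": "Journal X"},
    {"authors": ["A. Author"],
     "keywords": ["data mining", "visualization"],
     "journal": "Journal X"},
]

cooccurrence = triple_cooccurrence(sample_articles)
strongest_triple, strongest_count = cooccurrence.most_common(1)[0]
```

A table of such triple counts is the kind of input that a three-entity map could then visualize, whereas pairwise analysis would only see two of the three dimensions at a time.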
From the ecological viewpoint, this paper discusses the urban spatial-temporal relationship. We take regional towns and cities as a complex man-land system of urban eco-community. This complex man-land system comprises two elements, 'man' and 'land'. Here, 'man' means organization with self-determined consciousness, and 'land' means the physical environment (niche) that 'man' depends on. The complex man-land system has three basic components: individual, population and community. Therefore, there are six types of spatial relationship for the complex man-land system: individual, population, community, man-man, land-land and man-land spatial relationships. Taking the Pearl (Zhujiang) River Delta as a case study, the authors found some evidence of the urban spatial relationship in remote sensing data. Firstly, the concentration and diffusion of the cities' spatial relationship was found in the remote sensing imagery. Most of the cities concentrate in the core area of the Pearl River Delta, but the diffusion is also significant. Secondly, the growth behavior and succession behavior of the urban spatial relationship were found by comparing remote sensing images from different times. Thirdly, the inheritance, break, or meeting emergency behavior was observed in the remote sensing data. Fourthly, the authors found many cases of symbiosis and competition in the remote sensing data of the Pearl River Delta. Fifthly, the autoeciousness, stranglehold and invasion behavior of the urban spatial relationship was discovered from the remote sensing data.
There are both associations and differences between structured and unstructured data mining. How to unite them in a single theoretical framework that can guide research on knowledge discovery and data mining has become an urgent problem. On the basis of an analysis and study of existing research results, the united model of knowledge discovery state space (UMKDSS) is presented, and structured data mining and complex type data mining are associated together. UMKDSS can provide theoretical guidance for complex type data mining. Finally, an application example of UMKDSS is given.
Important Dates: Submission due November 15, 2005; notification of acceptance December 30, 2005; camera-ready copy due January 10, 2006. Workshop Scope: Intelligence and Security Informatics (ISI) can be broadly defined as the study of the development and use of advanced information technologies and systems for national and international security-related applications. The First and Second Symposiums on ISI were held in Tucson, Arizona, in 2003 and 2004, respectively. In 2005, the IEEE International Conference on ISI was held in Atlanta, Georgia. These ISI conferences have brought together academic researchers, law enforcement and intelligence experts, and information technology consultants and practitioners to discuss their research and practice related to various ISI topics, including ISI data management, data and text mining for ISI applications, terrorism informatics, deception detection, terrorist and criminal social network analysis, crime analysis, monitoring and surveillance, policy studies and evaluation, and information assurance, among others. We continue this stream of ISI conferences by organizing the Workshop on Intelligence and Security Informatics (WISI’06) in conjunction with the Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06). WISI’06 will provide a stimulating forum for ISI researchers in Pacific Asia and other regions of the world to exchange ideas and report research progress. The workshop also welcomes contributions dealing with ISI challenges specific to the Pacific Asian region.
A new structure of ESKD (expert system based on the knowledge discovery system KD (D&K)) is first presented on the basis of KD (D&K), a synthesized knowledge discovery system based on a double-base (database and knowledge base) cooperating mechanism. With all its new features, ESKD may form a new research direction and provide a great probability for solving the wealth of knowledge in the knowledge base. The general structural frame of ESKD and some of its sub-systems are described, with emphasis on the dynamic knowledge base based on the double-base cooperating mechanism. According to the result of a demonstrative experiment, the structure of ESKD is effective and feasible.
Since the early 1990s, significant progress in database technology has provided a new platform for emerging new dimensions of data engineering. New models were introduced to utilize the data sets stored in the new generations of databases. These models have a deep impact on evolving decision-support systems, but they suffer from a variety of practical problems when accessing real-world data sources. Specifically, a type of data storage model based on data distribution theory has been increasingly used in recent years by large-scale enterprises, although it is not compatible with existing decision-support models. This data storage model stores the data at the different geographical sites where they are most regularly accessed. This leads to considerably less inter-site data transfer, which can reduce data security issues in some circumstances and also significantly improve the speed of data manipulation transactions. The aim of this paper is to propose a new approach for supporting proactive decision-making that utilizes a workable data source management methodology. The new model can effectively organize and use complex data sources, even when they are distributed across different sites in a fragmented form. At the same time, the new model provides a very high level of intellectual management decision support by intelligent use of the data collections, utilizing new smart methods for synthesizing useful knowledge. The results of an empirical study to evaluate the model are provided.
A novel knowledge discovery algorithm based on DNA coding is proposed, and an example verifying its validity is given. It is proved that this algorithm can efficiently discover new simplified rules from the original rule set.
Recent developments in database technology have seen a wide variety of data being stored in huge collections. This variety makes the analysis of a generic database a strenuous task in knowledge discovery. One approach is to summarize large datasets in such a way that the resulting summary dataset is of manageable size. The histogram has received significant attention as a summarization/representative object for large databases, but it suffers from computational and space complexity. In this paper, we propose transforming the histogram object into a Piecewise Linear Regression (PLR) line object and suggest that PLR objects can be less computationally and storage intensive than histograms. In addition, to carry out cluster analysis, we propose a distance measure for computing the distance between PLR lines. A case study based on real data from the online education system LMS is presented. It demonstrates that PLR is a powerful knowledge representative for very large databases.
Based on an analysis of existing methods for ranking terminology or the subject relevancy of documents through an intermediary collection acting as a catalyst (designated the Group B collection) for the purpose of non-interactive literature-based discovery, this article proposes a bi-directional ranking method based on document occurrence frequency, according to the 'concurrence theory' and the degree and extent of subject relevancy. The method explores and further refines the ranking method based on the occurrence frequency of certain terminologies and documents, and adds a new perspective: the concurrence of appropriate terminologies/documents in the 'low occurrence frequency component' of three non-interactive document collections. A preliminary experiment was conducted to analyze and test the significance and viability of the newly designed method.
With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to understand user behavior better, to improve the service provided, and to increase business opportunities. In response to such a demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems as well. In conclusion, the article also lists some problems and challenges for further research.
The symbolic representation of time series has attracted much research interest recently. The high dimensionality typical of the data is challenging, especially as the time series becomes longer. The wide distribution of sensors collecting more and more data exacerbates the problem. Representing a time series effectively is an essential task for decision-making activities such as classification, prediction, and knowledge discovery. In this paper, we propose a new symbolic representation method for long time series based on trend features, called trend feature symbolic approximation (TFSA). The method uses a two-step mechanism to segment long time series rapidly. Unlike some previous symbolic methods, it focuses on retaining most of the trend features and patterns of the original series. A time series is represented by trend symbols, which are also suitable for use in knowledge discovery, such as association rules mining. TFSA provides the lower bounding guarantee. Experimental results show that, compared with some previous methods, it not only has better segmentation efficiency and classification accuracy, but also is applicable for use in knowledge discovery from time series.
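The general idea of representing a time series by trend symbols, as described in the abstract above, can be sketched very roughly as follows. This is not TFSA: the abstract does not give the two-step segmentation or the lower-bounding distance, so the fixed-length windows, the three-symbol alphabet (`U`, `D`, `F`), the `flat_tol` threshold and the function name `trend_symbols` are all assumptions made for illustration.

```python
def trend_symbols(series, window, flat_tol=0.1):
    """Map each fixed-length window of a series to a trend symbol.

    'U' = upward, 'D' = downward, 'F' = flat, decided from the average
    slope between the first and last point of the window. Trailing points
    that do not fill a whole window are ignored.
    """
    if window < 2:
        raise ValueError("window must contain at least two points")
    symbols = []
    for start in range(0, len(series) - window + 1, window):
        segment = series[start:start + window]
        slope = (segment[-1] - segment[0]) / (window - 1)
        if slope > flat_tol:
            symbols.append("U")
        elif slope < -flat_tol:
            symbols.append("D")
        else:
            symbols.append("F")
    return "".join(symbols)


# Hypothetical series: a rise, a plateau, then a fall.
readings = [0, 1, 2, 3, 3, 3, 3, 3, 3, 2, 1, 0]
word = trend_symbols(readings, window=4)
```

The resulting symbol string is much shorter than the raw series, which is what makes downstream tasks such as association rule mining over symbols practical.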
Knowledge acquisition is the bottleneck of expert systems. To solve this problem, KD (D&K), a comprehensive knowledge discovery process model in which the database and the knowledge base cooperate, and its related technology are proposed. Then, based on KD (D&K) and the related technology, a new construction of the Expert System based on Knowledge Discovery (ESKD) is proposed. As the key knowledge acquisition component of ESKD, KD (D&K) is composed of KDD* and KDK*. KDD*, the new process model based on the double-base cooperating mechanism, and KDK*, the new process model based on the double-basis fusion mechanism, are introduced, respectively. The overall framework of ESKD is proposed. Some sub-systems and the dynamic knowledge base system are discussed. Finally, the effectiveness and advantages of ESKD are tested on a real-world agriculture database. We hope that ESKD may be useful for the new generation of expert systems.
It is important for telecom companies to make sense of the large amount of data they have accumulated over the years. This paper reviews the concepts and techniques of knowledge discovery in databases (KDD) and surveys applications of this technology in the telecommunications sector all over the world. It also discusses some possible applications of this technology in China and reports a preliminary result of the first attempt to apply KDD techniques to telephone traffic volume prediction. It concludes that KDD is a promising technology that can help to enhance the competitiveness of China's telecom companies in the face of looming competition in a liberalized market.
LP (Logic Programming) has been successfully applied to knowledge discovery in many fields. The execution of LP is based on the evaluation of first-order predicates. Usually the information involved in the predicates is local and homogeneous, so the evaluation process is relatively simple. However, the evaluation process becomes much more complicated when applied to KDD on the Internet, where the information involved in the predicates may be heterogeneous and distributed over many different sites. Therefore, we try to attack the problem in a multi-agent system framework, so that the logic program can be written in a site-independent style and deal easily with heterogeneously represented information.
Funding: China's National Surveying Technical Fund (No. 20007).
Funding: This work was financially supported by the National Natural Science Foundation of China (No. 69835001).
Funding: Under the auspices of the National Natural Science Foundation of China (No. 69896250-4).
文摘From the ecological viewpoint this paper discusses the urban spatial-temporal relationship. We take regional towns and cities as a complex man-land system of urban eco-community. This complex man-land system comprises two elements of ' man' and ' land' . Here, ' man' means organization with self-determined consciousness, and ' land' means the physical environment (niche) that ' man' depends on. The complex man-land system has three basic components. They are individual, population and community. Therefore there are six types of spatial relationship for the complex man-land system. They are individual, population,community,man-man, land-land and man-land spatial relationships. Taking the Pearl(Zhujiang) River Delta as a case study, the authors found some evidence of the urban spatial relationship from the remote sensing data. Firstly, the concentration and diffusion of the cities spatial relationship was found in the remote sensing imagery. Most of the cities concentrate in the core area of the Pearl River Delta, but the diffusion situation is also significant. Secondly, the growth behavior and succession behavior of the urban spatial relationship was found in the remote sensing images comparison with different temporal data. Thirdly, the inheritance, break, or meeting emergency behavior was observed from the remote sensing data. Fourthly, the authors found many cases of symbiosis and competition in the remote sensing data of the Pearl River Delta. Fifthly, the autoeciousness, stranglehold and invasion behavior of the urban spatial relationship was discovered from the remote sensing data.
Abstract: There are both associations and differences between structured and unstructured data mining. How to unify them into a single theoretical framework that can guide research on knowledge discovery and data mining has become an urgent problem. On the basis of an analysis of existing research results, the united model of knowledge discovery state space (UMKDSS) is presented, which links structured data mining and complex type data mining. UMKDSS can provide theoretical guidance for complex type data mining. Finally, an application example of UMKDSS is given.
Abstract: Important Dates: Submission due November 15, 2005; notification of acceptance December 30, 2005; camera-ready copy due January 10, 2006. Workshop Scope: Intelligence and Security Informatics (ISI) can be broadly defined as the study of the development and use of advanced information technologies and systems for national and international security-related applications. The First and Second Symposiums on ISI were held in Tucson, Arizona, in 2003 and 2004, respectively. In 2005, the IEEE International Conference on ISI was held in Atlanta, Georgia. These ISI conferences have brought together academic researchers, law enforcement and intelligence experts, and information technology consultants and practitioners to discuss their research and practice related to various ISI topics, including ISI data management, data and text mining for ISI applications, terrorism informatics, deception detection, terrorist and criminal social network analysis, crime analysis, monitoring and surveillance, policy studies and evaluation, and information assurance, among others. We continue this stream of ISI conferences by organizing the Workshop on Intelligence and Security Informatics (WISI’06) in conjunction with the Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06). WISI’06 will provide a stimulating forum for ISI researchers in Pacific Asia and other regions of the world to exchange ideas and report research progress. The workshop also welcomes contributions dealing with ISI challenges specific to the Pacific Asian region.
Abstract: A new structure of ESKD (expert system based on the knowledge discovery system KD (D&K)) is first presented on the basis of KD (D&K), a synthesized knowledge discovery system based on a double-base (database and knowledge base) cooperating mechanism. With all its new features, ESKD may form a new research direction and provide a good chance of solving the problem of the wealth of knowledge in the knowledge base. The general structural frame of ESKD and some of its sub-systems are described, with emphasis on the dynamic knowledge base built on the double-base cooperating mechanism. According to the result of a demonstrative experiment, the structure of ESKD is effective and feasible.
Abstract: Since the early 1990s, significant progress in database technology has provided a new platform for emerging dimensions of data engineering. New models were introduced to utilize the data sets stored in the new generations of databases. These models have a deep impact on evolving decision-support systems, but they suffer from a variety of practical problems when accessing real-world data sources. Specifically, a type of data storage model based on data distribution theory has been increasingly used in recent years by large-scale enterprises, yet it is not compatible with existing decision-support models. This data storage model stores the data at the different geographical sites where they are most regularly accessed. This leads to considerably less inter-site data transfer, which can reduce data security issues in some circumstances and also significantly improve the speed of data manipulation transactions. The aim of this paper is to propose a new approach for supporting proactive decision-making that utilizes a workable data source management methodology. The new model can effectively organize and use complex data sources, even when they are distributed across different sites in a fragmented form. At the same time, the new model provides a very high level of intellectual management decision support through intelligent use of the data collections, utilizing new smart methods for synthesizing useful knowledge. The results of an empirical study to evaluate the model are provided.
Abstract: A novel DNA-coding-based knowledge discovery algorithm is proposed, and an example verifying its validity is given. It is shown that this algorithm can efficiently discover new simplified rules from the original rule set.
Abstract: Recent developments in database technology have seen a wide variety of data being stored in huge collections. This variety makes the analysis of a generic database a strenuous task in knowledge discovery. One approach is to summarize large datasets in such a way that the resulting summary dataset is of manageable size. The histogram has received significant attention as a summarization/representative object for large databases, but it suffers from high computational and space complexity. In this paper, we propose transforming the histogram object into a Piecewise Linear Regression (PLR) line object and suggest that PLR objects can be less computation- and storage-intensive than histograms. In addition, to carry out cluster analysis, we propose a distance measure for computing the distance between PLR lines. A case study is presented based on real data from the online education system LMS. It demonstrates that PLR is a powerful knowledge representative for very large databases.
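To make the idea of replacing a histogram with line segments concrete, here is a minimal sketch that fits an ordinary least-squares line to each equal-width group of bins. It is only an illustration under assumptions: the abstract does not specify how the paper segments the histogram or how its distance measure is defined, and `fit_line` and `histogram_to_plr` are hypothetical helpers, not the paper's method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # a single point: fall back to a horizontal line
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def histogram_to_plr(counts, n_segments):
    """Approximate bin counts with at most n_segments linear pieces.

    Each piece is stored as (first_bin, end_bin_exclusive, slope, intercept).
    """
    n = len(counts)
    size = -(-n // n_segments)  # ceiling division: bins per piece
    pieces = []
    for start in range(0, n, size):
        end = min(start + size, n)
        xs = list(range(start, end))
        slope, intercept = fit_line(xs, counts[start:end])
        pieces.append((start, end, slope, intercept))
    return pieces


counts = [1, 2, 3, 4, 8, 6, 4, 2]  # hypothetical histogram bin counts
plr = histogram_to_plr(counts, 2)
for piece in plr:
    print(piece)
```

In this toy case eight bin counts are summarized by two pieces of four numbers each; the saving grows as the number of bins per piece increases.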
Funding: Supported by the Humanities and Social Science Foundation of the Ministry of Education of China (Grant No. 07JA870005).
Abstract: Based on an analysis of existing methods for ranking the terminology or subject relevancy of documents through an intermediary collection acting as a catalyst (designated the Group B collection) for the purpose of non-interactive literature-based discovery, this article proposes a bi-directional ranking method based on document occurrence frequency, according to the 'concurrence theory' and the degree and extent of subject relevancy. This method explores and further refines the ranking method based on the occurrence frequency of certain terminologies and documents, and adds a new perspective: the concurrence of appropriate terminologies/documents in the 'low occurrence frequency component' of three non-interactive document collections. A preliminary experiment was conducted to analyze and test the significance and viability of the newly designed method.
Abstract: With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to better understand user behavior, improve the service provided, and increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. In conclusion, it lists some problems and challenges for further research.
Funding: Supported by the National High-Tech R&D Program (863) of China (Nos. 2012AA012600, 2011AA010702, 2012AA01A401, and 2012AA01A402), the National Natural Science Foundation of China (No. 60933005), and the National Science and Technology of China (No. 2012BAH38B04).
Abstract: The symbolic representation of time series has attracted much research interest recently. The high dimensionality typical of the data is challenging, especially as the time series becomes longer. The wide distribution of sensors collecting more and more data exacerbates the problem. Representing a time series effectively is an essential task for decision-making activities such as classification, prediction, and knowledge discovery. In this paper, we propose a new symbolic representation method for long time series based on trend features, called trend feature symbolic approximation (TFSA). The method uses a two-step mechanism to segment long time series rapidly. Unlike some previous symbolic methods, it focuses on retaining most of the trend features and patterns of the original series. A time series is represented by trend symbols, which are also suitable for use in knowledge discovery, such as association rule mining. TFSA provides the lower bounding guarantee. Experimental results show that, compared with some previous methods, it not only has better segmentation efficiency and classification accuracy, but also is applicable for use in knowledge discovery from time series.
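The general idea of representing a series by trend symbols can be sketched as follows. This is a deliberately simplified illustration, not TFSA itself: it uses fixed-length windows instead of the two-step segmentation described in the abstract, it offers no lower bounding guarantee, and the `trend_symbols` helper, the symbol alphabet, and the `flat_tol` threshold are assumptions made for the example.

```python
def trend_symbols(series, window, flat_tol=0.1):
    """Map each non-overlapping window to 'U' (up), 'D' (down), or 'F' (flat).

    Trailing points that do not fill a whole window are ignored.
    """
    if window < 2:
        raise ValueError("window must contain at least two points")
    symbols = []
    for start in range(0, len(series) - window + 1, window):
        segment = series[start:start + window]
        # Average change per step between the first and last point of the window.
        slope = (segment[-1] - segment[0]) / (window - 1)
        if slope > flat_tol:
            symbols.append("U")
        elif slope < -flat_tol:
            symbols.append("D")
        else:
            symbols.append("F")
    return "".join(symbols)


series = [0, 1, 2, 3, 3, 3, 3, 3, 3, 2, 1, 0]  # hypothetical sensor readings
word = trend_symbols(series, window=4)
print(word)
```

The resulting string is much shorter than the original series, and because it is a sequence of discrete symbols it can be fed to symbolic techniques such as association rule mining.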
Funding: Supported by the National Natural Science Foundation of China (Grant No. 69835001), the Ministry of Education of China (Grant No. [2000] 175), and the Science Foundation of Beijing (Grant No. 4022008).
Abstract: Knowledge acquisition is the bottleneck of expert systems. To solve this problem, KD (D&K), a comprehensive knowledge discovery process model in which a database and a knowledge base cooperate, and its related technology are proposed. Then, based on KD (D&K) and the related technology, a new construction of the Expert System based on Knowledge Discovery (ESKD) is proposed. As the key knowledge acquisition component of ESKD, KD (D&K) is composed of KDD* and KDK*. KDD*, the new process model based on the double-base cooperating mechanism, and KDK*, the new process model based on the double-basis fusion mechanism, are introduced in turn. The overall framework of ESKD is proposed. Some sub-systems and the dynamic knowledge base system are discussed. Finally, the effectiveness and advantages of ESKD are tested on a real-world agriculture database. We hope that ESKD may be useful for the new generation of expert systems.
Abstract: It is important for telecom companies to make sense of the large amount of data they have accumulated over the years. This paper reviews the concepts and techniques of knowledge discovery in databases (KDD) and surveys applications of this technology in the telecommunications sector around the world. It also discusses some possible applications of this technology in China and reports a preliminary result of the first attempt to apply KDD techniques to telephone traffic volume prediction. It concludes that KDD is a promising technology that can help to enhance the competitiveness of China's telecom companies in the face of looming competition in a liberalized market.
Abstract: LP (Logic Programming) has been successfully applied to knowledge discovery in many fields. The execution of a logic program is based on the evaluation of first-order predicates. Usually the information involved in the predicates is local and homogeneous, so the evaluation process is relatively simple. However, the evaluation process becomes much more complicated when applied to KDD on the Internet, where the information involved in the predicates may be heterogeneous and distributed over many different sites. Therefore, we attack the problem within a multi-agent system framework, so that the logic program can be written in a site-independent style and deal easily with heterogeneously represented information.