Identification of the underlying partial differential equations (PDEs) for complex systems remains a formidable challenge. In the present study, a robust PDE identification method is proposed, demonstrating the ability to extract accurate governing equations under noisy conditions without prior knowledge. Specifically, the proposed method combines gene expression programming, a type of evolutionary algorithm capable of generating unseen terms based solely on basic operators and functional terms, with symbolic regression neural networks. These networks are designed to represent explicit functional expressions and to optimize them with data gradients. In particular, the specifically designed neural networks can easily be transformed into physical constraints for the training data, embedding the discovered PDEs to further optimize the metadata used for iterative PDE identification. The proposed method has been tested on four canonical PDE cases, validating its effectiveness without preliminary information and confirming its suitability for practical applications across various noise levels.
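As a rough illustration of the kind of identification problem this abstract describes, the sketch below recovers the diffusion term of a heat equation, u_t = ν u_xx, from synthetic data by sequentially thresholded least squares over a fixed candidate library. This is a deliberately simple stand-in, not the paper's gene-expression-programming and symbolic-regression-network pipeline; the data are noise-free and all names and settings are illustrative assumptions.

```python
# Hedged sketch: sparse regression over a fixed term library, a simple stand-in
# for the paper's GEP + symbolic-regression-network method. Synthetic, noise-free data.
import numpy as np

nu = 0.5
x = np.linspace(0.0, 2.0 * np.pi, 256)
t = np.linspace(0.0, 1.0, 101)
dx, dt = x[1] - x[0], t[1] - t[0]

# Exact two-mode solution of the heat equation u_t = nu * u_xx,
# used here as a stand-in for measured field data.
X, T = np.meshgrid(x, t, indexing="ij")
u = np.exp(-nu * T) * np.sin(X) + 0.5 * np.exp(-4.0 * nu * T) * np.sin(2.0 * X)

# Finite-difference derivatives (interior points only).
u_t = (u[:, 2:] - u[:, :-2]) / (2.0 * dt)             # shape (256, 99)
u_x = (u[2:, :] - u[:-2, :]) / (2.0 * dx)             # shape (254, 101)
u_xx = (u[2:, :] - 2.0 * u[1:-1, :] + u[:-2, :]) / dx**2

target = u_t[1:-1, :].ravel()
library = np.column_stack([
    u[1:-1, 1:-1].ravel(),                            # u
    u_x[:, 1:-1].ravel(),                             # u_x
    u_xx[:, 1:-1].ravel(),                            # u_xx
    (u[1:-1, 1:-1] * u_x[:, 1:-1]).ravel(),           # u * u_x
])

# Sequentially thresholded least squares: fit, prune small coefficients, refit.
coef = np.linalg.lstsq(library, target, rcond=None)[0]
for _ in range(5):
    coef[np.abs(coef) < 0.05] = 0.0
    keep = coef != 0.0
    if keep.any():
        coef[keep] = np.linalg.lstsq(library[:, keep], target, rcond=None)[0]

print(dict(zip(["u", "u_x", "u_xx", "u*u_x"], np.round(coef, 3))))
```

With clean data this returns a coefficient close to ν = 0.5 for u_xx and prunes the other terms; coping with heavy noise is precisely where the paper's neural-network optimization of the metadata comes in.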
To improve the performance of the multiple classifier system, a new method of feature-decision level fusion is proposed based on knowledge discovery. In the new method, the base classifiers operate on different feature spaces, and their types depend on different measures of between-class separability. The uncertainty measures corresponding to each output of each base classifier are induced from the established decision tables (DTs) in the form of mass functions in the Dempster-Shafer theory (DST). Furthermore, an effective fusion framework is built at the feature-decision level on the basis of a generalized rough set model and the DST. An experiment on the classification of hyperspectral remote sensing images shows that the proposed method improves classification performance compared with plurality voting (PV).
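The fusion step rests on combining mass functions with Dempster's rule; the toy below shows only that combination step for two classifiers' outputs over a two-class frame. The class names and mass values are invented, and the derivation of masses from decision tables is not reproduced.

```python
# Hedged sketch: Dempster's rule of combination for two mass functions.
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule."""
    combined, conflict = {}, 0.0
    for (s1, v1), (s2, v2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2          # mass falling on the empty set
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

A, B = frozenset({"A"}), frozenset({"B"})
theta = A | B                             # the full frame expresses ignorance
m_classifier1 = {A: 0.6, B: 0.1, theta: 0.3}   # invented masses for illustration
m_classifier2 = {A: 0.5, B: 0.3, theta: 0.2}
print(dempster_combine(m_classifier1, m_classifier2))
```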
To discover fault-diagnosis knowledge in the maintenance records of flexible manufacturing system (FMS) equipment, an algorithm (process) is presented, which consists of ① a preparatory phase, in which items in the maintenance records are selected and decomposed into associated concepts and attributes, and ② a discovering and establishing phase, in which possible relationships between the concepts and attributes are established and knowledge is formulated. The rich diagnosis knowledge in the maintenance records was captured by applying the method. An application of the method to the diagnosis system for FMS equipment shows that the approach is correct and effective.
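As a minimal illustration of the "discovering and establishing" phase, the sketch below counts support and confidence for symptom-to-cause rules over a handful of invented maintenance records; the paper's actual decomposition into concepts and attributes is not reproduced, and all record contents are assumptions.

```python
# Hedged sketch: rule strength as support/confidence over (symptom, cause) pairs.
from collections import Counter

records = [
    {"symptom": "spindle vibration", "cause": "bearing wear"},
    {"symptom": "spindle vibration", "cause": "bearing wear"},
    {"symptom": "spindle vibration", "cause": "tool imbalance"},
    {"symptom": "servo alarm", "cause": "encoder fault"},
]
pair_counts = Counter((r["symptom"], r["cause"]) for r in records)
symptom_counts = Counter(r["symptom"] for r in records)

for (symptom, cause), n in pair_counts.items():
    support = n / len(records)                 # how often the pair occurs overall
    confidence = n / symptom_counts[symptom]   # how often the cause follows the symptom
    print(f"IF {symptom} THEN {cause}  (support={support:.2f}, confidence={confidence:.2f})")
```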
A new algorithm for knowledge discovery based on statistical induction logic is proposed, and the validity of the method is verified by examples. The method is suitable for a wide range of knowledge discovery applications in the study of causal relations, uncertain knowledge acquisition and principal factor analysis. The language-field description of the state space makes the algorithm robust and adaptive, with easily understandable results that are isotopic with natural language in the topological space.
In the current biomedical data movement, numerous efforts have been made to convert and normalize a large amount of traditional structured and unstructured data (e.g., EHRs, reports) into semi-structured data (e.g., RDF, OWL). With the increasing amount of semi-structured data coming into the biomedical community, data integration and knowledge discovery across heterogeneous domains have become important research problems. At the application level, detection of related concepts among medical ontologies is an important goal of life science research. It is even more crucial to figure out how different concepts are related within a single ontology or across multiple ontologies by analysing predicates in different knowledge bases. However, today's world is one of information explosion, and it is extremely difficult for biomedical researchers to find existing or potential predicates that link cross-domain concepts without any support from schema pattern analysis. Therefore, a mechanism is needed that performs predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, closely related topics and generates queries to discover cross-domain knowledge from each topic. In this paper, we present such a model: it performs predicate-oriented pattern analysis based on the close relationships among predicates and generates a similarity matrix. Based on this similarity matrix, we apply a novel unsupervised learning algorithm to partition large data sets into smaller, closer topics and generate meaningful queries to fully discover knowledge over a set of interlinked data sources. We have implemented a prototype system named BmQGen and evaluated the proposed model with a colorectal surgical cohort from the Mayo Clinic.
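BmQGen's own partitioning algorithm is not reproduced here; as a generic stand-in, the sketch below turns a predicate-based similarity matrix into distances and partitions concepts into topics with standard hierarchical clustering from SciPy. The concept names and similarity values are invented for illustration.

```python
# Hedged sketch: partition concepts into topics from a similarity matrix
# (generic hierarchical clustering, not BmQGen's algorithm).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

concepts = ["Patient", "Diagnosis", "Procedure", "Gene", "Protein"]
similarity = np.array([
    [1.0, 0.8, 0.7, 0.1, 0.1],
    [0.8, 1.0, 0.6, 0.2, 0.1],
    [0.7, 0.6, 1.0, 0.1, 0.2],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.2, 0.9, 1.0],
])
distance = 1.0 - similarity
np.fill_diagonal(distance, 0.0)

# Condensed distance vector -> average-linkage dendrogram -> two topics.
labels = fcluster(linkage(squareform(distance), method="average"), t=2, criterion="maxclust")
for topic in sorted(set(labels)):
    print(f"topic {topic}:", [c for c, l in zip(concepts, labels) if l == topic])
```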
The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science took place at the National Science Library (NSL), Chinese Academy of Sciences (CAS) in Beijing from June 19 till June 22, 2016. The Conference was opened by NSL Director Xiangyang Huang, who placed the event within the goals of the Library and lauded the spirit of international collaboration in the area of data science and knowledge discovery. The whole event was an encouraging success with over 370 registered participants and highly enlightening presentations. The Conference was organized by the Journal of Data and Information Science (JDIS) to bring the Journal to the attention of an international and local audience.
Knowledge discovery, as an increasingly adopted information technology in biomedical science, has shown great promise in the field of Traditional Chinese Medicine (TCM). In this paper, we provide a multidimensional table that is well suited for organizing and analyzing the data in ancient Chinese books on Materia Medica. Moreover, we demonstrate its capability of facilitating further mining work in TCM through two illustrative studies that discover meaningful patterns in the three-dimensional table of Shennong's Classic of Materia Medica. This work may provide an appropriate data model for the development of knowledge discovery in TCM.
Purpose: This paper explores a method of knowledge discovery by visualizing and analyzing co-occurrence relations among three or more entities in collections of journal articles. Design/methodology/approach: A variety of methods such as model construction, system analysis and experiments are used. The author has improved Morris' cross-mapping technique and developed a technique for directly describing, visualizing and analyzing co-occurrence relations among three or more entities in collections of journal articles. Findings: The visualization tools and the knowledge discovery method can efficiently reveal the multiple co-occurrence relations among three entities in collections of journal papers, and can reveal more in-depth information than analyzing co-occurrence relations between two entities. Therefore, this method can be used for mapping the knowledge domain manifested in association with the entities from multi-dimensional perspectives and in an all-round way. Research limitations: The technique can currently only be used to analyze co-occurrence relations among no more than three entities. Practical implications: This research expands the scope of co-occurrence analysis, and its results provide theoretical support for such analysis. Originality/value: There has not been a systematic study of co-occurrence relations among multiple entities in collections of journal articles. This research defines multiple co-occurrence and its research scope, develops the visualization analysis tool, and designs the analysis model of the knowledge discovery method.
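The counting step behind multiple co-occurrence analysis can be illustrated in a few lines: tally how often every triple of entities (here, keywords) appears together in the same article. The article records below are invented, and the paper's visualization tool is not reproduced.

```python
# Hedged sketch: count triple co-occurrences of keywords across articles.
from collections import Counter
from itertools import combinations

articles = [
    {"knowledge discovery", "co-occurrence", "visualization"},
    {"knowledge discovery", "co-occurrence", "bibliometrics"},
    {"knowledge discovery", "co-occurrence", "visualization", "bibliometrics"},
]
triple_counts = Counter()
for keywords in articles:
    for triple in combinations(sorted(keywords), 3):
        triple_counts[triple] += 1

for triple, count in triple_counts.most_common(3):
    print(count, triple)
```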
An integrated solution for the discovery of literature information knowledge is proposed. The analytic model of literature information and the discovery of literature information knowledge are illustrated. A practical illustrative example of the discovery of literature information knowledge is given.
Structural choice is a significant decision having an important influence on structural function, social economics, structural reliability and construction cost. A Case-Based Reasoning (CBR) system, with its retrieval part constructed with a KDD subsystem, is put forward to make decisions for large-scale engineering projects. A typical CBR system consists of four parts: case representation, case retrieval, evaluation, and adaptation. The case library is a set of parameterized excellent and successful structures. For structural choice, the key point is that the system must be able to detect the pattern classes hidden in the case library and classify the input parameters into the proper classes. This is done by using a KDD data mining algorithm based on Self-Organizing Feature Maps (SOFM), which makes the whole system more adaptive, self-organizing, self-learning and open.
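Below is a minimal Self-Organizing Feature Map trained on parameterized case vectors, standing in for the KDD retrieval subsystem described above; the case data, grid size and learning schedule are illustrative assumptions rather than the paper's settings.

```python
# Hedged sketch: minimal SOFM training and best-matching-unit retrieval.
import numpy as np

rng = np.random.default_rng(0)
cases = rng.normal(size=(200, 4))          # 200 parameterized cases, 4 features (invented)
grid = np.array([(i, j) for i in range(5) for j in range(5)])  # 5x5 map
weights = rng.normal(size=(len(grid), 4))

for epoch in range(50):
    lr = 0.5 * (1.0 - epoch / 50)            # decaying learning rate
    radius = 2.0 * (1.0 - epoch / 50) + 0.5  # decaying neighbourhood radius
    for x in cases:
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # best matching unit
        grid_dist = np.linalg.norm(grid - grid[bmu], axis=1)
        influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
        weights += lr * influence[:, None] * (x - weights)

# Retrieval: map a new design's parameters to its pattern class (BMU index).
new_case = rng.normal(size=4)
print("nearest map unit:", np.argmin(np.linalg.norm(weights - new_case, axis=1)))
```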
There are both associations and differences between structured and unstructured data mining. How to unite them in a unified theoretical framework that can guide research on knowledge discovery and data mining has become an urgent problem. On the basis of an analysis and study of existing research results, the united model of knowledge discovery state space (UMKDSS) is presented, which associates structured data mining with complex-type data mining. UMKDSS can provide theoretical guidance for complex-type data mining. An application example of UMKDSS is given at the end.
LP (Logic Programming) has been successfully applied to knowledge discovery in many fields. The execution of an LP is based on the evaluation of first-order predicates. Usually the information involved in the predicates is local and homogeneous, so the evaluation process is relatively simple. However, the evaluation process becomes much more complicated when applied to KDD on the Internet, where the information involved in the predicates may be heterogeneous and distributed over many different sites. Therefore, we attack the problem within a multi-agent system framework, so that the logic program can be written in a site-independent style and can deal easily with heterogeneously represented information.
AIM To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine. METHODS Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BNs) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension of BRL BRL_p. The structure prior has a λ hyperparameter that allows the user to tune the degree of incorporation of the prior knowledge in the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model predictive performance. Finally, we compared BRL_p to other state-of-the-art classifiers commonly used in biomedicine. RESULTS We evaluated the degree of incorporation of prior knowledge into BRL_p with simulated data by measuring the Graph Edit Distance between the true data-generating model and the model learned by BRL_p. We specified the true model using informative structure priors. We observed that by increasing the value of λ we were able to increase the influence of the specified structure priors on model learning. A large value of λ caused BRL_p to return the true model. This also led to a gain in predictive performance measured by area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature [the epidermal growth factor receptor (EGFR) gene]. We again observed that larger values of λ led to an increased incorporation of EGFR into the final BRL_p model. This relevant background knowledge also led to a gain in AUC. CONCLUSION BRL_p enables tunable structure priors to be incorporated during Bayesian classification rule learning that integrates data and knowledge, as demonstrated using lung cancer biomarker data.
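BRL_p's actual Bayesian score is not reproduced here; the toy below only illustrates the mechanism of a λ-weighted structure prior tipping model selection toward a prior-specified edge (e.g. EGFR → outcome). The score is a BIC-style stand-in plus λ times an edge-agreement bonus, and the data are deliberately uninformative so the effect of λ is visible; everything in it is an illustrative assumption.

```python
# Hedged sketch: lambda-weighted structure prior added to a BIC-style score.
import numpy as np

egfr = np.repeat([0, 1], 100)       # biomarker status for 200 toy patients
outcome = np.tile([0, 1], 100)      # outcome carries no real signal about EGFR

def bic_score(with_edge):
    """BIC-style score of P(outcome | parents) for the two candidate structures."""
    groups = [egfr == 0, egfr == 1] if with_edge else [np.ones_like(egfr, dtype=bool)]
    ll = 0.0
    for g in groups:
        p = np.clip(outcome[g].mean(), 1e-6, 1 - 1e-6)
        ll += (outcome[g] * np.log(p) + (1 - outcome[g]) * np.log(1 - p)).sum()
    n_params = 2 if with_edge else 1
    return ll - 0.5 * n_params * np.log(len(outcome))

prior_says_edge = True              # background knowledge: EGFR should be a parent
for lam in (0.0, 5.0, 50.0):
    scores = {with_edge: bic_score(with_edge) + lam * (with_edge == prior_says_edge)
              for with_edge in (False, True)}
    print(f"lambda={lam:>4}: include EGFR edge -> {max(scores, key=scores.get)}")
```

With λ = 0 the data alone reject the edge; raising λ pulls the prior-specified edge into the selected structure, mirroring the tuning behaviour the abstract reports.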
This paper proposes the principle of comprehensive knowledge discovery. Unlike most current knowledge discovery methods, comprehensive knowledge discovery considers both the spatial relations and the attributes of spatial entities or objects. We introduce the theory of the spatial knowledge expression system and some concepts, including comprehensive knowledge discovery and the spatial union information table (SUIT). In theory, SUIT records all the information contained in the studied objects, but in practice, because of the complexity and variety of spatial relations, only those factors of interest to us are selected. In order to discover comprehensive knowledge from spatial databases, an efficient comprehensive knowledge discovery algorithm called the recycled algorithm (RAR) is suggested.
A new structure of ESKD (an expert system based on the knowledge discovery system KD (D&K)) is first presented on the basis of KD (D&K), a synthesized knowledge discovery system built on a double-base (database and knowledge base) cooperating mechanism. With all its new features, ESKD may form a new research direction and offer great potential for exploiting the wealth of knowledge in the knowledge base. The general structural frame of ESKD and some of its subsystems are described, with emphasis on the dynamic knowledge base based on the double-base cooperating mechanism. According to the results of a demonstrative experiment, the structure of ESKD is effective and feasible.
The present article outlines progress made in designing an intelligent information system for automatic management and knowledge discovery in large numeric and scientific databases, with a validating application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (Automatic Data Analysis System) which combines the features of two machine learning techniques to generate sets of production rules that efficiently describe the observational raw data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system learns by induction to build decision trees, from which the production rules are automatically deduced.
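The described pipeline (cluster raw observations into conceptual classes, learn decision trees over them by induction, read production rules off the tree paths) can be approximated with generic scikit-learn components, as sketched below on synthetic data; the CAST-NEONS variables and the original algorithms are not reproduced, and the feature names are assumptions.

```python
# Hedged sketch: clustering -> decision-tree induction -> production rules.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in for raw environmental observations (e.g. temperature, salinity).
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

# Step 1: data clustering groups the raw data into conceptual clusters.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: induction learns a decision tree that explains the clusters.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, clusters)

# Step 3: the tree's paths read off as production rules.
print(export_text(tree, feature_names=["temperature", "salinity"]))
```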
A novel DNA-coding-based knowledge discovery algorithm is proposed, and an example verifying its validity is given. It is proved that this algorithm can efficiently discover new simplified rules from the original rule set.
It is common in industrial construction projects for data to be collected and discarded without being analyzed to extract useful knowledge. A proposed integrated methodology based on a five-step Knowledge Discovery in Data (KDD) model was developed to address this issue. The framework transfers existing multidimensional historical data from completed projects into useful knowledge for future projects. The model starts with understanding the problem domain, industrial construction projects. The second step is analyzing the problem data and its multiple dimensions. The target dataset is the labour resources data generated while managing industrial construction projects. The next step is developing the data collection model and a prototype data warehouse. The data warehouse stores collected data in a ready-for-mining format and produces dynamic On-Line Analytical Processing (OLAP) reports and graphs. Data were collected from a large western-Canadian structural steel fabricator to prove the applicability of the developed methodology. The proposed framework was applied to three different case studies to validate its applicability to real project data.
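The kind of dynamic OLAP-style report the framework produces can be mimicked with a pandas pivot table over labour-hour records, as in the hedged sketch below; the data warehouse itself is not shown, and the column names and values are invented.

```python
# Hedged sketch: OLAP-style roll-up of labour hours with a pandas pivot table.
import pandas as pd

records = pd.DataFrame({
    "project":   ["P1", "P1", "P1", "P2", "P2", "P2"],
    "division":  ["fitting", "welding", "painting", "fitting", "welding", "painting"],
    "month":     ["2015-01", "2015-01", "2015-02", "2015-01", "2015-02", "2015-02"],
    "man_hours": [420, 610, 180, 350, 720, 240],
})

# Roll labour hours up by project and division, with monthly drill-down columns.
report = records.pivot_table(index=["project", "division"], columns="month",
                             values="man_hours", aggfunc="sum", fill_value=0,
                             margins=True, margins_name="total")
print(report)
```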
This paper proposes an integrative framework for network-structured analytic network process (ANP) modeling. The underlying rationales include: 1) creating the measuring items for complex decision problems; 2) applying factor analysis to reduce the complex measuring items to fewer constructs; 3) employing a Bayesian network classifier technique to discover the causal directions among constructs; and 4) using partial least squares path modeling to test the causal relationships among the items and constructs. The proposed framework is implemented, for knowledge discovery, in a case of high-tech companies' enterprise resource planning (ERP) benefits and satisfaction in Hsinchu Science Park, Taiwan. The results show that the proposed framework for ANP modeling can reach a satisfactory level of convergent reliability and validity. Based on the findings, pragmatic implications for ERP vendors are discussed. This study sheds new light on the long-neglected yet critical issue of decision structures and knowledge discovery for ANP modeling.
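Step 2 of the framework, reducing many measuring items to fewer constructs, can be illustrated with scikit-learn's FactorAnalysis on synthetic questionnaire responses; the actual ERP items, sample, and remaining steps (Bayesian network classification and PLS path modeling) are not reproduced, and all data below are invented.

```python
# Hedged sketch: factor analysis collapsing six items into two latent constructs.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_respondents = 150
# Two latent constructs (e.g. "ERP benefits", "user satisfaction") generating
# six observed questionnaire items with noise -- purely invented data.
latent = rng.normal(size=(n_respondents, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                     [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]])
items = latent @ loadings.T + 0.3 * rng.normal(size=(n_respondents, 6))

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
print(np.round(fa.components_.T, 2))   # item-by-construct loading matrix
```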