Abstract: Data production and exchange on the Web grow at a frenetic pace. Such uncontrolled and exponential growth calls for new research in the area of information extraction, since valuable information can be obtained by processing data gathered from several heterogeneous sources. While individual extracted facts may be correct at the origin, it is not possible to verify that correlations among them are always true (e.g., they can relate to different points in time). We need systems smart enough to separate signal from noise and hence extract real value from this abundance of content accessible on the Web. To extract information from heterogeneous sources, we address the entire process of identifying specific facts/events of interest. We propose a gluing architecture that drives the whole knowledge acquisition process, from data acquisition from external heterogeneous resources to their exploitation for RDF triplification in support of reasoning tasks. Once the extraction process is completed, a dedicated reasoner can infer new knowledge by applying user-defined inference rules over both the extracted information and the background knowledge. The end user is supported by an intelligent interface allowing them to visualize either specific data/concepts, or all information inferred by applying deductive reasoning over a collection of data.
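To make the triplification step concrete, here is a minimal sketch (not the paper's actual pipeline) of how an extracted fact might be converted into RDF triples with Python's rdflib; the namespace, entity names, and predicates are illustrative assumptions.

```python
# A minimal sketch of RDF triplification of an extracted fact,
# using Python's rdflib. The namespace and fact below are
# illustrative assumptions, not the paper's actual vocabulary.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/facts/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

# Suppose the extractor identified the event "CompanyA acquired CompanyB in 2019".
event = URIRef(EX["event/1"])
g.add((event, RDF.type, EX.AcquisitionEvent))
g.add((event, EX.acquirer, EX.CompanyA))
g.add((event, EX.acquired, EX.CompanyB))
g.add((event, EX.year, Literal(2019)))

# Serialize to Turtle so a downstream reasoner can consume the triples.
print(g.serialize(format="turtle"))
```

Once facts are in RDF form, a reasoner can apply inference rules uniformly over extracted facts and background knowledge alike.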
Abstract: With this work, we introduce a novel method for the unsupervised learning of conceptual hierarchies, or concept maps as they are sometimes called, aimed specifically at literary texts. It thereby distinguishes itself from the majority of the research literature on the topic, which is primarily focused on building ontologies from a vast array of data sources, both structured and unstructured, to support various forms of AI, in particular the Semantic Web as envisioned by Tim Berners-Lee. We first elaborate on the mutually informing disciplines of philosophy and computer science, or more specifically the relationship between metaphysics, epistemology, ontology, computing, and AI. This is followed by a technically in-depth discussion of DEBRA, our dependency-tree-based concept hierarchy constructor, which, as its name alludes to, constructs a conceptual map in the form of a directed graph illustrating the concepts, their respective relations, and the implied ontological structure of the concepts as encoded in the text, decoded with standard Python NLP libraries such as spaCy and NLTK. With this work we hope both to augment the Knowledge Representation literature with opportunities for intellectual advancement in AI through more intuitive, less analytical, and well-known forms of knowledge representation from the cognitive science community, and to open up new areas of research between Computer Science and the Humanities with respect to the application of the latest NLP tools and techniques to literature of cultural significance, shedding light on existing methods of computation over documents in semantic space that effectively allow, at the very least, the comparison and evolution of texts through time using vector space math.
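As an illustration of the general dependency-tree approach (a sketch in the spirit of DEBRA, not its actual implementation), the snippet below uses spaCy to parse text and networkx to assemble head-to-dependent noun relations into a small directed concept graph; the noun-to-noun relation-selection heuristic is an assumption.

```python
# A sketch of dependency-tree-based concept graph construction,
# in the spirit of (but not identical to) DEBRA. Requires:
#   pip install spacy networkx
#   python -m spacy download en_core_web_sm
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def concept_graph(text: str) -> nx.DiGraph:
    """Build a directed graph linking head nouns to dependent nouns,
    labeled with the dependency relation (an assumed heuristic)."""
    graph = nx.DiGraph()
    doc = nlp(text)
    for token in doc:
        # Keep only noun-to-noun dependency edges as 'concept' relations.
        if token.pos_ in ("NOUN", "PROPN") and token.head.pos_ in ("NOUN", "PROPN"):
            graph.add_edge(token.head.lemma_, token.lemma_, relation=token.dep_)
    return graph

g = concept_graph("The whale's whiteness haunts the captain of the ship.")
for head, dep, attrs in g.edges(data=True):
    print(f"{head} --{attrs['relation']}--> {dep}")
```

On a full literary text, the same loop over sentences would accumulate a graph whose edge structure approximates the implied conceptual hierarchy of the work.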
Abstract: Root cause analysis (RCA) of abnormal aluminum electrolysis cell conditions has long been a challenging industrial issue due to the inherent complexity of analysis based on multi-source knowledge. In addition, accurate RCA of abnormal aluminum electrolysis cell conditions is a precondition for improving current efficiency. RCA of abnormal conditions is a complex task of multi-source knowledge fusion, and it is difficult to ensure its accuracy because of the dwindling number and frequent turnover of experienced technicians. In view of this, a method based on a Fuzzy-Bayesian network for constructing a multi-source knowledge solidification reasoning model is proposed. The method can effectively fuse and solidify the knowledge that technicians use to analyze the cause of abnormal conditions, providing a clear and intuitive framework for this complex task, and can also obtain the root cause automatically. The proposed method was verified on 20 sets of abnormal cell conditions; it performs root cause analysis by finding the abnormal state of the root node that has the maximum posterior probability under Bayesian diagnostic reasoning. The accuracy on the test results is up to 95%, which demonstrates the feasibility of knowledge reasoning for RCA of aluminum electrolysis cells.
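The maximum-posterior diagnosis step can be sketched in a few lines of plain Python: posteriors are proportional to prior times the likelihood of the observed evidence, and the root cause with the largest posterior is reported. The priors, likelihoods, and the conditional-independence assumption below are illustrative, not values from the paper's Fuzzy-Bayesian model.

```python
# A minimal sketch of maximum-posterior Bayesian diagnosis.
# The priors and conditional probabilities below are illustrative
# assumptions, not values from the paper's Fuzzy-Bayesian model.

# Prior probability of each candidate root cause.
priors = {"low_alumina_feed": 0.3, "anode_effect": 0.5, "bath_temp_drift": 0.2}

# P(symptom observed | root cause), assuming symptoms are
# conditionally independent given the root cause (naive-Bayes style).
likelihoods = {
    "low_alumina_feed": {"voltage_spike": 0.7, "temp_rise": 0.2},
    "anode_effect":     {"voltage_spike": 0.9, "temp_rise": 0.6},
    "bath_temp_drift":  {"voltage_spike": 0.1, "temp_rise": 0.8},
}

def diagnose(observed_symptoms):
    """Return root causes ranked by posterior probability."""
    scores = {}
    for cause, prior in priors.items():
        score = prior
        for symptom in observed_symptoms:
            score *= likelihoods[cause][symptom]
        scores[cause] = score
    total = sum(scores.values())
    return sorted(((c, s / total) for c, s in scores.items()),
                  key=lambda item: item[1], reverse=True)

for cause, posterior in diagnose(["voltage_spike", "temp_rise"]):
    print(f"{cause}: {posterior:.3f}")
```

In a full Bayesian network the causes and symptoms would be connected through intermediate nodes and queried with an exact or approximate inference engine, but the argmax-over-posteriors decision rule is the same.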
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 91224006, 61173063, 61035004, 61203284, and 309737163, and the National Social Science Foundation of China under Grant No. 10AYY003.
Abstract: Collecting massive commonsense knowledge (CSK) for commonsense reasoning has been a long-standing challenge within artificial intelligence research. Numerous methods and systems for acquiring CSK have been developed to overcome the knowledge acquisition bottleneck. Although some specific commonsense reasoning tasks have been presented to allow researchers to measure and compare the performance of their CSK systems, we compare them at a higher level along the following aspects: the CSK acquisition task (what CSK is acquired from where), the techniques used (how CSK can be acquired), and the CSK evaluation methods (how to evaluate the acquired CSK). In this survey, we first present a categorization of CSK acquisition systems and the great challenges in the field. Then, we review and compare the CSK acquisition systems in detail. Finally, we summarize the current progress in this field and explore some promising future research issues.
Abstract: In this paper we study the solution of SAT problems formulated as discrete decision and discrete constrained optimization problems. Constrained formulations are better than traditional unconstrained formulations because violated constraints may provide additional forces to lead a search towards a satisfiable assignment. We summarize the theory of extended saddle points in penalty formulations for solving discrete constrained optimization problems and the associated discrete penalty method (DPM). We then examine various formulations of the objective function, choices of neighborhood in DPM, strategies for updating penalties, and heuristics for avoiding traps. Experimental evaluations on hard benchmark instances pinpoint that traps contribute significantly to the inefficiency of DPM and force a trajectory to repeatedly visit the same set of, or nearby, points in the original variable space. To address this issue, we propose and study two trap-avoidance strategies. The first strategy adds extra penalties on unsatisfied clauses inside a trap, leading to very large penalties for unsatisfied clauses that are trapped more often and making these clauses more likely to be satisfied in the future. The second strategy stores information on points visited before, whether inside traps or not, and avoids visiting points that are close to points visited before. It can be implemented by modifying the penalty function in such a way that, if a trajectory gets close to points visited before, an extra penalty will take effect and force the trajectory to a new region. It generalizes the first strategy, because traps are special cases of points visited before. Finally, we show experimental results on evaluating benchmarks in the DIMACS and SATLIB archives and compare our results with existing results on GSAT, WalkSAT, LSDL, and Grasp. The results demonstrate that DPM with trap avoidance is robust as well as effective for solving hard SAT problems.
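The first trap-avoidance strategy resembles clause-weighting local search: when the search reaches a local minimum (a trap), penalties on the currently unsatisfied clauses are increased so that frequently trapped clauses come to dominate the objective. The sketch below is a simplified illustration of that clause-weighting idea, not the paper's DPM; the CNF encoding as lists of integer literals and the greedy flip rule are assumptions.

```python
# A simplified sketch of penalty-based local search for SAT with
# trap-triggered clause weighting. This illustrates the first
# trap-avoidance strategy in spirit; it is not the paper's DPM.
import random

def penalty(assignment, clauses, weights):
    """Sum of penalties of unsatisfied clauses."""
    return sum(w for clause, w in zip(clauses, weights)
               if not any((lit > 0) == assignment[abs(lit)] for lit in clause))

def solve(clauses, num_vars, max_flips=100_000, seed=0):
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    weights = [1] * len(clauses)
    for _ in range(max_flips):
        current = penalty(assignment, clauses, weights)
        if current == 0:
            return assignment  # all clauses satisfied
        # Greedily pick the flip that most reduces the weighted penalty.
        best_var, best_val = None, current
        for v in range(1, num_vars + 1):
            assignment[v] = not assignment[v]
            p = penalty(assignment, clauses, weights)
            assignment[v] = not assignment[v]
            if p < best_val:
                best_var, best_val = v, p
        if best_var is None:
            # Trap (local minimum): raise penalties on unsatisfied clauses.
            for i, clause in enumerate(clauses):
                if not any((lit > 0) == assignment[abs(lit)] for lit in clause):
                    weights[i] += 1
        else:
            assignment[best_var] = not assignment[best_var]
    return None

# Tiny example: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
clauses = [[1, -2], [2, 3], [-1, -3]]
print(solve(clauses, num_vars=3))
```

The second strategy would additionally keep a history of visited points and add a distance-based penalty near them; since a trap is itself a previously visited point, the clause-weighting rule above emerges as its special case.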