Keywords: ...ing; automatic knowledge acquisition; machine learning; natural language processing

Abstract: One of the most important signs of the information society is the explosion of information. Information on the Internet is unordered and mostly written in natural language, so it must be handled with natural language processing techniques. When you search for specific information through a search engine, you may be overwhelmed by the number of results it returns. If the search engine is equipped with an Automatic Abstracting (AA) system, however, you can locate the information quickly, or take in more information within a limited time. AA technology is therefore valuable both scientifically and in applications.

The work of this thesis began when we took over a project called 'The Key Technology Research of Computer Networks Providing Intelligent Information Services', which belongs to the national 863 plan. One of its tasks is 'The Key Technology Research of Automatic Abstracting Systems of Chinese Text'. As a member of this research group, I took part in designing and implementing an AA system called the Literature Abstract and Digest Information Extract System (LADIES). I have worked in this field ever since, and this thesis summarizes that work.

The main topic of the thesis is AA technology, treated in two parts: research on understanding-based AA systems, and an investigation of Automatic Knowledge Acquisition (AKA) in AA systems. The first part introduces AA technology and puts forward an understanding-based AA model, on which LADIES is implemented. LADIES has two major features: (1) it understands text using the grammatical, semantic and pragmatic information of words; (2) it groups words into relatively independent chunks using chunking rules, which take the place of syntactic analysis rules. The results show that it performs better than statistics-based AA systems. However, the applicability of LADIES is limited by its knowledge bases: because they are built manually, the system is difficult to use in other fields. We therefore investigate automatic knowledge acquisition techniques to address these problems to some extent.

The second part introduces the basic ideas of AKA and some of the Machine Learning (ML) methods it applies. We then propose a comprehensive dictionary model that contains the grammatical, semantic and pragmatic information of words, and investigate a strategy for automatically learning pragmatic information for words. We also put forward a strategy for automatically learning rules for identifying salient sentences in texts and, based on it, build an AA system, LADIES NEW. Finally, we suggest an AKA-based AA system model called the hierarchical feature extracting AA system model.
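The idea of chunking rules standing in for full syntactic analysis can be illustrated with a very small sketch. The abstract does not give the actual LADIES rules, so the single rule below (optional determiner, any adjectives, one or more nouns), the tag names, and the function `chunk_noun_phrases` are assumptions made purely for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def chunk_noun_phrases(tokens: List[Token]) -> List[List[str]]:
    """Group tagged words into chunks using one hand-written rule.

    Hypothetical rule: DET? ADJ* NOUN+  (not taken from LADIES).
    """
    chunks: List[List[str]] = []
    i = 0
    while i < len(tokens):
        j = i
        # Optional determiner
        if tokens[j][1] == "DET":
            j += 1
        # Zero or more adjectives
        while j < len(tokens) and tokens[j][1] == "ADJ":
            j += 1
        # One or more nouns are required for the rule to match
        first_noun = j
        while j < len(tokens) and tokens[j][1] == "NOUN":
            j += 1
        if j > first_noun:
            chunks.append([word for word, _ in tokens[i:j]])
            i = j
        else:
            i += 1  # rule did not match here; move on by one token
    return chunks


sentence = [
    ("the", "DET"), ("search", "NOUN"), ("engine", "NOUN"),
    ("returns", "VERB"), ("many", "ADJ"), ("results", "NOUN"),
]
chunks = chunk_noun_phrases(sentence)
```

Here `chunks` holds two word groups, "the search engine" and "many results", each of which a later stage could treat as one unit without building a full parse tree.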
Abstract: With this work, we introduce a novel method for the unsupervised learning of conceptual hierarchies, or concept maps as they are sometimes called, aimed specifically at literary texts. This distinguishes it from the majority of the research literature on the topic, which focuses primarily on building ontologies from a wide variety of data sources, both structured and unstructured, to support various forms of AI, in particular the Semantic Web as envisioned by Tim Berners-Lee. We first elaborate on the mutually informing disciplines of philosophy and computer science, or more specifically on the relationship between metaphysics, epistemology, ontology, computing and AI. This is followed by a technically in-depth discussion of DEBRA, our dependency tree based concept hierarchy constructor. As its name suggests, DEBRA constructs a conceptual map in the form of a directed graph that illustrates the concepts, their respective relations, and the implied ontological structure of the concepts as encoded in the text, decoded with standard Python NLP libraries such as spaCy and NLTK. With this work we hope to augment the Knowledge Representation literature with opportunities for intellectual advancement in AI through more intuitive, less analytical, and well-known forms of knowledge representation from the cognitive science community. We also hope to open up new areas of research between Computer Science and the Humanities by applying the latest NLP tools and techniques to literature of cultural significance, shedding light on existing methods of computing with documents in semantic space that allow, at the very least, the comparison of texts and the study of their evolution through time using vector space math.
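A rough sketch of how dependency relations can be collected into a directed concept graph is given below. It is not DEBRA itself: the (head, relation, dependent) triples are written by hand as stand-ins for parser output, and the names `build_concept_graph` and `descendants` are hypothetical. The labels `nsubj`, `dobj` and `compound` are ordinary dependency labels of the kind a parser such as spaCy emits.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]            # (head, relation, dependent)
Graph = Dict[str, List[Tuple[str, str]]]  # head -> [(relation, dependent), ...]


def build_concept_graph(triples: List[Triple]) -> Graph:
    """Build an adjacency list with one directed, labelled edge per triple."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, relation, dependent in triples:
        graph[head].append((relation, dependent))
    return dict(graph)


def descendants(graph: Graph, root: str) -> List[str]:
    """Collect every node reachable from root by following directed edges."""
    seen: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        for _, child in graph.get(node, []):
            if child not in seen:
                seen.append(child)
                stack.append(child)
    return seen


# Hand-written triples for "Philosophy informs computer science".
triples = [
    ("informs", "nsubj", "Philosophy"),
    ("informs", "dobj", "science"),
    ("science", "compound", "computer"),
]
graph = build_concept_graph(triples)
reachable = descendants(graph, "informs")
```

With a real parser, the triples would be produced from each token's head and dependency label rather than typed in by hand; the graph-building step stays the same.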
Abstract: Data mining (also known as Knowledge Discovery in Databases, KDD) is defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Its aim is to discover knowledge of interest to the user. Data mining is a useful tool in many domains, such as marketing and decision making. However, some basic questions about data mining are ignored. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? Is there any rule we should obey in a data mining process? To discover patterns and knowledge that are genuinely interesting and actionable in the real world, Zhang et al. proposed a domain-driven, human-machine-cooperated data mining process, and Zhao and Yao proposed an interactive user-driven classification method using the granule network. In our work, we find that data mining is a knowledge transformation process: it transforms knowledge from a data format, which humans cannot understand, into a symbolic format, which humans can understand and easily use. Thus, no new knowledge is generated in a data mining process. It is similar to translating a book from Chinese into English. In translation, the knowledge in the book should remain unchanged; only its format changes. That is, the knowledge in the English book should be the same as the knowledge in the Chinese one; otherwise, there must be mistakes in the translation. Likewise, in a data mining process we transform knowledge from one format into another without producing new knowledge. The knowledge is originally stored in the data (data is one representation format of knowledge), but we cannot read, understand, or use it, because we cannot understand data directly. With this understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets, which also improved on the performance of classical knowledge acquisition methods. We also find that domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining; they can be integrated into domain-oriented data-driven data mining. This is like views of a database: users with different views see different parts of the data. Similarly, users with different tasks or objectives wish to, or can, discover different (partial) knowledge from the same database. However, all of this partial knowledge must already exist in the database. A domain-oriented data-driven data mining method would therefore help us extract knowledge that really exists in a database and is genuinely interesting and actionable in the real world.
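The claim that rules are already present in the data, and only change format, can be made concrete with the standard rough-set notions of indiscernibility classes and lower and upper approximations. The sketch below uses textbook definitions on a toy decision table; the table, the attribute names, and the function names are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Row = Dict[str, Hashable]


def indiscernibility_classes(rows: List[Row], attrs: List[str]) -> List[Set[int]]:
    """Group row indices that agree on every chosen condition attribute."""
    groups: Dict[Tuple, Set[int]] = defaultdict(set)
    for idx, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].add(idx)
    return list(groups.values())


def approximations(
    rows: List[Row], attrs: List[str], decision: str, value: Hashable
) -> Tuple[Set[int], Set[int]]:
    """Return (lower, upper) approximations of the rows where decision == value."""
    target = {i for i, r in enumerate(rows) if r[decision] == value}
    lower: Set[int] = set()
    upper: Set[int] = set()
    for block in indiscernibility_classes(rows, attrs):
        if block <= target:   # whole block inside the target: certain members
            lower |= block
        if block & target:    # block overlaps the target: possible members
            upper |= block
    return lower, upper


# Toy decision table (invented for illustration).
table = [
    {"temp": "high", "cough": "yes", "flu": "yes"},
    {"temp": "high", "cough": "yes", "flu": "no"},
    {"temp": "high", "cough": "no", "flu": "yes"},
    {"temp": "normal", "cough": "no", "flu": "no"},
]
lower, upper = approximations(table, ["temp", "cough"], "flu", "yes")
```

Row 2 is the only row in the lower approximation, so the data already supports the certain rule "temp = high and cough = no implies flu = yes"; writing that rule down changes its format, not its content. Rows 0 and 1 agree on the condition attributes but disagree on the decision, so they fall in the boundary region between the two approximations.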
Funding: Supported by the National Hi-tech Research and Development Program of China (863 Key Program, Grant No. 2007AA040701); Chongqing Municipal Natural Science Foundation Project of China (Grant No. CSTC2010BB4295); Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20100191120004); Fundamental Research Funds for the Central Universities of China (Grant No. CDJXS11111136); Research Foundation of Chongqing University of Science and Technology, China (Grant No. CK2010Z10).
Abstract: The historical records of mechanical faults contain a great amount of important information that is useful for identifying similar faults. Current fault diagnosis methods that use historical records are inefficient for intuitive application and for multicomponent, multiphase fault diagnosis. To address this problem, a rapid and intelligent fault diagnosis method based on the system-phenomenon-fault (SPF) tree is proposed. The method starts from the physical system in which the fault occurs, treats the fault causes as leaves and the fault frequency as the interrelationship, and finally forms a fault tree with a structural relationship of administrative subordination and flexible multi-granularity components. First, the method for forming the SPF tree is discussed. Second, basic concepts such as the synonymous branch, the tough degree of a branch, the dominant leaf, and the virtual branch are defined. Then two properties are proved: the merger of dominant branches remains dominant, and the merger of synonymous branches remains dominant. Furthermore, the merging, optimization and calculation of virtual branches of the SPF tree are proposed; the self-learning mechanism, including its procedure and the related parameter calculation, is presented; and the fault searching method and the calculation of main fault statistics based on the SPF tree are also presented. Finally, the method is applied to the fault diagnosis of a certain type of embedded terminal to demonstrate fault information searching with synonymous branches, virtual branch merging, and visual presentation of search results. The application shows that the proposed method effectively narrows the scope of the fault search and reduces the computational difficulty. The proposed method offers a new way to solve the intelligent fault diagnosis problem of complex systems by organizing disordered fault records and providing intuitive expression and intelligent computing capabilities.
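A minimal sketch of a frequency-weighted system, phenomenon, fault-cause tree is shown below. The abstract does not define the tough degree, dominant leaf, or virtual branch, so none of these are modelled; the sketch covers only leaf frequencies, a simple form of synonym merging, and frequency-ranked search. The class `SPFTree`, its methods, and the example records are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class SPFTree:
    """Toy system -> phenomenon -> fault-cause tree with frequency counts on leaves."""

    def __init__(self) -> None:
        # tree[system][phenomenon] is a Counter mapping fault cause -> frequency
        self.tree: Dict[str, Dict[str, Counter]] = defaultdict(
            lambda: defaultdict(Counter)
        )
        # Alternative wording of a phenomenon -> canonical wording
        self.synonyms: Dict[str, str] = {}

    def add_synonym(self, alias: str, canonical: str) -> None:
        """Register an alias so later records and searches share one branch."""
        self.synonyms[alias] = canonical

    def record(self, system: str, phenomenon: str, cause: str) -> None:
        """Add one historical fault record and increment the leaf frequency."""
        phenomenon = self.synonyms.get(phenomenon, phenomenon)
        self.tree[system][phenomenon][cause] += 1

    def search(self, system: str, phenomenon: str) -> List[Tuple[str, int]]:
        """Return candidate causes for a phenomenon, most frequent first."""
        phenomenon = self.synonyms.get(phenomenon, phenomenon)
        leaves = self.tree.get(system, {}).get(phenomenon, Counter())
        return leaves.most_common()


spf = SPFTree()
spf.add_synonym("does not start", "no boot")
spf.record("power", "no boot", "dead battery")
spf.record("power", "does not start", "dead battery")  # merged into "no boot"
spf.record("power", "no boot", "blown fuse")
ranked = spf.search("power", "does not start")
```

Each new record updates the counts, which is one simple way to read the self-learning idea: the ranking returned by `search` shifts as fault history accumulates, and restricting the search to one system and phenomenon narrows the set of causes to check.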