...ing; automatic knowledge acquisition; machine learning; natural language processing

Abstract: One of the most important signs of the information society is the explosion of information. Information on the Internet is unorganized and mostly written in natural language, so it has to be handled with natural language processing techniques. When you search the Internet for specific information through a search engine, the sheer number of results can be confusing. If the search engine has an embedded Automatic Abstracting (AA) system, however, you can locate the information quickly, or take in more information within a limited time. AA technology is therefore valuable both in science and in application. The work of this thesis began when we took over a project called 'The Key Technology Research of Computer Networks Providing Intelligent Information Services', which belongs to the national 863 plan. One of its tasks is 'The Key Technology Research of Automatic Abstracting Systems of Chinese Text'. As a member of this research group, I took part in designing and implementing an AA system called Literature Abstract and Digest Information Extract System (LADIES). I have worked in this field ever since, and this thesis summarizes that work.

The main topic of the thesis is AA technology, and it has two parts. The first covers research on understanding-based AA systems; the second covers the investigation of Automatic Knowledge Acquisition (AKA) in AA systems. In the first part, AA technology is introduced and an understanding-based AA model is put forward. LADIES is implemented on the basis of this model. LADIES has two major features: (1) it understands text using the grammatical, semantic and pragmatic information of words; (2) it chunks words into relatively independent entities using chunking rules, which take the place of syntactic analysis rules. The results demonstrate that it performs better than statistics-based AA systems. However, the application of LADIES is limited by its knowledge bases, and it is difficult to use in other fields because the knowledge bases are set up manually. We therefore investigate techniques of automatic knowledge acquisition in order to solve these problems to some extent.

In the second part, we introduce the basic ideas of AKA and some of the Machine Learning (ML) methods that AKA applies. We then propose a comprehensive dictionary model that contains the grammatical, semantic and pragmatic information of words, and we investigate a strategy for automatically learning the pragmatic information of words. We also put forward a strategy for automatically learning rules for salient sentences in texts and, based on it, establish an AA system, LADIES NEW. Finally, we suggest an AKA-based AA system model called the hierarchical feature extracting AA system model.
The quality, quantity, and consistency of the knowledge used in GO-playing programs often determine their strengths, and automatic acquisition of large amounts of high-quality, consistent GO knowledge is crucial for successful GO playing. In a previous article on this subject, we presented an algorithm for efficient and automatic acquisition of spatial patterns of GO, together with their frequencies of occurrence, from game records. In this article, we present two algorithms: one for efficient and automatic acquisition of pairs of spatial patterns that appear jointly in a local context, and the other for determining whether the joint appearances are statistically significant and not just a coincidence. Applied to 16 067 game records of professional GO players, the two algorithms acquired 1 779 966 pairs of spatial patterns, of which about 99.8% qualify as pattern collocations with a statistical confidence of 99.5% or higher.
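The abstract does not say which significance test the second algorithm uses. As a minimal sketch, assuming a Pearson chi-square test on a 2x2 contingency table (a common choice for collocation testing, not confirmed by the source), a check at the 99.5% confidence level could look like the following. The function names and the count arguments are hypothetical.

```python
# Sketch only: the chi-square test is an assumed stand-in for the
# unspecified significance test in the abstract.

# Upper-tail critical value of the chi-square distribution with one
# degree of freedom at p = 0.005, i.e. 99.5% confidence.
CHI2_CRITICAL_995 = 7.879


def chi_square_2x2(n_ab, n_a, n_b, n_total):
    """Pearson chi-square statistic for the co-occurrence of patterns A and B.

    n_ab:    local contexts containing both A and B
    n_a:     local contexts containing A
    n_b:     local contexts containing B
    n_total: all local contexts examined
    """
    o11 = n_ab                          # A and B
    o12 = n_a - n_ab                    # A without B
    o21 = n_b - n_ab                    # B without A
    o22 = n_total - n_a - n_b + n_ab    # neither
    denominator = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    if denominator == 0:
        return 0.0  # degenerate table: no evidence either way
    return n_total * (o11 * o22 - o12 * o21) ** 2 / denominator


def is_collocation(n_ab, n_a, n_b, n_total, critical=CHI2_CRITICAL_995):
    """True if A and B co-occur more often than chance at the given level."""
    expected = n_a * n_b / n_total  # co-occurrences expected under independence
    if n_ab <= expected:
        return False  # only positive association counts as a collocation
    return chi_square_2x2(n_ab, n_a, n_b, n_total) >= critical
```

With these hypothetical counts, two patterns that each appear in 100 of 1000 contexts and appear together in 50 of them pass the test, while the same patterns appearing together in only 10 contexts (exactly the number expected under independence) do not.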
Computer programs of GO are typically constructed using a knowledge-based approach with heuristics and pattern matching because of the enormous complexity of the game. In this approach, the quantity, quality, and consistency of the patterns used largely determine the strength of a program. This study presents an effective method for automatically acquiring comprehensive GO patterns from large collections of game records. Usage statistics of the patterns ensure their consistency and quality, which in turn can help improve the strength of computer GO programs. Additionally, usage statistics of patterns from different sources of game records clearly show subtle but significant discrepancies among various types of GO players, and clarify certain myths in the playing of GO.
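The abstract gives no detail on how patterns are represented or extracted. Purely as an illustration of the idea of collecting pattern usage statistics from game records, the sketch below assumes a game record is a list of `(color, row, col)` moves and takes the pattern to be the square window around each move just before the stone is placed. The encoding, the window shape, and the names `local_pattern` and `count_patterns` are all hypothetical, and captures are ignored for brevity.

```python
from collections import Counter

BOARD_SIZE = 19  # standard board; an assumption of this sketch


def local_pattern(board, row, col, radius=1):
    """Encode the square window around (row, col) as a string.

    '.' is an empty point, 'B' and 'W' are stones, '#' is off the board.
    """
    cells = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if 0 <= r < BOARD_SIZE and 0 <= c < BOARD_SIZE:
                cells.append(board.get((r, c), "."))
            else:
                cells.append("#")
    return "".join(cells)


def count_patterns(games, radius=1):
    """Count how often each (color, pattern) pair occurs across all games."""
    counts = Counter()
    for moves in games:
        board = {}  # maps (row, col) to 'B' or 'W'
        for color, row, col in moves:
            # Record the neighborhood as it looked before this move.
            counts[(color, local_pattern(board, row, col, radius))] += 1
            board[(row, col)] = color  # captures are not removed in this sketch
    return counts


games = [[("B", 0, 0), ("W", 0, 1)]]
counts = count_patterns(games)
```

Comparing such counters built from different collections of game records is one simple way to look for the kind of differences between player groups that the abstract mentions.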