This paper proposes a new way to improve the performance of dependency parser: subdividing verbs according to their grammatical functions and integrating the information of verb subclasses into lexicalized parsing mod...This paper proposes a new way to improve the performance of dependency parser: subdividing verbs according to their grammatical functions and integrating the information of verb subclasses into lexicalized parsing model. Firstly,the scheme of verb subdivision is described. Secondly,a maximum entropy model is presented to distinguish verb subclasses. Finally,a statistical parser is developed to evaluate the verb subdivision. Experimental results indicate that the use of verb subclasses has a good influence on parsing performance.展开更多
Word Sense Disambiguation (WSD) is to decide the sense of an ambiguous word on particular context. Most of current studies on WSD only use several ambiguous words as test samples, thus leads to some limitation in prac...Word Sense Disambiguation (WSD) is to decide the sense of an ambiguous word on particular context. Most of current studies on WSD only use several ambiguous words as test samples, thus leads to some limitation in practical application. In this paper, we perform WSD study based on large scale real-world corpus using two unsupervised learning algorithms based on ±n-improved Bayesian model and Dependency Grammar (DG)-improved Bayesian model. ±n-improved classifiers reduce the window size of context of ambiguous words with close-distance feature extraction method, and decrease the jamming of useless features, thus obviously improve the accuracy, reaching 83.18% (in open test). DG-improved classifier can more effectively conquer the noise effect existing in Naive-Bayesian classifier. Experimental results show that this approach does better on Chinese WSD, and the open test achieved an accuracy of 86.27%.展开更多
We presented a novel framework for automatic behavior clustering and unsupervised anomaly detection in a large video set. The framework consisted of the following key components: 1 ) Drawing from natural language pr...We presented a novel framework for automatic behavior clustering and unsupervised anomaly detection in a large video set. The framework consisted of the following key components: 1 ) Drawing from natural language processing, we introduced a compact and effective behavior representation method as a stochastic sequence of spatiotemporal events, where we analyzed the global structural information of behaviors using their local action statistics. 2) The natural grouping of behavior patterns was discovered through a novel clustering algorithm. 3 ) A run-time accumulative anomaly measure was introduced to detect abnormal behavior, whereas normal behavior patterns were recognized when sufficient visual evidence had become available based on an online Likelihood Ratio Test (LRT) method. This ensured robust and reliable anomaly detection and normal behavior recognition at the shortest possible time. Experimental results demonstrated the effectiveness and robustness of our approach using noisy and sparse data sets collected from a real surveillance scenario.展开更多
In this research paper, we research on the automatic pattern abstraction and recognition method for large-scale database system based on natural language processing. In distributed database, through the network connec...In this research paper, we research on the automatic pattern abstraction and recognition method for large-scale database system based on natural language processing. In distributed database, through the network connection between nodes, data across different nodes and even regional distribution are well recognized. In order to reduce data redundancy and model design of the database will usually contain a lot of forms we combine the NLP theory to optimize the traditional method. The experimental analysis and simulation proves the correctness of our method.展开更多
Sample entropy can reflect the change of level of new information in signal sequence as well as the size of the new information. Based on the sample entropy as the features of speech classification, the paper firstly ...Sample entropy can reflect the change of level of new information in signal sequence as well as the size of the new information. Based on the sample entropy as the features of speech classification, the paper firstly extract the sample entropy of mixed signal, mean and variance to calculate each signal sample entropy, finally uses the K mean clustering to recognize. The simulation results show that: the recognition rate can be increased to 89.2% based on sample entropy.展开更多
In this paper, the authors are presenting the approach to extract the multiword expression (MWEs) from monolingual corpora. It both validates and generates multiword candidates. The multiword expression provides a l...In this paper, the authors are presenting the approach to extract the multiword expression (MWEs) from monolingual corpora. It both validates and generates multiword candidates. The multiword expression provides a list of candidates which are extracted and filtered according to the number of criteria and a set of standard statistical association measures. The generation of the multiword candidates is based on the surface forms, while the validation consists of series of criteria for removing noise using language independent association measures. For generating corpus count, it provides both a corpus indexation facility. Also, this approach allows easy integration with a machine learning tool for thecreation and application of supervised multiword extraction models if annotated data is available. The authors present the use of multiword in a standard configuration, for extracting MWEs from a corpus of general purpose English.展开更多
Difficulty discrimination is an important step in autonomous design and interpreting teaching materials development, which is related to scientifi c nature of the materials, teaching effectiveness, and sequential teac...Difficulty discrimination is an important step in autonomous design and interpreting teaching materials development, which is related to scientifi c nature of the materials, teaching effectiveness, and sequential teaching progress. In this paper, we focus on the diffi culty discrimination of interpretation teaching materials on the basis of analytic hierarchy process and natural language processing. We analyze several factors which affect interpretation teaching materials, and we introduce theories of analytic hierarchy process and natural language processing which is intuitive and credible operation basis.展开更多
Automatic web image annotation is a practical and effective way for both web image retrieval and image understanding. However, current annotation techniques make no further investigation of the statement-level syntact...Automatic web image annotation is a practical and effective way for both web image retrieval and image understanding. However, current annotation techniques make no further investigation of the statement-level syntactic correlation among the annotated words, therefore making it very difficult to render natural language interpretation for images such as "pandas eat bamboo". In this paper, we propose an approach to interpret image semantics through mining the visible and textual information hidden in images. This approach mainly consists of two parts: first the annotated words of target images are ranked according to two factors, namely the visual correlation and the pairwise co-occurrence; then the statement-level syntactic correlation among annotated words is explored and natural language interpretation for the target image is obtained. Experiments conducted on real-world web images show the effectiveness of the proposed approach.展开更多
基金the National Natural Science Foundation of China (No.60435020, 60575042 and 60503072).
文摘This paper proposes a new way to improve the performance of dependency parser: subdividing verbs according to their grammatical functions and integrating the information of verb subclasses into lexicalized parsing model. Firstly,the scheme of verb subdivision is described. Secondly,a maximum entropy model is presented to distinguish verb subclasses. Finally,a statistical parser is developed to evaluate the verb subdivision. Experimental results indicate that the use of verb subclasses has a good influence on parsing performance.
基金Supported by the National Natural Science Foundation of China (No.60435020).
文摘Word Sense Disambiguation (WSD) is to decide the sense of an ambiguous word on particular context. Most of current studies on WSD only use several ambiguous words as test samples, thus leads to some limitation in practical application. In this paper, we perform WSD study based on large scale real-world corpus using two unsupervised learning algorithms based on ±n-improved Bayesian model and Dependency Grammar (DG)-improved Bayesian model. ±n-improved classifiers reduce the window size of context of ambiguous words with close-distance feature extraction method, and decrease the jamming of useless features, thus obviously improve the accuracy, reaching 83.18% (in open test). DG-improved classifier can more effectively conquer the noise effect existing in Naive-Bayesian classifier. Experimental results show that this approach does better on Chinese WSD, and the open test achieved an accuracy of 86.27%.
基金This work is supported by National Natural Science Foundation of China (NSFC) under Grant No. 60573139 andNational Science & Technology Pillar Program of China under Grant NO. 2008BAH221303.
文摘We presented a novel framework for automatic behavior clustering and unsupervised anomaly detection in a large video set. The framework consisted of the following key components: 1 ) Drawing from natural language processing, we introduced a compact and effective behavior representation method as a stochastic sequence of spatiotemporal events, where we analyzed the global structural information of behaviors using their local action statistics. 2) The natural grouping of behavior patterns was discovered through a novel clustering algorithm. 3 ) A run-time accumulative anomaly measure was introduced to detect abnormal behavior, whereas normal behavior patterns were recognized when sufficient visual evidence had become available based on an online Likelihood Ratio Test (LRT) method. This ensured robust and reliable anomaly detection and normal behavior recognition at the shortest possible time. Experimental results demonstrated the effectiveness and robustness of our approach using noisy and sparse data sets collected from a real surveillance scenario.
文摘In this research paper, we research on the automatic pattern abstraction and recognition method for large-scale database system based on natural language processing. In distributed database, through the network connection between nodes, data across different nodes and even regional distribution are well recognized. In order to reduce data redundancy and model design of the database will usually contain a lot of forms we combine the NLP theory to optimize the traditional method. The experimental analysis and simulation proves the correctness of our method.
文摘Sample entropy can reflect the change of level of new information in signal sequence as well as the size of the new information. Based on the sample entropy as the features of speech classification, the paper firstly extract the sample entropy of mixed signal, mean and variance to calculate each signal sample entropy, finally uses the K mean clustering to recognize. The simulation results show that: the recognition rate can be increased to 89.2% based on sample entropy.
文摘In this paper, the authors are presenting the approach to extract the multiword expression (MWEs) from monolingual corpora. It both validates and generates multiword candidates. The multiword expression provides a list of candidates which are extracted and filtered according to the number of criteria and a set of standard statistical association measures. The generation of the multiword candidates is based on the surface forms, while the validation consists of series of criteria for removing noise using language independent association measures. For generating corpus count, it provides both a corpus indexation facility. Also, this approach allows easy integration with a machine learning tool for thecreation and application of supervised multiword extraction models if annotated data is available. The authors present the use of multiword in a standard configuration, for extracting MWEs from a corpus of general purpose English.
文摘Difficulty discrimination is an important step in autonomous design and interpreting teaching materials development, which is related to scientifi c nature of the materials, teaching effectiveness, and sequential teaching progress. In this paper, we focus on the diffi culty discrimination of interpretation teaching materials on the basis of analytic hierarchy process and natural language processing. We analyze several factors which affect interpretation teaching materials, and we introduce theories of analytic hierarchy process and natural language processing which is intuitive and credible operation basis.
基金Project supported by the National Natural Science Foundation of China (Nos 60533090 and 60603096)the National High-Tech Research and Development Program (863) of China (No 2006AA 010107)
文摘Automatic web image annotation is a practical and effective way for both web image retrieval and image understanding. However, current annotation techniques make no further investigation of the statement-level syntactic correlation among the annotated words, therefore making it very difficult to render natural language interpretation for images such as "pandas eat bamboo". In this paper, we propose an approach to interpret image semantics through mining the visible and textual information hidden in images. This approach mainly consists of two parts: first the annotated words of target images are ranked according to two factors, namely the visual correlation and the pairwise co-occurrence; then the statement-level syntactic correlation among annotated words is explored and natural language interpretation for the target image is obtained. Experiments conducted on real-world web images show the effectiveness of the proposed approach.