Abstract: This study compares modal auxiliary verbs in English and Chinese both qualitatively and quantitatively, analyzing their forms and meanings statistically. The data consist of 50 pieces of Chinese prose with their English translations (corpus A) and 50 pieces of English prose with their Chinese translations (corpus B), 200 articles in all, which represent a type of discourse that is rich in modal auxiliary verbs in both languages. The major findings are as follows: (1) three criteria, namely inversion, negation, and the use of pro-forms, can be used to define both English and Chinese auxiliaries; (2) the modals of both languages can be analyzed within the same semantic categories: volition, probability, and necessity; (3) Chinese epistemic modals can have inversion patterns; (4) the negative forms of Chinese modals are more complex than those of English modals; and (5) the statistical analysis shows that, in both languages, modals in the probability category are used much more often than those in the other two categories (volition and necessity), and that deontic modals are used much less often to express necessity.
Funding: National Natural Science Foundation of China (No. 60873179); Doctoral Program Foundation of Institutions of Higher Education of China (No. 20090121110032); Shenzhen Science and Technology Research Foundations, China (No. JC200903180630A, No. ZYB200907110169A); Key Project of Institutes Serving for the Economic Zone on the Western Coast of the Taiwan Strait, China; Natural Science Foundation of Xiamen, China (No. 3502Z2093018); Projects of Education Department of Fujian Province of China (No. JK2009017, No. JK2010031, No. JA10196).
Abstract: Linguistic dynamic systems (LDS) are dynamic processes involving computing with words (CW) for modeling and analysis of complex systems. In this paper, a fuzzy neural network (FNN) structure for LDS is proposed, and an improved nonlinear particle swarm optimization is employed to train the FNN. Experimental results on logistics formulation demonstrate the feasibility and efficiency of this FNN model.
Funding: Project (60763001) supported by the National Natural Science Foundation of China; Projects (2009GZS0027, 2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China.
Abstract: To overcome the defects of the classical hidden Markov model (HMM), a new statistical model, the Markov family model (MFM), is proposed and applied to speech recognition and natural language processing. Speaker-independent continuous speech recognition experiments and part-of-speech tagging experiments show that the Markov family model performs better than the hidden Markov model: precision rises from 94.642% to 96.214% in the part-of-speech tagging experiments, and the work rate is reduced by 11.9% in the speech recognition experiments relative to the HMM baseline system.
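For orientation, the HMM baseline that the abstract compares against is usually decoded with the Viterbi algorithm. The block below is a minimal sketch of that baseline only; the abstract does not describe how the Markov family model itself is computed, so nothing here should be read as the authors' method. The tag set, the probability tables, and the names `viterbi`, `START`, `TRANS`, and `EMIT` are all hypothetical toy values chosen for illustration.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable HMM state sequence for the observations."""
    if not obs:
        return []

    def lg(p):
        # Log probability, with a small floor so unseen events do not give log(0).
        return math.log(max(p, floor))

    # best[t][s] = (log score of the best path ending in state s at time t, backpointer)
    best = [{s: (lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(obs[0], 0.0)), None)
             for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p][0] + lg(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + lg(emit_p[s].get(obs[t], 0.0)), prev)
        best.append(column)

    # Follow backpointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]

# Hypothetical toy model (not taken from the paper).
STATES = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.6, "barks": 0.4},
    "VERB": {"dog": 0.3, "barks": 0.7},
}

tags = viterbi(["the", "dog", "barks"], STATES, START, TRANS, EMIT)
print(tags)
```

Working in log space avoids numerical underflow on longer sentences, which is the usual reason to prefer sums of logarithms over products of probabilities.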
Funding: Supported by the National Natural Science Foundation of China (No. 61303082) and the Research Fund for the Doctoral Program of Higher Education of China (No. 20120121120046).
Abstract: Lexicalized reordering models are very important components of phrase-based translation systems. By examining the reordering relationships between adjacent phrases, conventional methods learn these models from a word-aligned bilingual corpus while ignoring the effect of the number of adjacent bilingual phrases. In this paper, we propose a method that takes the number of adjacent phrases into account for better estimation of reordering models. Instead of just checking whether there is one phrase adjacent to a given phrase, our method first uses a compact structure named the reordering graph to represent all phrase segmentations of a parallel sentence; the effect of the number of adjacent phrases is then quantified in a forward-backward fashion and finally incorporated into the estimation of reordering models. Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.
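The conventional estimation that the abstract contrasts itself with can be sketched as simple orientation counting. The block below assumes the common three-way orientation scheme (monotone, swap, discontinuous) and a single fixed phrase segmentation; it does not implement the reordering graph or the forward-backward weighting proposed in the paper. The function names and the example spans are hypothetical.

```python
from collections import Counter

def orientation(prev_span, cur_span):
    """Classify the current phrase relative to the previous target-side phrase.

    Spans are inclusive (start, end) source-word positions.
    """
    prev_start, prev_end = prev_span
    cur_start, cur_end = cur_span
    if cur_start == prev_end + 1:
        return "monotone"       # source phrase directly follows the previous one
    if cur_end == prev_start - 1:
        return "swap"           # source phrase directly precedes the previous one
    return "discontinuous"      # anything else

def count_orientations(source_spans):
    """Count orientations for phrases listed in target order."""
    counts = Counter()
    prev = (-1, -1)  # virtual sentence-start phrase ending just before position 0
    for span in source_spans:
        counts[orientation(prev, span)] += 1
        prev = span
    return counts

# Hypothetical segmentation: source spans of four phrases, in target order.
example = [(0, 1), (4, 5), (2, 3), (6, 6)]
print(count_orientations(example))
```

Relative frequencies of these counts, collected per phrase pair over a corpus, give the baseline orientation probabilities; the paper's contribution is to replace this one-segmentation count with a quantity aggregated over all segmentations.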
Funding: Supported by the National High Technology Research and Development Programme of China (No. 2008AA01Z144) and the National Natural Science Foundation of China (No. 61073126, 61073129).
Abstract: In this work, an approach is proposed to acquire synonymous attribute phrases of named entities (NEs) from an online encyclopedia. Synonymous attribute phrases are phrases that express the same attribute with different surface forms for a class of NEs. The proposed approach is composed of three stages. First, the entries related to a given NE class are automatically selected from an online encyclopedia. Second, attribute phrases are extracted based on the statistics of phrase frequency. Third, synonymous attributes are identified in a pairwise manner through a classification framework combining multiple features. The approach is applied to Baidu Baike, a Chinese online encyclopedia, for four different NE classes. Experimental results show that it obtains an average precision of 74% and an average F-value of 65% for the four NE classes. In particular, thousands of synonymous attribute phrase pairs are acquired for each class, which demonstrates the effectiveness of the proposed approach.
Funding: Project (60763001) supported by the National Natural Science Foundation of China; Project (2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China; Project (GJJ12271) supported by the Science and Technology Foundation of Provincial Education Department of Jiangxi Province, China.
Abstract: The category-based statistical language model is an important method for solving the problem of sparse data, but it has two bottlenecks: (1) word clustering, since it is hard to find a clustering method that combines good performance with low computation; and (2) class-based methods tend to lose the prediction ability needed to adapt to text in different domains. To solve these problems, a definition of word similarity based on mutual information is presented, and from it a definition of word set similarity is given. Experiments show that the word clustering algorithm based on similarity is better than the conventional greedy clustering method in both speed and performance, and the perplexity is reduced from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good prediction ability. Compared with the category-based model, the perplexity of the vari-gram model is reduced from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
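To put the reported figures in context, the sketch below shows the standard definition of perplexity (the exponential of the average negative log probability per word) and the relative reductions implied by the numbers in the abstract. It assumes the paper uses this standard definition, which the abstract does not state; the function names are hypothetical and the per-word probabilities in the first call are made-up inputs.

```python
import math

def perplexity(word_probs):
    """Perplexity of a text given the model probability of each word in it."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

def relative_reduction(before, after):
    """Fractional reduction when a value drops from `before` to `after`."""
    return (before - after) / before

# A model that assigns probability 1/4 to every word has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# Relative reductions computed from the figures quoted in the abstract.
for label, before, after in [
    ("similarity-based clustering", 283, 218),
    ("vari-gram, Chinese corpora", 234.65, 219.14),
    ("vari-gram, English corpora", 195.56, 184.25),
]:
    print(f"{label}: {relative_reduction(before, after):.1%}")
```

On these numbers the clustering change corresponds to a reduction of about 23%, and the vari-gram model to roughly 6.6% on the Chinese corpora and 5.8% on the English corpora.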
Abstract: This research proposes and implements an Arabic Sub-Words Recognition System (ASWR). The system employs a combination of statistical and structural features to provide a complete description of each pattern and to enhance the recognition rate. Support Vector Machines (SVMs) are utilized as a promising pattern recognition tool. In addition, the problems of dots and holes are solved in a completely different way from those previously employed. The proposed system proceeds in several phases: (1) image acquisition; (2) binarisation; (3) morphological processing; (4) feature extraction, which includes statistical features (moment invariants) and structural features (dot number, dot position, and number of holes); and (5) classification, using multi-class SVMs and applying a one-against-all technique. The proposed system has been tested using different sets of words and subwords and has achieved a recognition rate of nearly 98.90%. Comparative results with NNs are also presented.
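One of the structural features listed in phase (4), the number of holes, can be illustrated with a generic flood-fill count on a binarised image. The abstract states that the authors handle holes differently from earlier work but gives no details, so the block below is a textbook approach offered only as an illustration, not their method. The function name `count_holes` and the sample images are hypothetical; 1 marks ink and 0 marks background.

```python
from collections import deque

def count_holes(image):
    """Count background regions fully enclosed by ink in a binary image.

    A hole is a 4-connected region of 0-pixels that never reaches the border.
    """
    if not image or not image[0]:
        return 0
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r0, c0):
        """Mark one background region; return True if it touches the border."""
        touches_border = False
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        while queue:
            r, c = queue.popleft()
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border = True
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and image[nr][nc] == 0 and not seen[nr][nc]):
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        return touches_border

    holes = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0 and not seen[r][c] and not flood(r, c):
                holes += 1
    return holes

# Hypothetical test shapes: a ring (one hole) and a figure-eight (two holes).
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eight = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
print(count_holes(ring), count_holes(eight))
```

Using 4-connectivity for the background pairs naturally with 8-connected ink strokes, so a diagonal gap in a stroke does not leak a hole into the outside region.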