Funding: Supported by the National Natural Science Foundation of China (Nos. 60435020, 60575042 and 60503072).
Abstract: This paper proposes a new way to improve the performance of a dependency parser: subdividing verbs according to their grammatical functions and integrating the verb subclass information into a lexicalized parsing model. First, the scheme of verb subdivision is described. Second, a maximum entropy model is presented to distinguish verb subclasses. Finally, a statistical parser is developed to evaluate the verb subdivision. Experimental results indicate that the use of verb subclasses has a positive effect on parsing performance.
Abstract: This study focuses on two verbs of motion used in contemporary Hebrew, ba (come) and hevi (bring), and points to a number of semantic shifts occurring in each of them and to categorical shifts that occurred in the verb ba. We conducted a semantic and syntactic analysis of these shifts, in which we observed a change in the syntactic valency of ba and hevi, the semantic characteristics of the nominal collocations that form their syntactic setting, and the semantic connection between their original and new meanings. The article starts with a presentation of the original meanings of the two verbs as members of the family of concrete verbs of motion. It then presents the semantic shifts each verb undergoes, from designating motion to designating giving, existence, and modality (capability, intent and aspect), and concludes with the categorical shift of the verb ba to an impersonal (חג"ם) and to a discourse marker. It is noteworthy that in each of the shifts observed we noticed a relation between the meaning stemming from the shift and the original meaning of ba and hevi as verbs of motion. We were able to demonstrate that the original meaning is still echoed in both the semantic and the categorical shifts.
Funding: Project (60763001) supported by the National Natural Science Foundation of China; Project (2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China; Project (GJJ12271) supported by the Science and Technology Foundation of Provincial Education Department of Jiangxi Province, China.
Abstract: The category-based statistical language model is an important method for addressing the problem of sparse data, but it has two bottlenecks: 1) word clustering, where it is hard to find a suitable clustering method that combines good performance with low computational cost; 2) class-based methods lose the predictive ability to adapt to text in different domains. To address these problems, a definition of word similarity based on mutual information is presented, and, building on word similarity, a definition of word-set similarity is given. Experiments show that the similarity-based word clustering algorithm is better than the conventional greedy clustering method in both speed and performance, with perplexity reduced from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the vari-gram model reduces perplexity from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
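To put the perplexity figures above on a common scale, the short sketch below converts each reported before/after pair into a relative reduction. The numbers are taken directly from the abstract; the helper name `relative_reduction` and the dictionary labels are illustrative assumptions, not part of the original work.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the relative perplexity reduction as a percentage."""
    return (before - after) / before * 100.0


# (before, after) perplexity pairs as reported in the abstract.
# The labels are hypothetical shorthand for the three comparisons.
reported = {
    "word clustering": (283.0, 218.0),
    "vari-gram, Chinese corpora": (234.65, 219.14),
    "vari-gram, English corpora": (195.56, 184.25),
}

reductions = {
    name: relative_reduction(before, after)
    for name, (before, after) in reported.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.2f}% lower perplexity")
```

Under these figures the clustering result corresponds to a reduction of roughly 23%, while the vari-gram model improves on the category-based model by roughly 6 to 7% on both the Chinese and the English corpora.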