2 articles found
Vari-gram language model based on word clustering
1
Author: 袁里驰. Journal of Central South University (SCIE, EI, CAS), 2012, Issue 4, pp. 1057-1062 (6 pages)
Category-based statistical language models are an important way to address data sparsity, but they face two bottlenecks: 1) word clustering, where it is hard to find a clustering method that performs well at low computational cost; and 2) class-based methods lose the predictive ability needed to adapt to text in different domains. To address these problems, a definition of word similarity based on mutual information is presented, and a definition of word-set similarity is built on it. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the vari-gram model reduces perplexity from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
Keywords: word similarity; word clustering; statistical language model; vari-gram language model
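To put the perplexity figures reported in the abstract on a common scale, the relative reductions can be worked out directly. The snippet below is only an arithmetic sketch: the function name `relative_reduction` and the dictionary labels are illustrative and do not come from the paper.

```python
def relative_reduction(before, after):
    """Fraction by which perplexity drops when going from `before` to `after`."""
    return (before - after) / before


# Perplexity pairs (before, after) as reported in the abstract.
# The labels are illustrative descriptions, not the paper's own terms.
reported = {
    "word clustering experiment": (283.0, 218.0),
    "vari-gram vs. category-based, Chinese corpora": (234.65, 219.14),
    "vari-gram vs. category-based, English corpora": (195.56, 184.25),
}

for label, (before, after) in reported.items():
    print(f"{label}: {relative_reduction(before, after):.1%}")
```

On these numbers the clustering result is a reduction of roughly 23%, while the vari-gram model improves on the category-based model by roughly 6.6% (Chinese) and 5.8% (English).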
Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression (cited by: 8)
2
Authors: 孙栩, 王厚峰, 王波. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2008, Issue 4, pp. 602-611 (10 pages)
In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. Each abbreviation is treated as a reduced form of its corresponding definition (expanded form), and abbreviation prediction is formalized as a scoring and ranking problem among abbreviation candidates, which are automatically generated from the definition. Using Support Vector Regression (SVR) for scoring yields multiple abbreviation candidates together with their SVR values, which are used to rank the candidates. Experimental results show that the SVR method performs better than the popular heuristic rule for abbreviation prediction. The SVR method also outperforms the hidden Markov model (HMM) on this task.
Keywords: statistical natural language processing; abbreviation prediction; support vector regression; word clustering
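The scoring-and-ranking formulation in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: candidates are taken to be order-preserving character subsequences of the definition, and `toy_score` is a hand-written placeholder standing in for a trained SVR model. The paper's actual candidate generator, features, and regressor are not described in this listing.

```python
from itertools import combinations


def generate_candidates(definition):
    """Enumerate order-preserving character subsequences of the definition.

    Assumption: a candidate drops at least one character and keeps at least
    one; the paper's real generation procedure is not given here.
    """
    n = len(definition)
    candidates = []
    for k in range(1, n):
        for idx in combinations(range(n), k):
            candidates.append("".join(definition[i] for i in idx))
    # Remove duplicates while keeping the generation order.
    return list(dict.fromkeys(candidates))


def toy_score(definition, candidate):
    """Hypothetical stand-in for an SVR score (NOT the paper's model).

    Prefers two-character candidates that keep the definition's first
    character.
    """
    score = -abs(len(candidate) - 2)
    if candidate[0] == definition[0]:
        score += 1.0
    return score


def rank_candidates(definition, score_fn=toy_score):
    """Score every candidate and sort from highest to lowest score."""
    scored = [(c, score_fn(definition, c)) for c in generate_candidates(definition)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_candidates("北京大学")
for candidate, score in ranked[:3]:
    print(candidate, score)
```

Passing a different `score_fn` (for example, a wrapper around a trained regressor's prediction) changes only the scoring step; the generation and ranking steps stay the same.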