Funding: Project (60763001) supported by the National Natural Science Foundation of China; Project (2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China; Project (GJJ12271) supported by the Science and Technology Foundation of the Provincial Education Department of Jiangxi Province, China.
Abstract: Category-based statistical language models are an important way to address data sparsity, but they face two bottlenecks: 1) word clustering, where it is hard to find a clustering method that performs well at low computational cost; and 2) domain adaptation, where class-based methods tend to lose predictive ability on text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and from it a definition of word-set similarity is derived. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the vari-gram model reduces perplexity from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
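For readers unfamiliar with the quantities in this abstract, the following is a minimal sketch of how perplexity is typically computed for a class-based bigram model, assuming the standard factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i). It is not the authors' algorithm: the abstract does not give their similarity measure or clustering procedure, so the word-to-class mapping here is supplied by hand, and the function names, toy corpus, and class labels are all hypothetical.

```python
import math
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Maximum-likelihood estimates of P(c_i | c_{i-1}) and P(w_i | c_i).

    Assumption: no smoothing is applied, so this toy model can only
    score word/class sequences that were seen during training.
    """
    classes = [word2class[w] for w in tokens]
    class_count = Counter(classes)            # occurrences of each class
    prev_count = Counter(classes[:-1])        # occurrences as a bigram history
    class_bigram = Counter(zip(classes, classes[1:]))
    word_count = Counter(tokens)

    # P(c_i | c_{i-1}): class-to-class transition probabilities
    trans = {(a, b): n / prev_count[a] for (a, b), n in class_bigram.items()}
    # P(w_i | c_i): probability of a word given its class
    emit = {w: n / class_count[word2class[w]] for w, n in word_count.items()}
    return trans, emit


def perplexity(tokens, word2class, trans, emit):
    """Perplexity = 2 ** (-(1/N) * sum(log2 P(w_i | w_{i-1})))."""
    log_sum = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = trans[(word2class[prev], word2class[cur])] * emit[cur]
        log_sum += math.log2(p)
    n = len(tokens) - 1                       # number of predicted words
    return 2 ** (-log_sum / n)


# Hypothetical toy corpus and hand-made classes (illustration only).
tokens = "the cat sat the dog sat the cat ran".split()
word2class = {"the": "DET", "cat": "NOUN", "dog": "NOUN",
              "sat": "VERB", "ran": "VERB"}

trans, emit = train_class_bigram(tokens, word2class)
pp = perplexity(tokens, word2class, trans, emit)
print(f"perplexity on the toy corpus: {pp:.3f}")
```

Lower perplexity means the model assigns higher probability to the evaluated text, which is the sense in which the reductions reported above indicate improvement. A practical model would add smoothing and evaluate on held-out data rather than on the training text.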
Abstract: This article uses cluster analysis to divide China's 30 provinces and municipalities into three categories according to the differences in, and characteristics of, their energy consumption from 1985 to 2007. The three categories (high, moderate, and low energy consumption areas) differ significantly in energy consumption. Based on this classification, the authors analyze the factors influencing energy consumption in the three areas using a panel data econometric model. The results show that the influencing factors differ markedly across the areas. To support the national goal of energy conservation and emission reduction, energy measures and policies should therefore be tailored to each area.