Data sparseness is an inherent problem of statistical language models, and smoothing methods are usually used to resolve the zero-count problem. In this paper, we empirically study and analyze the well-known Good-Turing and advanced Good-Turing smoothing methods for language models built on a large Chinese corpus. Ten models are generated sequentially on corpora of increasing size, from 30 M to 300 M Chinese words of the CGW corpus. In our experiments, the two smoothing methods, Good-Turing and advanced Good-Turing, are evaluated on inside testing and outside testing. Based on the experimental results, we further analyze the perplexity trends of the smoothing methods, which are useful for choosing an effective smoothing method to alleviate data sparseness for language models of various sizes. Finally, some helpful observations are described in detail.
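As an illustration of the zero-count problem this abstract refers to, the following is a minimal sketch of the textbook Good-Turing estimate, in which an observed count c is replaced by c* = (c + 1) N_{c+1} / N_c and the probability mass reserved for unseen events is N_1 / N. It is not the paper's implementation, and it does not attempt the "advanced Good-Turing" variant, which the abstract does not define. The function names, the toy data, and the fallback to the raw count when N_{c+1} is zero are all assumptions made for this example.

```python
from collections import Counter


def good_turing_adjusted_counts(counts):
    """Map each observed count c to c* = (c + 1) * N_{c+1} / N_c.

    `counts` maps an event (e.g. an n-gram) to its raw frequency.
    """
    # N_c: how many distinct events were seen exactly c times.
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for c, n_c in freq_of_freq.items():
        n_next = freq_of_freq.get(c + 1, 0)
        # Assumption for this sketch: keep the raw count when N_{c+1} is 0,
        # since the plain formula would otherwise give c* = 0.
        adjusted[c] = (c + 1) * n_next / n_c if n_next else float(c)
    return adjusted


def unseen_mass(counts):
    """Probability mass Good-Turing reserves for unseen events: N_1 / N."""
    total = sum(counts.values())
    n_1 = sum(1 for v in counts.values() if v == 1)
    return n_1 / total


# Hypothetical toy data: a=3, b=2, c=2, d=1, e=1, f=1 (N = 10 tokens).
toy = Counter("a a a b b c c d e f".split())
adjusted = good_turing_adjusted_counts(toy)
p_unseen = unseen_mass(toy)
```

Because of the raw-count fallback, the adjusted counts in this sketch are not renormalized into a proper distribution; a practical language model would smooth the N_c values and renormalize.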
This article studies the geomorphometric features of alluvial fans, which act as small-scale geomorphic units that respond to tectonics and climate change, around the Chaka-Qinghai Lake area in the northeastern Tibetan Plateau. We quantitatively extracted geomorphic parameters, such as the surface area and slope of alluvial fans adjacent to the Qinghai Nan Shan and Ela Shan. Alluvial fans in the Chaka Lake partition area, south of the Qinghai Nan Shan, are characterized by a small area and short length, but the steepest slope. The alluvial fans in the Ela Shan area are intermediate in size, and the alluvial fans in the Qinghai Lake partition area, north of the Qinghai Nan Shan, have the gentlest slope. Together with an analysis of regional faulting activity, we suggest that the steep alluvial fans south of the Qinghai Nan Shan are mainly controlled by reverse faulting along the Qinghai Nan Shan faults, while the strike-slip movement of the Ela Shan fault zone plays a weak role. In contrast, owing to the lack of active faults, the alluvial fans near the Qinghai Lake area north of the Qinghai Nan Shan respond only to regional erosion, transportation, and deposition processes, thereby forming relatively gentle geomorphic units.
Funding: supported by the Fund of the Institute of Geology, CEA (No. IGCEA1115) and the National Natural Science Foundation of China (Nos. 41203012, 41272196).