期刊文献+
共找到1篇文章
< 1 >
每页显示 20 50 100
An Improved Unsupervised Approach to Word Segmentation
1
作者 WANG Hanshi HAN Xuhong +2 位作者 LIU Lizhen SONG Wei yuan mudan 《China Communications》 SCIE CSCD 2015年第7期82-95,共14页
ESA is an unsupervised approach to word segmentation previously proposed by Wang, which is an iterative process consisting of three phases: Evaluation, Selection and Adjustment. In this article, we propose Ex ESA, the... ESA is an unsupervised approach to word segmentation previously proposed by Wang, which is an iterative process consisting of three phases: Evaluation, Selection and Adjustment. In this article, we propose Ex ESA, the extension of ESA. In Ex ESA, the original approach is extended to a 2-pass process and the ratio of different word lengths is introduced as the third type of information combined with cohesion and separation. A maximum strategy is adopted to determine the best segmentation of a character sequence in the phrase of Selection. Besides, in Adjustment, Ex ESA re-evaluates separation information and individual information to overcome the overestimation frequencies. Additionally, a smoothing algorithm is applied to alleviate sparseness. The experiment results show that Ex ESA can further improve the performance and is time-saving by properly utilizing more information from un-annotated corpora. Moreover, the parameters of Ex ESA can be predicted by a set of empirical formulae or combined with the minimum description length principle. 展开更多
关键词 分词方法 监督 个人信息 最小描述长度 ESA 平滑算法 合理利用 经验公式
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部