Abstract
As people become more active on the Internet, new Web words emerge constantly. Existing Chinese word segmentation systems have low recognition efficiency for such new words, which directly reduces segmentation precision and in turn affects the service quality of Internet applications. Starting from the output of an existing segmentation system, we propose a method that recognises new words by combining statistics and matching over Web knowledge, such as search engine results and Baidu Baike (百度百科), and then uses the recognised words to optimise the system's original segmentation results. Experimental data show that the method effectively identifies new Web words and improves the segmentation results.
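The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: adjacent fragments from the original segmentation are joined into candidates, and a candidate is accepted either by matching (it is an encyclopedia entry title) or by statistics (it occurs often enough in retrieved text). The function names, the thresholds `min_count` and `min_ratio`, and the in-memory `snippets` and `titles` arguments are all hypothetical stand-ins for real search-engine results and Baidu Baike lookups; they are not taken from the paper.

```python
from typing import Iterable, List, Set


def count_in_snippets(term: str, snippets: Iterable[str]) -> int:
    """Count non-overlapping occurrences of term across all snippets."""
    return sum(s.count(term) for s in snippets)


def is_new_word(parts: List[str], snippets: List[str], titles: Set[str],
                min_count: int = 3, min_ratio: float = 0.5) -> bool:
    """Decide whether the joined parts look like one word (assumed criteria)."""
    candidate = "".join(parts)
    # Matching: the candidate is an entry title in the encyclopedia stand-in.
    if candidate in titles:
        return True
    # Statistics: the candidate must occur often enough in the snippets...
    whole = count_in_snippets(candidate, snippets)
    if whole == 0 or whole < min_count:
        return False
    # ...and nearly as often as its rarest part, i.e. the parts mostly co-occur.
    rarest_part = min(count_in_snippets(p, snippets) for p in parts)
    return whole / rarest_part >= min_ratio


def merge_new_words(tokens: List[str], snippets: List[str], titles: Set[str],
                    max_parts: int = 3) -> List[str]:
    """Re-segment by merging adjacent tokens that form an accepted candidate."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so longer new words win over shorter ones.
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            parts = tokens[i:i + n]
            if is_new_word(parts, snippets, titles):
                merged.append("".join(parts))
                i += n
                break
        else:
            # No span starting here was accepted: keep the original token.
            merged.append(tokens[i])
            i += 1
    return merged


# Original segmentation split the Web word "给力" into two single characters.
tokens = ["我", "喜欢", "给", "力", "的", "演出"]
print(merge_new_words(tokens, snippets=[], titles={"给力"}))
# ['我', '喜欢', '给力', '的', '演出']
```

In this sketch the matching test is tried first because a title lookup is cheap and unambiguous; the frequency test is a fallback for words that have no encyclopedia entry yet.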
Source
《计算机应用与软件》
CSCD
2015, Issue 12, pp. 55-58 (4 pages)
Computer Applications and Software
Keywords
Chinese word segmentation
Unknown word
New Web word
Search engine
Word segmentation optimisation