Abstract
To address the unsatisfactory performance of existing Chinese word segmentation algorithms in specialized domains, this paper implements a Chinese address segmentation method based on a voting hybrid model. The model builds on a CRF-based segmenter and combines it with a traditional dictionary-based segmentation method and a support vector machine (SVM) segmentation tool. It is trained and tested on non-standard address data. Experimental results show that, on Chinese address data, the segmenter outperforms several commonly used segmentation tools, laying an important foundation for segmentation-based address data cleaning.
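The abstract does not state how the votes of the three segmenters are combined, so the following is only a minimal sketch under one assumption: each segmenter (CRF, dictionary-based, SVM) proposes a token list, and a token boundary is kept when a strict majority of segmenters agree on it. The function names `boundaries` and `vote_segment`, and the sample address with its three candidate segmentations, are hypothetical illustrations and not taken from the paper.

```python
from collections import Counter


def boundaries(tokens):
    """Return the set of character offsets at which a token ends."""
    cuts, pos = set(), 0
    for tok in tokens:
        pos += len(tok)
        cuts.add(pos)
    return cuts


def vote_segment(text, candidate_segmentations):
    """Combine several segmentations of `text` by majority vote on boundaries.

    Assumption (not from the paper): every segmenter has equal weight and a
    boundary is kept only if more than half of the segmenters propose it.
    """
    votes = Counter()
    for tokens in candidate_segmentations:
        if "".join(tokens) != text:
            raise ValueError("segmentation does not reproduce the input text")
        votes.update(boundaries(tokens))

    threshold = len(candidate_segmentations) / 2
    cuts = sorted(pos for pos, n in votes.items() if n > threshold)

    # Rebuild the token list from the surviving boundary offsets.
    result, start = [], 0
    for end in cuts:
        result.append(text[start:end])
        start = end
    return result


# Hypothetical outputs of the three segmenters for one address string.
address = "上海市浦东新区张江路100号"
crf_tokens = ["上海市", "浦东新区", "张江路", "100号"]
dict_tokens = ["上海", "市", "浦东", "新区", "张江路", "100", "号"]
svm_tokens = ["上海市", "浦东新区", "张江", "路", "100号"]

print(vote_segment(address, [crf_tokens, dict_tokens, svm_tokens]))
# ['上海市', '浦东新区', '张江路', '100号']
```

Voting on boundaries rather than on whole tokens keeps the combination well defined even when the segmenters disagree on granularity; a weighted vote that favours the CRF segmenter would be an equally plausible reading of the abstract.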
Source
Industrial Control Computer (《工业控制计算机》)
2015, No. 11, pp. 105-106, 108 (3 pages in total)