Abstract
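A Chinese new word identification algorithm based on support vector machines (SVM) is proposed. The algorithm combines the internal pattern of new words with features such as word length to define an improved character-position likelihood based on word-internal patterns, and then integrates further features, such as the adjacency categories of new words, to identify them. In tests on a corpus of novels, the algorithm achieves a micro F1 of 0.5833 and a macro F1 of 0.7757, improvements of about 63% and 30%, respectively, over a baseline algorithm that does not consider word-internal patterns.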
In this paper, a Chinese new word identification approach based on an SVM classifier is proposed. The method first introduces an improved independent word probability based on the inner pattern of the string and POS, and then combines accessor variety and frequency statistical features to identify Chinese new words. Experimental results show that the Micro F1 and Macro F1 of the proposed method are 0.5833 and 0.7757, respectively. Compared with the method that does not consider the inner pattern of words, the proposed method improves Micro F1 by about 63% and Macro F1 by about 30%.
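The abstract names accessor variety as one of the statistical features but does not define it. As a rough illustration only, the sketch below uses the commonly cited definition: the smaller of the number of distinct characters seen immediately to the left and immediately to the right of a candidate string. The function name `accessor_variety`, the boundary markers `<S>` and `<E>`, and the toy corpus are assumptions made for this example; the paper's own feature extraction may differ.

```python
def accessor_variety(corpus: str, candidate: str) -> int:
    """Return min(distinct left neighbours, distinct right neighbours).

    Simplified sketch: the corpus is one plain string, and the start and
    end of the text each count as a single distinct neighbour (assumed).
    """
    left, right = set(), set()
    start = corpus.find(candidate)
    while start != -1:
        end = start + len(candidate)
        # Record the character just before and just after this occurrence.
        left.add(corpus[start - 1] if start > 0 else "<S>")
        right.add(corpus[end] if end < len(corpus) else "<E>")
        # Continue searching from the next position (overlaps allowed).
        start = corpus.find(candidate, start + 1)
    return min(len(left), len(right))


# Hypothetical toy corpus: the candidate appears in three different contexts.
toy_corpus = "我爱微博,他刷微博。微博很火"
print(accessor_variety(toy_corpus, "微博"))
```

A candidate that appears in many different contexts scores higher, which is why the value is typically used as evidence that a string behaves like an independent word. In a pipeline like the one described, such a value would be one entry in the feature vector passed to the classifier.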
Source
《集美大学学报(自然科学版)》
CAS
2011, No. 6, pp. 461-466 (6 pages)
Journal of Jimei University:Natural Science
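Funding
Natural Science Foundation of Fujian Province (2010J05133)
Fujian Province Science and Technology Innovation Platform Program (2009J1007)
Fuzhou University Science and Technology Special Start-up Fund (2010-XQ-22)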