Abstract
This paper proposes a two-step word segmentation model that combines special marker symbols with root-word Shapley values to improve segmentation accuracy, and it recognizes new words by means of search engine index values. For similarity comparison, it proposes a distance matrix model with a row-column order penalty factor. The model integrates the characteristics of vector detection, Hamming distance, and the longest common substring, and it redefines the distance matrix. Compared with traditional paper similarity retrieval, the method offers more accurate word segmentation and a lower computational cost, as well as reliability and high efficiency.
Source
Journal of Hubei University of Technology (《湖北工业大学学报》)
2015, No. 1, pp. 36-38, 55 (4 pages)
Funding
Supported by the Scientific Research Program of the Hubei Provincial Department of Education (D20141403)
Keywords
Chinese word segmentation
similarity comparison
distance matrix