Abstract
This paper proposes a word segmentation method for logistics-related reviews in e-commerce. First, the CRF model is improved with a label selection method and a feature template suited to short text. The improved CRF model then produces a preliminary segmentation of the review data. Finally, a dictionary of logistics-review terms from the e-commerce domain is applied to the preliminary segmentation by reverse maximum matching, which improves the CRF model's recognition of out-of-vocabulary words and its resolution of ambiguous words. Experiments on a dataset of 5,000 manually annotated logistics reviews from a well-known clothing brand show that the proposed method achieves higher accuracy and recall than traditional methods.
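The reverse maximum matching step named in the abstract can be illustrated with a minimal Python sketch. This is the textbook form of the algorithm applied to a raw string, not the authors' implementation: the abstract does not say exactly how the dictionary pass is combined with the CRF output, and the function name `backward_max_match` and the small sample dictionary are assumptions made for illustration.

```python
def backward_max_match(text, dictionary, max_len=None):
    """Segment `text` by scanning right to left and, at each position,
    taking the longest dictionary word that ends there.

    `dictionary` is a set of known words (hypothetical sample data below).
    Characters not covered by any dictionary word become single-character tokens.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(w) for w in dictionary), default=1)
    max_len = max(1, max_len)

    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, shrinking from the left.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # tokens were collected from right to left
    return words


# Hypothetical logistics-review dictionary, for illustration only.
sample_dict = {"快递", "快递员", "态度", "好"}
print(backward_max_match("快递员态度好", sample_dict))
```

With this sample dictionary the scan prefers the longer entry "快递员" over "快递", which is the kind of ambiguity a domain dictionary is meant to resolve.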
Authors
ZHONG Jingchen (钟静晨)
QI Yunsong (祁云嵩)
School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212003
Source
Computer & Digital Engineering (《计算机与数字工程》)
2019, No. 11, pp. 2866-2870, 2883 (6 pages)
Keywords
Chinese word segmentation
natural language processing
feature template
conditional random field