Abstract
Character matching is an important means of improving data quality in data cleansing. To address the matching of Chinese abbreviated terms (short names), and building on an analysis of existing Chinese matching algorithms, this paper proposes a matching algorithm based on the degree of association between attributes in a database. For a term to be matched, the algorithm compares the similarity of the data in the attributes associated with the attribute that contains the term, assigns a confidence according to the amounts of data recorded under the Chinese abbreviation and under the full name, and from these derives the abbreviation matching degree. Experiments on real examples show that the algorithm has good applicability and accuracy.
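The abstract describes the method only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' algorithm. It assumes records are dictionaries, that the "associated attributes" are given as a list of column names, that per-attribute similarity is the Jaccard similarity of the value sets seen with the abbreviation and with the full name, and that confidence grows with the smaller record count as n / (n + k). The character-subsequence pre-filter (e.g. 北大 for 北京大学), the parameter `k`, and all function names are hypothetical illustrations.

```python
from typing import Dict, List, Sequence


def is_char_subsequence(abbr: str, full: str) -> bool:
    """Hypothetical pre-filter: every character of the abbreviation
    appears in the full name in the same order (e.g. 北大 in 北京大学)."""
    it = iter(full)
    # `ch in it` consumes the iterator up to the first match, enforcing order.
    return all(ch in it for ch in abbr)


def jaccard(a: set, b: set) -> float:
    """Set similarity in [0, 1]; two empty sets are treated as dissimilar."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def abbreviation_match_degree(
    records: List[Dict[str, str]],
    key_attr: str,
    related_attrs: Sequence[str],
    abbr: str,
    full: str,
    k: float = 1.0,
) -> float:
    """Assumed scoring: mean Jaccard similarity over associated attributes,
    weighted by a data-volume confidence n / (n + k)."""
    if not is_char_subsequence(abbr, full):
        return 0.0
    abbr_rows = [r for r in records if r.get(key_attr) == abbr]
    full_rows = [r for r in records if r.get(key_attr) == full]
    if not abbr_rows or not full_rows or not related_attrs:
        return 0.0
    # Compare the values each associated attribute takes for the two names.
    sims = []
    for attr in related_attrs:
        a_vals = {r[attr] for r in abbr_rows if attr in r}
        f_vals = {r[attr] for r in full_rows if attr in r}
        sims.append(jaccard(a_vals, f_vals))
    similarity = sum(sims) / len(sims)
    # Confidence from data volume: more rows on the sparser side -> more trust.
    n = min(len(abbr_rows), len(full_rows))
    confidence = n / (n + k)
    return similarity * confidence


if __name__ == "__main__":
    # Hypothetical sample data: "unit" holds the names, the rest are associated attributes.
    rows = [
        {"unit": "北京大学", "city": "北京", "phone": "010-1"},
        {"unit": "北京大学", "city": "北京", "phone": "010-2"},
        {"unit": "北大", "city": "北京", "phone": "010-1"},
        {"unit": "北大", "city": "北京", "phone": "010-3"},
    ]
    score = abbreviation_match_degree(rows, "unit", ["city", "phone"], "北大", "北京大学")
    print(round(score, 4))  # 0.4444
```

In the sample, `city` agrees fully (1.0) and `phone` shares one of three values (1/3), giving a similarity of 2/3; with two rows on each side the confidence is 2/3, so the score is 4/9. Multiplying similarity by confidence keeps a name pair with very few records from scoring high on a coincidental overlap.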
Authors
GUO Hui (郭晖)
DONG Yuan (董源)
ZHOU Gang (周钢)
Department of Computer Technology, Electronic Engineering School, Naval University of Engineering, Wuhan 430033; Naval Hydrographic and Meteorological Center, Beijing 100000
Source
Computer & Digital Engineering (《计算机与数字工程》), 2018, Issue 9, pp. 1726-1730 (5 pages)
Keywords
data cleansing
data mining
correlation (degree of association)
Chinese abbreviated name matching