Abstract
In recent years, rapid technological development has driven explosive growth in data. Given such large volumes of multi-source data, correctly identifying entities and supplying high-quality data for analysis is a top priority for improving business performance and guiding business decisions. The authors study a graph-based semi-supervised possibilistic clustering method that unifies records referring to the same entity, addressing the data quality problem at the start of data analysis. Experiments demonstrate the effectiveness of the algorithm.
Authors
Dong Zhiqiang (董志强)
Liu Yongnian (刘永年)
Wei Lihua (魏丽华)
Qingdao Machinery Industry Corporation, Qingdao, Shandong 266000, China
Source
Information & Computer (《信息与电脑》)
2018, No. 3, pp. 45-47 (3 pages)
Funding
Qingdao Major Special Project for Independent Innovation (Project No. 14-6-1-13-zdzx)
Keywords
semi-supervised learning
possibilistic clustering
entity unification