Abstract
An enterprise data center is an important decision-support tool, so keeping its data timely, accurate, and sound is the most basic requirement and the core principle. Finding abnormal records in massive data by human experience alone is difficult, unscientific, and inefficient. To address the accuracy of enterprise purchase-sales-inventory data, this paper studies a machine-learning-based algorithm for detecting abnormal data. Because purchase-sales-inventory data consist of a relatively fixed set of data items, they can be regarded as a structured data sequence, so the conditional random fields (CRFs) model, which is highly effective for structured sequence prediction, is chosen as the detection model. By learning from a large amount of historical data, the model captures the data's intrinsic regularities and correlations, enabling computers to detect abnormal data automatically. Experimental results demonstrate the effectiveness of the proposed algorithm.
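The abstract does not give the feature functions, the training procedure, or the label set, so the following is only a minimal sketch of how a linear-chain CRF could label the items of one record as normal or abnormal. Everything specific here is an assumption for illustration: the two labels, the `emission` and `transition` scores, the hand-set weights (a real model would learn them from historical data), and the input `zs`, taken to be each item's deviation from its historical mean in standard deviations.

```python
import itertools
import math

LABELS = ("normal", "abnormal")  # hypothetical label set


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def emission(label, z):
    """Hypothetical state score; hand-set here, learned in a real CRF.
    A normal item is penalised by its squared deviation z."""
    return -0.5 * z * z if label == "normal" else -2.0


def transition(prev, cur):
    """Hypothetical transition score: adjacent items tend to share a label."""
    return 0.5 if prev == cur else -0.5


def score(labels, zs):
    """Unnormalised log-score of one complete labelling."""
    s = sum(emission(y, z) for y, z in zip(labels, zs))
    return s + sum(transition(a, b) for a, b in zip(labels, labels[1:]))


def log_partition(zs):
    """log Z(x), computed with the forward algorithm."""
    alpha = {y: emission(y, zs[0]) for y in LABELS}
    for z in zs[1:]:
        alpha = {
            y: emission(y, z)
            + logsumexp([alpha[p] + transition(p, y) for p in LABELS])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))


def viterbi(zs):
    """Most probable label sequence under the scores above."""
    best = {y: (emission(y, zs[0]), [y]) for y in LABELS}
    for z in zs[1:]:
        new = {}
        for y in LABELS:
            prev_score, prev_path = max(
                ((best[p][0] + transition(p, y), best[p][1]) for p in LABELS),
                key=lambda t: t[0],
            )
            new[y] = (prev_score + emission(y, z), prev_path + [y])
        best = new
    return max(best.values(), key=lambda t: t[0])[1]


if __name__ == "__main__":
    record = [0.1, -0.3, 4.0, 0.2]  # toy deviations, not real data
    labels = viterbi(record)
    prob = math.exp(score(labels, record) - log_partition(record))
    print(labels, round(prob, 3))
```

With these toy scores, the decoder marks only the third item of `[0.1, -0.3, 4.0, 0.2]` as abnormal. The sketch covers inference only; estimating the weights from history, which is the part the abstract emphasises, would normally be delegated to a CRF library.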
Source
Computer Engineering & Science (《计算机工程与科学》)
Indexed in: CSCD, PKU Core (北大核心)
2015, No. 9, pp. 1756-1760 (5 pages)
Funding
National Natural Science Foundation of China (Grant No. 61202335)
Keywords
data center
machine learning
detection of abnormal data
conditional random fields model