Abstract
Unstructured data in the electrical information acquisition system are massive in volume and enter through many scattered access points. To address these characteristics, this paper proposes a management design scheme for such data. First, the unstructured data in the system are classified. Second, a management scheme with three parts is proposed: data acquisition, data storage, and data mining. Data acquisition collects the unstructured data; data storage combines data preprocessing with Hadoop to store massive data quickly; data mining processes the data separately by category (text, video, and audio) to support mining applications on massive data. The scheme offers a reference for managing massive unstructured data in the electrical information acquisition system.
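As an illustration only, the category-based routing described in the abstract (text, video, audio) might be sketched as follows. This is a minimal sketch, not the paper's implementation: the extension map, the `classify` and `route` helpers, and the `other` fallback category are all hypothetical, since the abstract does not specify file types or interfaces.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical extension-to-category map; the paper does not list file types.
CATEGORY_BY_SUFFIX = {
    ".txt": "text", ".log": "text", ".xml": "text",
    ".mp4": "video", ".avi": "video",
    ".wav": "audio", ".mp3": "audio",
}


def classify(path: str) -> str:
    """Return the assumed category of a file based on its extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return CATEGORY_BY_SUFFIX.get(suffix, "other")


def route(paths):
    """Group collected file paths by category so each group can be
    handed to its own (text, video, or audio) mining step."""
    batches = defaultdict(list)
    for p in paths:
        batches[classify(p)].append(p)
    return dict(batches)
```

Calling `route` on a batch of collected paths yields one list per category, which a downstream step could then process separately.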
Source
《电力系统及其自动化学报》
CSCD
Peking University Core Journals (北大核心)
2016, No. 10, pp. 123-128 (6 pages)
Proceedings of the CSU-EPSA
Funding
Basic and Forward-Looking Science and Technology Project of State Grid Corporation of China (JL-71-14-001)
Keywords
electrical information acquisition system
unstructured data
framework design
massive data
data mining