Abstract
Over the years, State Grid Corporation of China has built many business systems, such as its office automation (OA) system, marketing system and management information system. However, as power-enterprise informatization deepens and the volume of data in these business systems grows sharply, finding information has become very inconvenient and poses new challenges. This paper proposes a data integration system architecture for the electric power domain and studies its key technical problems: data acquisition, data extraction and data integration. It puts forward a data acquisition method based on the collection rate of high-frequency query words, a bottom-up approach to building data extraction wrappers, and an automated duplicate record detection scheme based on unsupervised learning. The system integrates data across the information silos of the power system and provides unified storage and management of the unstructured data held in the business systems, so that users can retrieve the data they need and power-enterprise staff are offered convenient services.
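The abstract names, but does not detail, a data acquisition method driven by high-frequency query words and their collection rate. The sketch below shows one generic way such query-based acquisition is often done against a keyword search interface, and it is not the authors' algorithm. The crawler issues a seed term, then repeatedly picks the unissued term with the highest document frequency among records harvested so far, recording each query's collection rate (new records divided by records returned). The `search` function, the toy `CORPUS`, the result-page `limit`, and the alphabetical tie-break are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for records behind a business system's search box.
CORPUS = {
    1: "grid outage report substation",
    2: "substation maintenance plan",
    3: "marketing report customer billing",
    4: "customer billing complaint",
    5: "grid maintenance schedule",
}


def tokenize(text):
    """Lower-case word tokens (ASCII letters only, for this toy example)."""
    return re.findall(r"[a-z]+", text.lower())


def search(corpus, term, limit):
    """Hypothetical keyword search interface: ids of up to `limit` records containing `term`."""
    hits = [rid for rid, text in corpus.items() if term in tokenize(text)]
    return hits[:limit]


def crawl(corpus, seed, max_queries=10, limit=3):
    """Frequency-driven query selection; returns issued terms, harvested records, per-query rates."""
    harvested, issued, rates = {}, [], []
    term = seed
    for _ in range(max_queries):
        if term is None:
            break
        issued.append(term)
        hits = search(corpus, term, limit)
        new_ids = [rid for rid in hits if rid not in harvested]
        for rid in new_ids:
            harvested[rid] = corpus[rid]
        # Collection rate of this query: share of returned records that were new.
        rates.append(len(new_ids) / len(hits) if hits else 0.0)
        # Document frequency of each term over the records harvested so far.
        counts = Counter(t for text in harvested.values() for t in set(tokenize(text)))
        candidates = [(t, c) for t, c in counts.items() if t not in issued]
        # Next query: highest-frequency unissued term, ties broken alphabetically.
        term = min(candidates, key=lambda tc: (-tc[1], tc[0]))[0] if candidates else None
    return issued, harvested, rates


if __name__ == "__main__":
    issued, harvested, rates = crawl(CORPUS, "report", max_queries=6)
    print(issued)
    print(sorted(harvested), rates)
```

In this toy run the third and fourth queries return no new records, which illustrates why a practical method would weigh a term's expected collection rate and not its frequency alone.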
Source
Shandong Electric Power (《山东电力技术》)
2016, No. 11, pp. 23-27 (5 pages)