Abstract
A data warehouse (DW) was built for a process-industry enterprise. First, the different data sources were analyzed and an Extraction, Transformation, and Loading (ETL) tool was designed for them. The technical problems encountered during data cleaning are presented together with their solutions, and the use of the Canopy method to detect duplicate records is analyzed in detail. A data warehouse covering the subjects of production cost, oil and gas output, and dry gas and by-product inventory was designed with the Data Warehouse-Entity Relationship (DWER) conceptual model, and On-Line Analytical Processing (OLAP) queries against it were implemented with Business Intelligence (BI) beans. Finally, a model for analyzing and predicting the output of the company's products was built with the Principal Component Analysis Lagrangian Support Vector Machine (PLSVM) method, a support vector machine approach based on principal component analysis and the Lagrange formula. The model predicts raw gas and by-product output from dry gas output, and the work yielded good economic benefits.
Source
Computer Integrated Manufacturing Systems (《计算机集成制造系统》)
EI
CSCD
Peking University Core Journals (北大核心)
2006, Issue 6, pp. 899-904 (6 pages)
Funding
National 863/CIMS Program of China (2002AA412410)
Keywords
data warehouse
extraction transformation loading tool
canopy methods
data warehouse-entity relationship model
support vector machine method