摘要
科技云服务平台积累了大量科技数据,而数据质量问题会对大数据的应用产生致命影响,因此需要对存在质量问题的大数据进行清洗。文章提出了一种数据清洗模型,采用神经网络,依据数据相关性原则实现高可扩展性的大数据清洗。使用该模型,能够以计算机自动化数据修正的方式代替数据补录与修正工作,有效地提升工作效率。
The science and technology cloud service platform has accumulated a large number of scientific and technological data, and the data quality problem will result in a fatal impact on the application of big data. Therefore, the massive data with quality problem need to be cleaned. In this paper, a data cleaning model is proposed, which according to the data correlation principle, uses neural networks to realize the big data cleaning with high scalability. Using this model, the repeated data refills and corrections can be replaced by computer automatic data correction, the work efficiency is effectively improved.
出处
《计算机时代》
2017年第7期6-9,共4页
Computer Era
基金
浙江省科技计划项目"科技信息大数据应用及示范研究"(2014F50039)