Accomplishment of Algorithm of Massive Data Analysis and Processing (海量数据分析及处理算法实现)

Cited by: 4

Abstract: As science and technology advance, the computing power of computers grows rapidly every year, while the volume of data to be processed grows exponentially. The analysis and processing of massive data has therefore become an important topic. In practice, the authors needed to process massive data stored in CSV text files, with a data volume of at least 25M, within a processing time that met the customer's requirements, and so designed a fast algorithm for processing massive data. The algorithm covers the extraction, analysis, and processing of the data. Verification of the algorithm showed that the processing time met the expected requirement.
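The abstract names three stages (extraction, analysis, processing) and a processing-time requirement, but this record does not describe the algorithm itself. As a rough illustration only, and not the authors' method, the following Python sketch streams a CSV file row by row so that memory use stays flat regardless of file size, aggregates one numeric column, and reports the elapsed time. The column name `value`, the `Summary` type, and both function names are hypothetical.

```python
import csv
import io
import time
from dataclasses import dataclass


@dataclass
class Summary:
    """Running aggregates for one numeric column (hypothetical helper type)."""
    rows: int = 0
    skipped: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")


def summarize_csv(stream, column):
    """Stream rows from a CSV text stream and aggregate one numeric column."""
    summary = Summary()
    for record in csv.DictReader(stream):      # extraction: one row at a time
        try:
            value = float(record[column])      # analysis: parse the field
        except (KeyError, TypeError, ValueError):
            summary.skipped += 1               # missing or non-numeric field
            continue
        summary.rows += 1                      # processing: update aggregates
        summary.total += value
        summary.minimum = min(summary.minimum, value)
        summary.maximum = max(summary.maximum, value)
    return summary


def timed_summary(stream, column):
    """Return the summary together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    summary = summarize_csv(stream, column)
    return summary, time.perf_counter() - start


# Small in-memory sample; the second data row is deliberately malformed.
sample = io.StringIO("id,value\n1,2.5\n2,oops\n3,4.0\n")
result, elapsed = timed_summary(sample, "value")
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="", encoding="utf-8")`. Whether a single-pass approach of this kind would meet a particular time requirement depends on the data and hardware, which this record does not specify.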

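Authors: Zhu Dexin (朱德新), Song Yajuan (宋雅娟)

Affiliation: College of Computer Science and Technology, Changchun University (长春大学计算机科学技术学院)

Source: Journal of Changchun University (《长春大学学报》), 2011, No. 8, pp. 42-45 (4 pages)

Keywords: CSV; massive data; processing time

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]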



