
Web log mining technology using GA based on Yarn (cited by: 2)
Abstract This paper proposes a Web log mining technique aimed at mining TB-scale log files. A genetic algorithm (GA) based on a subpopulation intermarriage strategy is designed and implemented with the MapReduce programming model and deployed on the Yarn architecture, so that Yarn and the GA are combined effectively. In the Partition stage of the algorithm, a Round-Robin strategy replaces the original hash method, so that genes are distributed more evenly across the subpopulations. This improves the convergence efficiency of the subpopulations and the accuracy of the results, while also balancing the running load across nodes and improving system performance. In tests, the average accuracy of the mining results obtained with this technique exceeded 93%, and efficiency improved by nearly 33%.
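Source Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2014, No. 11, pp. 3388-3391 (4 pages)
Funding National Natural Science Foundation of China (61003036); Natural Science Foundation of Heilongjiang Province (F201124); Science and Technology Research Fund of the Heilongjiang Provincial Department of Education (12513048)
Keywords Yarn architecture; log mining; genetic algorithm (GA); parallel computing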
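
The abstract's central design choice, replacing hash partitioning with Round-Robin in the Partition stage, can be illustrated with a small sketch. This is not the paper's implementation: the helper names, the toy hash function, and the sample keys are all hypothetical, and it assumes that records sharing a key do not have to land in the same partition (plausible when the records are GA individuals being spread over subpopulations).

```python
from collections import Counter
from itertools import count


def hash_partition(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner (hypothetical helper)."""
    # A byte sum is used instead of Python's built-in hash(),
    # which is salted per process for strings.
    return sum(key.encode("utf-8")) % num_partitions


def make_round_robin_partitioner(num_partitions: int):
    """Return a partitioner that ignores the key and cycles 0, 1, ..., n-1."""
    counter = count()

    def partition(_key: str) -> int:
        return next(counter) % num_partitions

    return partition


def partition_sizes(keys, partition_fn, num_partitions):
    """Count how many keys land in each partition."""
    sizes = Counter(partition_fn(k) for k in keys)
    return [sizes.get(p, 0) for p in range(num_partitions)]


# Hypothetical skewed workload: many records share a few keys.
keys = ["/index.html"] * 6 + ["/login"] * 3 + ["/about", "/help", "/cart"]
num_subpopulations = 4

hash_sizes = partition_sizes(
    keys, lambda k: hash_partition(k, num_subpopulations), num_subpopulations
)
rr_sizes = partition_sizes(
    keys, make_round_robin_partitioner(num_subpopulations), num_subpopulations
)
print("hash:       ", hash_sizes)
print("round-robin:", rr_sizes)
```

With a hash partitioner, every record with the same key goes to the same partition, so a skewed key distribution produces uneven partitions. The round-robin partitioner assigns records by arrival order, so partition sizes differ by at most one regardless of the keys.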

References (9)

  • 1 MIRZOEV T, BENSON B, HILLHOUSE D, et al. Employment rates for datacenter managers [J]. World of Computer Science and Information Technology Journal, 2013, 3(3): 65-69.
  • 2 VAVILAPALLI V K, MURTHY A C, DOUGLAS C, et al. Apache Hadoop YARN: yet another resource negotiator [C]//Proc of the 4th ACM Symposium on Cloud Computing. New York: ACM Press, 2013.
  • 3 DEAN J, GHEMAWAT S. MapReduce: simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1): 107-113.
  • 4 CHEN Chenwu, CHEN Pochen, CHIANG Weiliang. Modified intelligent genetic algorithm based adaptive neural network control for uncertain structural systems [J]. Journal of Vibration and Control, 2013, 19(9): 1333-1347.
  • 5 ZHAO Long (赵龙), JIANG Rong'an (江荣安). Research on a Hive-based analysis system for massive search logs [J]. Application Research of Computers, 2013, 30(11): 3343-3345. (cited by: 15)
  • 6 CHENG Miao (程苗), CHEN Huaping (陈华平). Web log mining based on Hadoop [J]. Computer Engineering, 2011, 37(11): 37-39. (cited by: 64)
  • 7 PANDIT A, DESHPANDE A, KARMARKAR P. Log mining based on Hadoop's map and reduce technique [J]. International Journal on Computer Science and Engineering, 2013, 5(4): 270-274.
  • 8 HE Xiang (何翔), LI Renfa (李仁发), TANG Zhuo (唐卓). An improved MapReduce-based task scheduling mechanism for heterogeneous environments [J]. Application Research of Computers, 2013, 30(11): 3370-3373. (cited by: 8)
  • 9 RONG Zhen, TANG Yan, LIU Su. Research on Web log mining [C]//Proc of International Conference on Information Engineering and Applications. London: Springer, 2013: 849-856.

