Abstract: With the development of Internet technology in China, large-scale data are emerging in the food safety inspection and testing industry. This paper designs a cloud-computing-based big data risk analysis algorithm for toxic and hazardous substances in food safety inspection and testing [1]. A cloud computing platform was set up to store the massive, geographically distributed, dynamic, and highly complex data from the Internet, and the MapReduce [2] computational framework was applied to process the data in parallel. The risk analysis results were obtained by analyzing 1,000,000 meat product test records collected from a web-based laboratory management information system. The results show a food safety index IFS < 1, which indicates that the food safety state is good.
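Abstract: To address the unbalanced distribution of intermediate data in the MapReduce parallel programming model on the open-source Hadoop cloud computing platform, an improved sampling-based MapReduce model, SMR (Sample MapReduce), is proposed. The SMR model uses a MapReduce job to sample each data block in parallel and, based on the sampling results, applies the LAB (leen and balance) balancing algorithm to distribute the intermediate data output by the Map side evenly, which mitigates load imbalance on the Reduce side. Experimental results show that the improved MapReduce model effectively reduces job running time and that the input data on the Reduce side are load-balanced.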
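The sample-then-balance idea can be illustrated with a minimal Python sketch. The abstract does not describe the internals of the LAB algorithm, so the code below swaps in a plain greedy heuristic (give the heaviest remaining key to the least-loaded reducer) as a stand-in. The function names `sample_key_counts` and `balanced_assignment`, the sampling rate, and the in-memory lists standing in for data blocks are all assumptions made for illustration, not part of the SMR model.

```python
import heapq
import random
from collections import Counter


def sample_key_counts(blocks, rate, seed=0):
    """Estimate per-key frequencies by sampling every block at the given rate.

    Hypothetical helper: in SMR the sampling runs as a MapReduce job; here
    each block is simply an in-memory list of intermediate keys.
    """
    rng = random.Random(seed)
    counts = Counter()
    for block in blocks:
        for key in block:
            if rng.random() < rate:
                counts[key] += 1
    return counts


def balanced_assignment(key_counts, num_reducers):
    """Greedy stand-in for a balancing step (NOT the LAB algorithm itself).

    Keys are visited from most to least frequent and each one is assigned
    to the reducer with the smallest estimated load so far.
    """
    heap = [(0, r) for r in range(num_reducers)]  # (estimated load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return assignment
```

In a real Hadoop job the resulting key-to-reducer table would typically be consulted by a custom partitioner; that wiring is omitted here.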
Funding: Supported by the Important National Science & Technology Specific Projects (2012ZX03002008), the National Natural Science Foundation of China (61072061), and the Fundamental Research Funds for the Central Universities (2012RC0121).
Abstract: Offline network traffic analysis is important for in-depth study of network conditions and characteristics, such as user behavior and abnormal traffic. With the rapid growth of the amount of information on the Internet, traditional stand-alone analysis tools face great challenges in storage capacity and computing efficiency, which are precisely the strengths of a Hadoop cluster. In this paper, we design an offline traffic analysis system based on Hadoop (OTASH) and propose a MapReduce-based algorithm for TopN user statistics. In addition, we study the computing performance and failure tolerance of OTASH. The experiments show that OTASH is suitable for handling large amounts of flow data and can still complete its computations when a single node fails.
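A TopN user statistic maps naturally onto a map step followed by a reduce step. The sketch below is a single-process Python illustration of that pattern, not the OTASH implementation: the record layout `(user, bytes)` and the names `map_flow`, `reduce_traffic`, and `top_n_users` are hypothetical, and the abstract does not say which metric OTASH ranks users by.

```python
from collections import Counter
from heapq import nlargest


def map_flow(record):
    """Map step: emit a (user, bytes) pair for one flow record.

    The (user, bytes) record layout is an assumption for illustration.
    """
    user, nbytes = record
    yield user, nbytes


def reduce_traffic(pairs):
    """Reduce step: sum the bytes emitted for each user."""
    totals = Counter()
    for user, nbytes in pairs:
        totals[user] += nbytes
    return totals


def top_n_users(records, n):
    """Run map and reduce over all records, then keep the n heaviest users."""
    pairs = (kv for rec in records for kv in map_flow(rec))
    totals = reduce_traffic(pairs)
    # Rank by total bytes; ties are broken by user name so the result is stable.
    return nlargest(n, totals.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    flows = [("alice", 100), ("bob", 50), ("alice", 25), ("carol", 300)]
    print(top_n_users(flows, 2))
```

On a cluster, the per-user totals would be computed by parallel reducers and the final ranking by a second, smaller pass; the sketch collapses both into one process to keep the logic visible.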