Cloud computing is very useful for big data owners who do not want to manage IT infrastructure and big data technical details. However, it is hard for a big data owner to trust a multi-layer outsourced big data system in a cloud environment and to verify which outsourced service causes a problem. Similarly, the cloud service provider cannot simply trust the data computation applications. Finally, the verification data itself may also leak sensitive information about the cloud service provider and the data owner. We propose a new three-level definition of verification, a threat model, and corresponding trusted policies based on the different roles in an outsourced big data system in the cloud. We also provide two policy enforcement methods for building a trusted data computation environment by measuring both the MapReduce application and its behaviors, based on trusted computing and aspect-oriented programming. To prevent sensitive information from leaking during the verification process, we provide a privacy-preserving verification method. Finally, we implement TPTVer, a Trusted third Party based Trusted Verifier, as a proof-of-concept system. Our evaluation and analysis show that TPTVer can provide trusted verification for multi-layered outsourced big data systems in the cloud with low overhead.
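To make the measurement idea concrete: a minimal sketch of the integrity-measurement step such a verifier could perform before a job launches, assuming the application binary is whitelisted by digest (TPTVer's hash chain and TPM interaction are not shown; the digest value is a placeholder):

```python
import hashlib

def measure_file(path: str) -> str:
    """Return the SHA-256 digest of an application binary or JAR.

    In a trusted-computing design this digest would be extended into a
    TPM register and checked against a whitelist before the MapReduce
    job is allowed to run; here we only compute the measurement.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical whitelist consulted before job submission.
TRUSTED_DIGESTS = {"0f3a..."}  # placeholder digest, not a real value

def is_trusted(path: str) -> bool:
    return measure_file(path) in TRUSTED_DIGESTS
```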
In the security and privacy fields, Access Control (AC) systems are viewed as fundamental aspects of networking security mechanisms. Enforcing AC becomes even more challenging when researchers and data analysts have to analyze complex and distributed Big Data (BD) processing cluster frameworks, which are adopted to manage yottabytes of unstructured sensitive data. For instance, Big Data systems' privacy and security restrictions are likely to fail due to malformed AC policy configurations. Furthermore, BD systems were initially developed to address BD challenges, and many of them dealt with the "three Vs" (Velocity, Volume, and Variety) attributes without planning for security, which was later added as patchwork. Some of the BD "three Vs" characteristics, such as distributed computing, fragmented and redundant data, and node-to-node communication, each with its own security challenges, complicate the applicability of AC in BD even further. This paper gives an overview of the latest security and privacy challenges in BD AC systems. Furthermore, it analyzes and compares some of the latest AC research frameworks for reducing privacy and security issues in distributed BD systems, of which very few enforce AC in a cost-effective and timely manner. Moreover, this work discusses some future research methodologies and improvements for BD AC systems. This study is a valuable asset for Artificial Intelligence (AI) researchers, DB developers, and DB analysts who need the latest AC security and privacy research perspective before using and/or improving a current BD AC framework.
With the growth of distributed computing systems, modern Big Data analysis platform products often have diversified characteristics. It is hard for users to make decisions when they first come into contact with Big Data platforms. In this paper, we discuss the design principles and research directions of modern Big Data platforms by presenting research on modern Big Data products. We provide a detailed review and comparison of several state-of-the-art frameworks and distill them into a typical structure with five horizontal layers and one vertical layer. Following this structure, this paper presents the components and modern optimization technologies developed for Big Data, which helps readers choose the most suitable components and architecture from the various Big Data technologies according to their requirements.
Purpose: We propose InParTen2, a multi-aspect parallel factor analysis three-dimensional tensor decomposition algorithm based on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors. Design/methodology/approach: Considering that tensor addition increases the size of a given tensor along all axes, the proposed method decomposes incoming tensors using existing decomposition results without generating sub-tensors. Additionally, InParTen2 avoids the calculation of Khatri–Rao products and minimizes shuffling by using the Apache Spark platform. Findings: The performance of InParTen2 is evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors and reduce the re-calculation cost of tensor decomposition. Consequently, the proposed method is faster than existing tensor decomposition algorithms and can significantly reduce re-decomposition cost. Research limitations: There are several Hadoop-based distributed tensor decomposition algorithms as well as MATLAB-based decomposition methods. However, the former require longer iteration times, and therefore their execution time cannot be compared with that of Spark-based algorithms, whereas the latter run on a single machine, which limits their ability to handle large data. Practical implications: The proposed algorithm can reduce re-decomposition cost when tensors are added to a given tensor, by decomposing them based on existing decomposition results without re-decomposing the entire tensor. Originality/value: The proposed method can handle large tensors and is fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 can handle both static and incremental tensor decomposition.
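For context, the baseline that InParTen2 improves on is the standard CP/PARAFAC alternating-least-squares update, which materializes exactly the Khatri–Rao product the algorithm avoids. A minimal single-machine NumPy sketch of one factor update for a third-order tensor (illustrative only; not the distributed Spark implementation):

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker (Khatri-Rao) product of B (J x R) and C (K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def update_factor_A(X, B, C):
    """One ALS update of the mode-0 factor for a CP model X ~ [[A, B, C]]."""
    I, J, K = X.shape
    X0 = X.reshape(I, J * K)        # mode-0 unfolding (C-order: column = j*K + k)
    KR = khatri_rao(B, C)           # the product InParTen2 avoids forming
    G = (B.T @ B) * (C.T @ C)       # R x R Gram matrix, elementwise product
    return X0 @ KR @ np.linalg.pinv(G)
```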
There are challenges in the reliability evaluation of insulated gate bipolar transistors (IGBTs) on electric vehicles, such as junction temperature measurement and limited computational and storage resources. In this paper, a junction temperature estimation approach based on a neural network, with no additional cost, is proposed, and the lifetime calculation for IGBTs using electric vehicle big data is performed. The direct current (DC) voltage, operating current, switching frequency, negative thermal coefficient thermistor (NTC) temperature, and IGBT lifetime are the inputs, and the junction temperature (T_j) is the output. With the rainflow counting method, the classified irregular temperature cycles are fed into the lifetime model to obtain the failure cycles. The fatigue accumulation method is then used to calculate the IGBT lifetime. To work around the limited computational and storage resources of electric vehicle controllers, the IGBT lifetime calculation runs on a big data platform. The lifetime is then transmitted wirelessly to electric vehicles as an input for the neural network. Thus, the junction temperature of IGBTs under long-term operating conditions can be accurately estimated. A test platform combining the motor controller with the vehicle big data server is built for the IGBT accelerated aging test. Subsequently, IGBT lifetime predictions are derived from the junction temperature estimated by the neural network method and by the thermal network method. The experiment shows that the lifetime prediction based on a neural network with big data achieves a higher accuracy than that of the thermal network, which improves the reliability evaluation of the system.
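A minimal sketch of the fatigue-accumulation step, assuming a Coffin-Manson-style lifetime model N_f = a * dT**(-n) and Miner's linear damage rule; the coefficients and cycle counts below are placeholders, not the paper's fitted values:

```python
def cycles_to_failure(delta_t: float, a: float = 3.0e14, n: float = 5.0) -> float:
    """Hypothetical Coffin-Manson-type lifetime model: N_f = a * dT**(-n)."""
    return a * delta_t ** (-n)

def accumulated_damage(rainflow_cycles) -> float:
    """Miner's rule: damage = sum(n_i / N_f(dT_i)); failure is expected near 1."""
    return sum(n_i / cycles_to_failure(dt_i) for dt_i, n_i in rainflow_cycles)

# (temperature swing in K, counted cycles) pairs, e.g. from a rainflow count.
cycles = [(40.0, 1.2e5), (60.0, 3.0e4), (80.0, 5.0e3)]
damage = accumulated_damage(cycles)
remaining_life_fraction = max(0.0, 1.0 - damage)
```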
To address the problems of a single encryption algorithm, such as low encryption efficiency and unreliable metadata, for static data storage on big data platforms in the cloud computing environment, we propose a Hadoop-based big data secure storage scheme. First, to disperse the NameNode service from a single server to multiple servers, we combine the HDFS federation and HDFS high-availability mechanisms and use the ZooKeeper distributed coordination mechanism to coordinate the nodes and achieve dual-channel storage. Then, we improve the ECC encryption algorithm for the encryption of ordinary data and adopt a homomorphic encryption algorithm to encrypt data that needs to be processed computationally. To accelerate encryption, we adopt a dual-thread encryption mode. Finally, the HDFS control module is designed to combine the encryption algorithms with the storage model. Experimental results show that the proposed solution solves the problem of a single point of failure for metadata, performs well in terms of metadata reliability, and can realize server fault tolerance. The improved encryption algorithm, integrated with the dual-channel storage mode, improves encryption storage efficiency by 27.6% on average.
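A minimal sketch of the routing and dual-thread encryption mode, with both ciphers stubbed out as toy placeholders (the paper's improved ECC and homomorphic schemes are not reproduced here):

```python
from concurrent.futures import ThreadPoolExecutor

def ecc_encrypt(block: bytes) -> bytes:
    # Placeholder for the improved ECC cipher; NOT real encryption.
    return block[::-1]

def homomorphic_encrypt(block: bytes) -> bytes:
    # Placeholder for the homomorphic scheme; NOT real encryption.
    return bytes(b ^ 0xFF for b in block)

def encrypt_blocks(blocks, needs_computation: bool):
    """Route blocks to the right cipher, then encrypt with two worker
    threads, mirroring the dual-thread encryption mode."""
    cipher = homomorphic_encrypt if needs_computation else ecc_encrypt
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(cipher, blocks))
```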
COVID-19 posed challenges for global tourism management. Changes in visitor temporal and spatial patterns, and their associated determinants, pre- and peri-pandemic in Canadian Rocky Mountain National Parks are analyzed. Data were collected through social media programming and analyzed using spatiotemporal analysis and a geographically weighted regression (GWR) model. Results highlight that COVID-19 significantly changed park visitation patterns. Visitors tended to explore more remote areas peri-pandemic. The GWR model also indicated that distance to nearby trails was a significant influence on visitor density. Our results indicate that the pandemic amplified temporal and spatial imbalances in tourism. This research presents a novel approach using combined social media big data which can be extended to the field of tourism management, and it has important implications for managing visitor patterns and allocating resources efficiently to satisfy the multiple objectives of park management.
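For readers unfamiliar with GWR: it fits a separate distance-weighted least-squares regression at every location. A minimal sketch with a Gaussian kernel (bandwidth, coordinates, and predictors are illustrative; X is assumed to carry an intercept column):

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit local coefficients at every location by distance-weighted
    least squares; X is assumed to include an intercept column."""
    n = len(y)
    betas = np.empty((n, X.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-(d / bandwidth) ** 2)   # Gaussian kernel weights
        XtW = X.T * w                        # broadcasts w over observations
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas
```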
Due to the restricted satellite payloads in LEO mega-constellation networks (LMCNs), remote sensing image analysis, online learning, and other big data services urgently need onboard distributed processing (OBDP). With existing technologies, the efficiency of big data applications (BDAs) in distributed systems hinges on stable, low-latency links between worker nodes. However, LMCNs, with their highly dynamic nodes and long-distance links, cannot provide these conditions, which makes the performance of OBDP hard to measure intuitively. To bridge this gap, a multidimensional simulation platform is indispensable: one that can simulate the network environment of LMCNs and run BDAs in it for performance testing. Using STK's APIs and a parallel computing framework, we achieve real-time simulation of thousands of satellite nodes, which are mapped to application nodes through software-defined networking (SDN) and container technologies. We elaborate the architecture and mechanism of the simulation platform and take Starlink and Hadoop as realistic examples for simulations. The results indicate that LMCNs have dynamic end-to-end latency that fluctuates periodically with the constellation movement. Compared to ground data center networks (GDCNs), LMCNs degrade computing and storage job throughput, which can be alleviated by the use of erasure codes and data flow scheduling among worker nodes.
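The periodic latency fluctuation follows directly from propagation distance as satellites move. A toy single-hop calculation (slant ranges are illustrative):

```python
C_M_PER_S = 299_792_458.0  # speed of light

def propagation_delay_ms(distance_km: float) -> float:
    return distance_km * 1_000.0 / C_M_PER_S * 1_000.0

# Illustrative slant ranges for one inter-satellite or ground-satellite hop:
for d_km in (550.0, 1500.0, 4000.0):
    print(f"{d_km:>6.0f} km -> {propagation_delay_ms(d_km):.2f} ms")
```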
As big data becomes an apparent challenge to handle when building a business intelligence (BI) system, there is a motivation to address this challenging issue in higher education institutions (HEIs). Monitoring quality in HEIs encompasses handling huge amounts of data coming from different sources. This paper reviews big data and analyses cases from the literature regarding quality assurance (QA) in HEIs. It also outlines a framework that can address the big data challenge in HEIs by handling QA monitoring with BI dashboards, and a prototype dashboard is presented. The dashboard was developed as a utilisation tool to monitor QA in HEIs by providing visual representations of big data. The prototype dashboard enables stakeholders to monitor compliance with QA standards while addressing the big data challenge associated with the substantial volume of data managed by HEIs' QA systems. This paper also outlines how the developed system integrates big data from social media into the monitoring dashboard.
Big data resources are characterized by large scale, wide sources, and strong dynamics. Existing access control mechanisms based on manual policy formulation by security experts suffer from drawbacks such as low policy management efficiency and difficulty in accurately describing the access control policy. To overcome these problems, this paper proposes a big data access control mechanism based on a two-layer permission decision structure. This mechanism extends the attribute-based access control (ABAC) model. Business attributes are introduced in the ABAC model as business constraints between entities. The proposed mechanism implements a two-layer permission decision structure composed of the inherent attributes of access control entities and the business attributes, which constitute the general permission decision algorithm based on logical calculation and the business permission decision algorithm based on a bi-directional long short-term memory (BiLSTM) neural network, respectively. The general permission decision algorithm is used to implement accurate policy decisions, while the business permission decision algorithm implements fuzzy decisions based on the business constraints. The BiLSTM neural network is used to calculate the similarity of the business attributes to realize intelligent, adaptive, and efficient access control permission decisions. Through the two-layer permission decision structure, the complex and diverse big data access control management requirements can be satisfied while considering both the security and availability of resources. Experimental results show that the proposed mechanism is effective and reliable. In summary, it can efficiently support the secure sharing of big data resources.
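A minimal sketch of the two-layer decision flow: an exact logical check on inherent attributes, then a fuzzy business-attribute decision by embedding similarity. Plain cosine similarity stands in here for the trained BiLSTM similarity model:

```python
import numpy as np

def general_decision(subject_attrs: dict, policy: dict) -> bool:
    """Layer 1: exact logical match on inherent ABAC attributes."""
    return all(subject_attrs.get(k) == v for k, v in policy.items())

def business_decision(request_vec, policy_vec, threshold: float = 0.8) -> bool:
    """Layer 2: fuzzy match on business attributes. A trained BiLSTM would
    produce the embeddings; plain cosine similarity stands in here."""
    a, b = np.asarray(request_vec), np.asarray(policy_vec)
    sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return sim >= threshold

def permit(subject_attrs, policy, request_vec, policy_vec) -> bool:
    return (general_decision(subject_attrs, policy)
            and business_decision(request_vec, policy_vec))
```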
Big data analytics has been widely adopted by large companies to achieve measurable benefits, including increased profitability, customer demand forecasting, cheaper product development, and improved stock control. Small and medium-sized enterprises (SMEs) are the backbone of the global economy, comprising 90% of businesses worldwide. However, only 10% of SMEs have adopted big data analytics, despite the competitive advantage they could achieve. Previous research has analysed the barriers to adoption, and a strategic framework has been developed to help SMEs adopt big data analytics. The framework was converted into a scoring tool, which has been applied to multiple case studies of SMEs in the UK. This paper documents the process of evaluating the framework based on structured feedback from a focus group composed of experienced practitioners. The results of the evaluation are presented together with a discussion, and the paper concludes with recommendations to improve the scoring tool based on the proposed framework. The research demonstrates that this positioning tool helps SMEs achieve competitive advantages by increasing the application of business intelligence and big data analytics.
Mobile networks possess significant information and are thus considered a gold mine for the research community. The call detail records (CDR) of a mobile network are used to characterize the network's efficacy and mobile users' behavior. It is evident from the recent literature that cyber-physical systems (CPS) have been used in the analytics and modeling of telecom data. In addition, CPS are used to provide valuable services in smart cities. In general, a typical telecom company has millions of subscribers and thus generates massive amounts of data. From this perspective, data storage, analysis, and processing are the key concerns. To solve these issues, we propose a multilevel cyber-physical social system (CPSS) for the analysis and modeling of large internet data. Our proposed multilevel system has three levels, and each level has a specific functionality. Initially, raw call detail records (CDR) are collected at the first level, where data preprocessing, cleaning, and error removal are performed. At the second level, data processing, reduction, integration, and storage are performed, and the suggested internet activity record measures are applied. Our proposed system then constructs a graph and performs network analysis. The proposed CPSS thus accurately identifies the areas of peak internet usage in a city (the city of Milan). Our research is helpful for network operators to plan effective network configuration, management, and optimization of resources.
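A minimal sketch of the graph-construction and analysis level, assuming preprocessed CDR rows of the form (caller, callee, duration); networkx degree centrality stands in for the paper's internet activity record measures:

```python
import networkx as nx

# Toy preprocessed CDR rows: (caller, callee, call duration in seconds).
cdr_rows = [("A", "B", 120), ("A", "C", 45), ("B", "C", 300), ("A", "B", 60)]

G = nx.Graph()
for caller, callee, duration in cdr_rows:
    if G.has_edge(caller, callee):
        G[caller][callee]["weight"] += duration   # aggregate repeated calls
    else:
        G.add_edge(caller, callee, weight=duration)

# Rank subscribers (or cell areas) by how connected they are.
centrality = nx.degree_centrality(G)
busiest = max(centrality, key=centrality.get)
```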
Genome-wide association mapping studies (GWAS) based on Big Data are a potential approach to improving marker-assisted selection in plant breeding. The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing. Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of the underlying genetic effects, but this is hindered by data heterogeneity and lack of interoperability. In this study, we used genomic and phenotypic data sets focusing on Central European winter wheat populations evaluated for heading date. We explored strategies for integrating these data and, subsequently, the resulting potential for GWAS. Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols, resulting in high-quality integrated phenotypic data. In this context, genomic prediction proved to be a suitable tool for studying the relevance of interactions between genotypes and experimental series, which was low in our case. Contrary to expectations, fewer marker-trait associations were found in the larger combined data set than in the individual experimental series. However, the predictive power based on the marker-trait associations of the integrated data set was higher across data sets. The results therefore show that the integration of medium-sized data sets into Big Data is an approach to increase the power to detect QTL in GWAS. The results encourage further efforts to standardize and share data in the plant breeding community.
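A minimal sketch of the calibration step that establishes interoperability, assuming some genotypes were scored under both phenotyping protocols and the relationship between scales is linear (values are toy numbers):

```python
import numpy as np

# Heading-date scores for genotypes measured under both protocols (toy values).
protocol_a = np.array([152.0, 148.0, 160.0, 155.0])
protocol_b = np.array([50.5, 48.9, 54.1, 52.0])

# Least-squares line b ~ slope * a + intercept, fit on the overlap set.
slope, intercept = np.polyfit(protocol_a, protocol_b, 1)

def to_protocol_b(scores_a):
    """Map protocol-A phenotypes onto the protocol-B scale before pooling."""
    return slope * np.asarray(scores_a) + intercept
```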
The development of technologies such as big data and blockchain has brought convenience to daily life, but at the same time privacy and security issues are becoming more and more prominent. The K-anonymity algorithm is an effective privacy-preserving algorithm with low computational complexity that can safeguard users' privacy by anonymizing big data. However, the algorithm currently focuses only on improving user privacy while ignoring data availability. In addition, ignoring the impact of quasi-identifier attributes on sensitive attributes reduces the usability of the processed data for statistical analysis. Based on this, we propose a new K-anonymity algorithm that solves the privacy and security problem in the context of big data while guaranteeing improved data usability. Specifically, we construct a new information loss function based on information quantity theory. Considering that different quasi-identifier attributes have different impacts on sensitive attributes, we assign a weight to each quasi-identifier attribute when designing the information loss function. In addition, to reduce information loss, we improve K-anonymity in two ways. First, we make the information loss smaller than in the original table while guaranteeing privacy, based on common artificial intelligence algorithms, i.e., the greedy algorithm and the 2-means clustering algorithm. Second, we improve the 2-means clustering algorithm by designing a mean-center method to select the initial centroids. We then design the K-anonymity algorithm of this scheme based on the constructed information loss function, the improved 2-means clustering algorithm, and the greedy algorithm, which together reduce information loss. Finally, we experimentally demonstrate the effectiveness of the algorithm in improving the 2-means clustering results and reducing information loss.
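A minimal sketch of two of the scheme's building blocks: a weighted information-loss measure over quasi-identifier attributes, and a mean-center choice of initial centroids for 2-means. Both follow the paper's definitions only loosely and should be read as assumptions:

```python
import numpy as np

def weighted_information_loss(original, anonymized, weights):
    """Range-normalized loss per quasi-identifier, weighted by each
    attribute's assumed influence on the sensitive attribute."""
    orig = np.asarray(original, dtype=float)
    anon = np.asarray(anonymized, dtype=float)
    ranges = orig.max(axis=0) - orig.min(axis=0)
    per_attr = np.abs(orig - anon).mean(axis=0) / np.where(ranges == 0, 1, ranges)
    return float(per_attr @ np.asarray(weights))

def mean_center_init(X):
    """Pick two initial centroids for 2-means: the record closest to the
    global mean, then the record farthest from that first centroid."""
    X = np.asarray(X, dtype=float)
    c1 = X[np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1))]
    c2 = X[np.argmax(np.linalg.norm(X - c1, axis=1))]
    return c1, c2
```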
As technology and the internet develop, more data are generated every day. These data come in large sizes, high dimensions, and complex structures; the combination of these three features is "Big Data" [1]. Big data is revolutionizing all industries, bringing colossal impacts to them [2]. Many researchers have pointed out the huge impact that big data can have on our daily lives [3]. We can utilize the information we obtain to help us make decisions. Also, the conclusions we draw from the big data we analyze can serve as predictions for the future, helping us make more accurate and better-informed decisions earlier than others. If we apply these techniques in finance, for example, in the stock market, we can get detailed information about stocks. Moreover, we can use the analyzed data to predict certain stocks. This can help people decide whether to buy a stock or not by providing predictions at a certain confidence level, helping to protect them from potential losses.
This research paper provides the methodology and design for implementing a hybrid author recommender system using Azure Data Lake Analytics and Power BI. It offers recommendations for the top 1000 computer science authors in different fields of study. The technique used in this paper handles inadequate citation information and removes the cold-start problem encountered by many other recommender systems. In this paper, abstracts, titles, and the Microsoft Academic Graph have been used to come up with the recommendation list for every document, combining content-based approaches and co-citations. Tuning system parameters allows each technique to be prioritized and blended, trading the authority of the recommendation results against paper novelty. In the end, we observe a direct correlation between the similarity rankings produced by the system and the participants' scores. The results from the associated analysis scripts and the user survey have been made available through the recommendation system. Managers must gain the required expertise to fully utilize the benefits that come with business intelligence systems [1]. Data mining has become an important tool for managers, providing insights about their daily operations and leveraging the information provided by decision support systems to improve customer relationships [2]. Additionally, managers require business intelligence systems that can rank output in order of priority. Ranking algorithms can replace the traditional data mining algorithms, which are discussed in depth in the literature review [3].
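A minimal sketch of the blending step, where a tuning parameter trades content-based similarity against co-citation evidence; alpha and the score inputs are illustrative assumptions:

```python
def hybrid_score(content_sim: float, cocitation: float, alpha: float = 0.6) -> float:
    """Blend the two signals: larger alpha favors textual similarity
    (titles/abstracts), smaller alpha favors co-citation authority."""
    return alpha * content_sim + (1.0 - alpha) * cocitation

def recommend(candidates, alpha: float = 0.6, k: int = 10):
    """candidates: iterable of (author, content_sim, cocitation) triples,
    with both scores normalized to [0, 1]."""
    ranked = sorted(candidates,
                    key=lambda c: hybrid_score(c[1], c[2], alpha),
                    reverse=True)
    return [author for author, _, _ in ranked[:k]]
```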
This article delves into the intricate relationship between big data, cloud computing, and artificial intelligence, shedding light on their fundamental attributes and interdependence. It explores the seamless integration of AI methodologies within cloud computing and big data analytics, encompassing the development of a cloud computing framework built on the Hadoop platform and enriched by AI learning algorithms. Additionally, it examines the creation of a predictive model empowered by tailored artificial intelligence techniques. Rigorous simulations are conducted within the Hadoop environment to extract valuable insights, facilitating method evaluation and performance assessment and confirming the precision of the proposed approach. The results and analysis section reveals findings derived from comprehensive simulations within the Hadoop environment. These outcomes demonstrate the efficacy of the Sport AI Model (SAIM) framework in enhancing the accuracy of sports-related outcome predictions. Through mathematical analyses and performance assessments, integrating AI with big data emerges as a powerful tool for optimizing decision-making in sports. The discussion section extends the implications of these results, highlighting the potential for SAIM to revolutionize sports forecasting, strategic planning, and performance optimization for players and coaches. The combination of big data, cloud computing, and AI offers a promising avenue for future advancements in sports analytics. This research underscores the synergy between these technologies and paves the way for innovative approaches to sports-related decision-making and performance enhancement.
This article discusses the current status and development strategies of computer science and technology in the context of big data. First, it explains the relationship between big data and computer science and technology, focusing on the current applications of computer science and technology in big data, including data storage, data processing, and data analysis. It then proposes development strategies for big data processing. Computer science and technology play a vital role in big data processing by providing strong technical support.
Contemporary mainstream big data governance platforms are built atop big data ecosystem components, offering a one-stop development and analysis platform for the collection, transmission, storage, cleansing, transformation, querying and analysis, development, publishing and subscription, sharing and exchange, management, and servicing of massive data. These platforms serve various role members who have internal and external data needs. However, in the era of big data, the rapid update and iteration of big data technologies, the diversification of data businesses, and the exponential growth of data present many challenges and uncertainties for the construction of big data governance platforms. This paper discusses how to effectively build a data governance platform within the big data ecosystem from the perspectives of functional architecture, logical architecture, data architecture, and functional design.
Big data finds extensive application in many fields and brings new opportunities for the development of agriculture. Using big data technology to promote the development of smart agriculture can greatly improve agricultural planting outcomes, reduce the input of manpower and material resources, and lay a solid foundation for the realization of agricultural modernization. In this regard, this paper briefly analyzes the construction and application of smart agriculture based on big data technology, hoping to provide some valuable insights for readers.