As cloud computing technology matures, it is essential to combine the big data processing tool Hadoop with the Infrastructure as a Service (IaaS) cloud platform. In this study, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: monitoring, scheduling, Virtual Machine (VM) management, and VM migration. The load of both physical hosts and VMs is collected by the monitoring module and can be used to design resource scheduling and data locality solutions. Second, we present a simple load feedback-based resource scheduling scheme, which avoids allocating resources on overburdened physical hosts and achieves strong scalability of the virtual cluster by varying the number of VMs. To improve flexibility, we deploy the computation and storage VMs separately in the DHCI architecture, which negatively impacts data locality. Third, we reuse the VM migration mechanism and propose a dynamic migration-based data locality scheme using parallel computing entropy. We migrate computation nodes to the host(s) or rack(s) where the corresponding storage nodes are deployed to satisfy the requirement of data locality. We evaluate our solutions in a realistic scenario based on OpenStack. Substantial experimental results demonstrate the effectiveness of our solutions, which contribute to workload balancing and performance improvement even under heavily loaded cloud system conditions.
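As an illustration of the load feedback idea described above, the sketch below shows how a scheduler might use monitored host load to avoid overburdened hosts and to grow or shrink the virtual cluster. The thresholds, data structures, and function names are hypothetical and are not taken from the DHCI implementation.

```python
# Minimal sketch of load feedback-based VM scheduling (hypothetical thresholds).
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    cpu_load: float                    # fraction of CPU in use, reported by the monitor
    vms: list = field(default_factory=list)

OVERLOAD = 0.85                        # do not place new VMs above this load
UNDERLOAD = 0.30                       # below this, the virtual cluster may shrink

def place_vm(hosts, vm_name):
    """Place a VM on the least-loaded host that is not overburdened."""
    candidates = [h for h in hosts if h.cpu_load < OVERLOAD]
    if not candidates:
        return None                    # defer allocation instead of overloading a host
    target = min(candidates, key=lambda h: h.cpu_load)
    target.vms.append(vm_name)
    return target.name

def scale_decision(hosts):
    """Grow or shrink the virtual cluster based on aggregate load feedback."""
    avg = sum(h.cpu_load for h in hosts) / len(hosts)
    if avg > OVERLOAD:
        return "add_vm"
    if avg < UNDERLOAD:
        return "remove_vm"
    return "hold"

hosts = [Host("h1", 0.92), Host("h2", 0.40), Host("h3", 0.65)]
print(place_vm(hosts, "worker-7"), scale_decision(hosts))   # -> h2 hold
```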
A data processing method was proposed for eliminating the end restraint in triaxial tests of soil. A digital image processing method was used to calculate the local deformations and local stresses for any region on the surface of triaxial soil specimens. The principle and implementation of this digital image processing method were introduced, as well as the calculation method for the local mechanical properties of soil specimens. Comparisons were made between the test results calculated from the data of the entire specimen and those from local regions, and it was found that the deformations were more uniform in the middle region than across the entire specimen. In order to quantify the non-uniformity of deformation, non-uniformity coefficients of strain were defined and calculated. Traditional and end-lubricated triaxial tests were conducted under the same conditions to investigate the effect of using local region data for deformation calculation on eliminating the end restraint of specimens. After statistical analysis of all test results, it was concluded that for the tested soil specimens of 39.1 mm × 80 mm, using the middle 35 mm region of traditional specimens in data processing was more effective at eliminating end restraint than end lubrication. Furthermore, the local data analysis in this paper was validated through comparisons with test results from other researchers.
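To make the idea of a strain non-uniformity coefficient concrete, here is a tiny sketch. The paper's exact definition is not reproduced; a coefficient of variation over invented local strain readings is used purely to illustrate how local image-based measurements can quantify deformation uniformity.

```python
import numpy as np

# Hypothetical local axial strains (%) from image analysis of several surface
# regions of one specimen; larger values near the ends mimic end restraint.
local_strains = np.array([2.1, 2.3, 2.2, 2.8, 3.4])

def non_uniformity_coefficient(strains):
    # Spread of local strains relative to their mean (illustrative definition).
    return np.std(strains, ddof=1) / np.mean(strains)

print(round(non_uniformity_coefficient(local_strains), 3))        # whole surface
print(round(non_uniformity_coefficient(local_strains[:3]), 3))    # middle region: more uniform
```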
At present, big data is very popular because it has proved highly successful in many fields, such as social media and e-commerce transactions. Big data describes the tools and technologies needed to capture, manage, store, distribute, and analyze petabyte-scale or larger datasets with diverse structures at high speed. Big data can be structured, unstructured, or semi-structured. Hadoop is an open-source framework used to process large amounts of data in an inexpensive and efficient way, and job scheduling is a key factor in achieving high performance in big data processing. This paper gives an overview of big data and highlights its problems and challenges. It then describes the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and the various components that affect the performance of job scheduling algorithms in big data, such as the JobTracker, TaskTracker, NameNode, and DataNode. The primary purpose of this paper is to present a comparative study of job scheduling algorithms along with their experimental results in a Hadoop environment. In addition, this paper describes the advantages, disadvantages, features, and drawbacks of various Hadoop job schedulers, such as FIFO, Fair, Capacity, Deadline Constraints, Delay, LATE, and Resource Aware, and provides a comparative study among these schedulers.
A new numerical differentiation method with local optimum by data segmentation is proposed. The segmentation of the data is based on the second derivatives computed by a Fourier development method, and a filtering process is used to achieve acceptable segmentation. Numerical results obtained with the data segmentation method are presented and compared with the regularization method. For further investigation, the proposed algorithm is applied to the resistance-capacitance (RC) network identification problem, and improved results are obtained with this algorithm.
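The following sketch illustrates the general idea of differentiation by data segmentation under stated assumptions: estimate the second derivative, split the data where it jumps, and differentiate each segment with a local polynomial fit. The paper computes second derivatives with a Fourier development method and adds a filtering step; simple finite differences are used here instead, and all constants are arbitrary, so this is not the authors' algorithm.

```python
import numpy as np

def segmented_derivative(x, y, curvature_jump=0.5):
    """Illustrative sketch of numerical differentiation by data segmentation."""
    d2 = np.gradient(np.gradient(y, x), x)                 # second-derivative estimate
    breaks = np.where(np.abs(np.diff(d2)) > curvature_jump)[0]
    edges = np.unique(np.concatenate(([0], breaks, [len(x) - 1])))

    fallback = np.gradient(y, x)                           # plain finite differences
    dydx = np.empty_like(y)
    for a, b in zip(edges[:-1], edges[1:]):
        sl = slice(a, b + 1)
        if b - a + 1 < 4:                                  # too short to fit a quadratic
            dydx[sl] = fallback[sl]
        else:
            p = np.polyfit(x[sl], y[sl], 2)                # local quadratic per segment
            dydx[sl] = np.polyval(np.polyder(p), x[sl])
    return dydx

# Piecewise signal whose second derivative jumps at x = 1.
x = np.linspace(0.0, 2.0, 201)
y = np.where(x < 1.0, x ** 2, 2.0 * x - 1.0)
true_dy = np.where(x < 1.0, 2.0 * x, 2.0)
est = segmented_derivative(x, y)
print("max abs error:", float(np.abs(est - true_dy).max().round(4)))
```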
The diversity of activities on the one hand, and the conflicts between beneficiaries on the other, necessitate the efficient management and supervision of coastal areas. Accordingly, monitoring and evaluation of such areas can be considered a critical factor in national development and in the stewardship of resources. In this regard, remote sensing technologies combined with the analytical operations of geographic information systems (GIS) are remarkably advantageous. Iran's south-eastern Makran coasts are geopolitically and economically important owing to their strategic characteristics, but they have been neglected, and their development and transit infrastructure fall significantly short of international standards. Therefore, in this paper, given the importance of developing the Makran coasts, a Multi-Criterion Decision Analysis (MCDA) method was applied to identify and prioritize the criteria and parameters of zoning in order to establish new maritime zones. The major scope of this study is to employ satellite data, remote sensing methods, and regional statistics obtained from the Jask synoptic station, and to investigate the region's status in terms of topography, rainfall, and temperature changes, so as to reach a comprehensive monitoring and zoning of the coastline and to provide a pervasive local database via GIS and MCDA, which will be used in developing the coastal regions. This article explains the steps of coastal monitoring, states its main objectives, and presents the necessary procedures. The general steps of marine climate identification and the study of marine parameters are then described, and the final outcomes of the coastal monitoring process are determined. Because this article focuses on the monitoring of the Makran beaches, the workflow in this region is described and its specific differences and complexities are discussed in detail. The impact of such projects on future research is also discussed.
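As a concrete (toy) illustration of MCDA-based zoning, the snippet below scores candidate coastal cells with a weighted linear combination of normalized criteria and ranks them. The criteria names, values, and weights are invented for illustration and are not the study's actual GIS layers or weighting scheme.

```python
import numpy as np

# Hypothetical normalized criterion layers (0-1) for four candidate cells.
criteria = {
    "slope_suitability":    np.array([0.9, 0.4, 0.7, 0.2]),
    "rainfall_suitability": np.array([0.6, 0.8, 0.5, 0.3]),
    "temperature_comfort":  np.array([0.7, 0.9, 0.6, 0.4]),
}
weights = {"slope_suitability": 0.5, "rainfall_suitability": 0.3, "temperature_comfort": 0.2}

# Weighted linear combination: the simplest MCDA aggregation for zoning.
score = sum(weights[name] * layer for name, layer in criteria.items())
ranking = np.argsort(score)[::-1]          # cells ranked from most to least suitable

print("scores:", np.round(score, 2))
print("priority order of cells:", ranking)
```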
Cochlodinium polykrikoides is a notoriously harmful algal species that inflicts severe damage on aquaculture in the coastal seas of Korea and Japan. Information on its expected movement tracks and boundaries of influence is very useful and important for effectively establishing a reduction plan. In general, this information is provided by a red-tide (also known as algal bloom) model. The performance of the model depends strongly on the accuracy of its parameters, which are the coefficients of functions approximating the biological growth and loss patterns of C. polykrikoides. These parameters have been estimated using bioassay data composed of pairs of growth-limiting factors and net growth rates. In the case of C. polykrikoides, the estimated parameters differ according to the dataset used, because its bioassay data are relatively abundant compared with those of other algal species. Parameters estimated from one specific dataset can be viewed as locally optimized because they are adjusted only to that dataset; when another dataset is used, the estimation error might be considerable. In this study, the parameters are estimated from all available datasets rather than from any single dataset, and can thus be considered globally optimized. The cost function for the optimization is defined as the integrated mean squared estimation error, i.e., the difference between the experimental and estimated rates. Based on quantitative error analysis, the root-mean-square errors of the global parameters are approximately 25%–50% smaller than those of the local parameters. In addition, bias is removed completely in the case of the globally estimated parameters. The parameter sets can be used as the reference default values of a red-tide model because they are optimal and representative. However, additional tuning of the parameters using in-situ monitoring data is still required, because bioassay data have limitations in representing in-situ coastal conditions.
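The sketch below illustrates the local-versus-global estimation contrast with synthetic "bioassay" data: parameters fitted to one dataset (locally optimized) are compared with parameters fitted to the pooled data (globally optimized) by their RMSE over all data. The bell-shaped response curve, noise levels, and quadratic surrogate are assumptions for illustration only, not the study's growth and loss functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_rate(temp):
    # Assumed bell-shaped net growth response to temperature (illustrative only).
    return 0.6 * np.exp(-((temp - 25.0) / 5.0) ** 2)

# Three hypothetical bioassay datasets with different temperature coverage.
datasets = []
for lo, hi in [(15, 24), (20, 30), (24, 33)]:
    t = rng.uniform(lo, hi, 30)
    r = true_rate(t) + rng.normal(0.0, 0.05, t.size)
    datasets.append((t, r))

def fit_quadratic(t, r):
    return np.polyfit(t, r, 2)              # simple surrogate for the rate curve

def rmse(p, t, r):
    return np.sqrt(np.mean((np.polyval(p, t) - r) ** 2))

t_all = np.concatenate([t for t, _ in datasets])
r_all = np.concatenate([r for _, r in datasets])
p_global = fit_quadratic(t_all, r_all)       # "globally optimized" parameters

for i, (t, r) in enumerate(datasets):
    p_local = fit_quadratic(t, r)            # "locally optimized" parameters
    print(f"dataset {i}: local RMSE over all data = {rmse(p_local, t_all, r_all):.3f}, "
          f"global RMSE over all data = {rmse(p_global, t_all, r_all):.3f}")
```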
Over the past decade, open-source software use has grown. Today, many companies, including Google, Microsoft, Meta, RedHat, MongoDB, and Apache, are major participants in open-source contributions. With the increased use of open-source software, or its integration into custom-developed software, the quality of these software components becomes increasingly important. This study examined a sample of open-source applications from GitHub. Static software analytics were conducted, and each application was classified by its risk level. It was found that 90% of the analyzed applications were classified as low risk or moderate-low risk, indicating a high level of quality for open-source applications.
As data services rapidly penetrate our daily life, the mobile network becomes more complicated and the amount of data transmitted keeps increasing. In this situation, traditional statistical methods for anomalous cell detection cannot keep pace with the evolution of networks, and data mining has become the mainstream approach. In this paper, we propose a novel kernel density-based local outlier factor (KLOF) to assign a degree of outlierness to each object. First, the notion of KLOF is introduced, which captures exactly the relative degree of isolation. Then, by analyzing its properties, including the tightness of its upper and lower bounds and its sensitivity to density perturbation, we find that KLOF is much greater than 1 for outliers. Finally, KLOF is applied to a real-world dataset to detect anomalous cells with abnormal key performance indicators (KPIs) to verify its reliability. The experiment shows that KLOF can find outliers efficiently and can serve as a guideline for operators to perform faster and more efficient troubleshooting.
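To make the density-ratio idea concrete, here is a small sketch of a kernel-density-based local outlier factor: the score is the average kernel density of a point's k nearest neighbours divided by the point's own density, so isolated points score well above 1. This is one plausible reading of KLOF, not the paper's exact formulation; the bandwidth and k are arbitrary.

```python
import numpy as np

def klof_scores(X, k=5, bandwidth=1.0):
    """Kernel-density-based local outlier factor (illustrative variant)."""
    n, d = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)        # pairwise squared distances
    K = np.exp(-sq / (2.0 * bandwidth ** 2))                    # Gaussian kernel matrix
    np.fill_diagonal(K, 0.0)                                    # leave-one-out density
    density = K.sum(axis=1) / ((n - 1) * (np.sqrt(2 * np.pi) * bandwidth) ** d)
    nn = np.argsort(sq, axis=1)[:, 1:k + 1]                     # k nearest neighbours
    return density[nn].mean(axis=1) / density                   # >> 1 for isolated points

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (200, 2)), [[8.0, 8.0]]])   # one obvious outlier
    scores = klof_scores(X)
    print("most outlying index:", int(scores.argmax()), "KLOF:", f"{scores.max():.3g}")
```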
Edge computing devices are widely deployed, and an important issue is that these devices are vulnerable to security attacks. To deal with this, we turn to blockchain technologies. Nodes in an alliance chain need rules to limit write permissions. An alliance chain can provide security management functions covering member management, certification, authorization, monitoring, and auditing. This article analyzes how several such requirements can be realized in an alliance chain and introduces a new consensus algorithm for it, the generalized Legendre sequence (GLS) consensus algorithm. The GLS algorithm inherits the recognition and verification efficiency of binary sequence ciphers in computer communication and can solve the key distribution problem when a large number of nodes must be verified. In the alliance chain, the GLS consensus algorithm can perform node address hiding, automatic task sorting, automatic task grouping, task node scope confirmation, task address binding, and timestamping. Moreover, the GLS consensus algorithm increases the difficulty of malicious network attacks.
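For reference, a Legendre sequence of odd prime length p marks the quadratic residues modulo p; the sketch below generates one. How the GLS consensus protocol derives keys, groupings, or votes from such sequences is not described in the abstract, so only the sequence construction itself is shown.

```python
def legendre_sequence(p):
    """Legendre (quadratic-residue) sequence of odd prime length p:
    s[i] = 1 if i is a quadratic residue mod p, else 0 (s[0] = 0 by convention)."""
    residues = {pow(i, 2, p) for i in range(1, p)}
    return [0] + [1 if i in residues else 0 for i in range(1, p)]

if __name__ == "__main__":
    seq = legendre_sequence(23)
    print(seq)                        # balanced pattern of (p - 1) / 2 ones
    print(sum(seq), (23 - 1) // 2)    # 11 quadratic residues mod 23
```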
Using 11 global ocean tide models and tidal gauge data obtained in the East China Sea and South China Sea, the influence of ocean loading on the gravity field in China and its neighboring areas is calculated in this paper. Furthermore, the differences between the results from the original global models and from models modified with local tides are discussed on the basis of this calculation. The comparison shows that the differences at locations near the sea are so large that local tides must be taken into account in the calculation. When the global ocean tide models CSR4.0, FES02, GOT00, NAO99, and ORI96 are chosen, the local effect for M2 is less than 0.10 × 10⁻⁸ m·s⁻² over areas far from the sea, and the local effect for O1 is less than 0.05 × 10⁻⁸ m·s⁻² over those areas when the AG95 or CSR3.0 models are chosen. This numerical result demonstrates that the choice of model is a complex problem because of the inconsistent accuracy of the models over the East and South China Seas.
This paper applies software analytics to open-source code. Open-source software gives both individuals and businesses the flexibility to work with different parts of available code and to modify it or incorporate it into their own projects. The open-source software market is growing: major companies such as AWS, Facebook, Google, IBM, Microsoft, Netflix, SAP, Cisco, Intel, and Tesla have joined the open-source software community. In this study, a sample of 40 open-source applications was selected. Traditional McCabe software metrics, including cyclomatic and essential complexity, were examined. An analytical comparison of this set of metrics and of metrics derived for high-risk software was used as a basis for addressing risk management in decisions on adopting and integrating open-source software. From this comparison, refinements were added, and contemporary concepts of design and data metrics derived from cyclomatic complexity were integrated into a classification scheme for software quality. It was found that 84% of the sampled open-source applications were classified as moderate-low risk or low risk, indicating that open-source software exhibits low-risk characteristics. The 40 open-source applications formed the base data for the model, resulting in a technique that is applicable to any open-source code regardless of functionality, language, or size.
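The snippet below shows the kind of threshold-based risk banding such a classification scheme builds on, using widely cited cyclomatic-complexity bands. The paper's actual scheme also incorporates essential complexity and derived design/data metrics, and its exact cut-offs and labels are not reproduced here; the values shown are illustrative.

```python
# Illustrative risk banding by cyclomatic complexity (CC); thresholds follow
# commonly cited guidance, not the paper's exact classification scheme.
RISK_BANDS = [
    (10, "low risk"),              # simple, well-structured routines
    (20, "moderate low risk"),
    (50, "moderate high risk"),
    (float("inf"), "high risk"),   # very complex, hard-to-test routines
]

def classify_module(cyclomatic_complexity: int) -> str:
    for upper_bound, label in RISK_BANDS:
        if cyclomatic_complexity <= upper_bound:
            return label
    return "high risk"

if __name__ == "__main__":
    for cc in (4, 15, 35, 80):
        print(cc, "->", classify_module(cc))
```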
MapReduce is currently the most popular programming model for big data processing, and Hadoop is a well-known MapReduce implementation platform. However, Hadoop jobs suffer from imbalanced workloads during the reduce phase and utilize the available computing and network resources inefficiently. In some cases, these problems lead to serious performance degradation of MapReduce jobs. To resolve them, in this paper we propose two algorithms, the Locality-Based Balanced Schedule (LBBS) and Overlapping-Based Resource Utilization (OBRU), which optimize the Locality-Enhanced Load Balance (LELB) and the Map, Local reduce, Shuffle, and final Reduce (MLSR) phases. LBBS collects partition information from the input data during the map phase and generates balanced schedule plans for the reduce phase. OBRU uses computing and network resources efficiently by overlapping the local reduce, shuffle, and final reduce phases. Experimental results show that the LBBS and OBRU algorithms yield significant improvements in load balancing: when LBBS and OBRU are applied, job performance increases by 15% over models using LELB and MLSR.
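As an illustration of generating a balanced reduce-phase plan from partition sizes collected during the map phase, the sketch below assigns partitions greedily, largest first, to the currently least-loaded reducer. The real LBBS also exploits locality information, which is omitted here; the partition sizes and reducer count are hypothetical.

```python
import heapq

def balanced_reduce_plan(partition_sizes, num_reducers):
    """Greedy size-aware assignment of reduce partitions to reducers."""
    heap = [(0, r, []) for r in range(num_reducers)]    # (load, reducer id, partitions)
    heapq.heapify(heap)
    for pid, size in sorted(enumerate(partition_sizes), key=lambda x: -x[1]):
        load, rid, assigned = heapq.heappop(heap)       # lightest reducer so far
        assigned.append(pid)
        heapq.heappush(heap, (load + size, rid, assigned))
    return sorted(heap, key=lambda t: t[1])             # (load, reducer id, partitions)

if __name__ == "__main__":
    sizes = [90, 10, 35, 60, 5, 40, 25, 70]             # hypothetical partition sizes (MB)
    for load, rid, parts in balanced_reduce_plan(sizes, 3):
        print(f"reducer {rid}: load {load} MB, partitions {parts}")
```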
The regulation of cross-border data flows is a growing challenge for the international community. International trade agreements, however, appear to be pioneering legal methods to cope, having grappled with this issue since the 1990s. The World Trade Organization (WTO) rules system offers a partial solution under the General Agreement on Trade in Services (GATS), which covers aspects of cross-border data flows. The Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP) and the United States-Mexico-Canada Agreement (USMCA) have also been perceived as providing forward-looking resolutions. In this context, this article analyzes why a resolution to this issue may be illusory. While they regulate cross-border data flows in various ways, the structure and wording of the exception articles of both the CPTPP and the USMCA have the potential to pose significant challenges to the international legal system. The new system, in attempting to weigh societal values against economic development, is imbalanced, often valuing free trade more than individual online privacy and cybersecurity. Furthermore, the inclusion of poison-pill clauses is, by nature, antithetical to cooperation. Thus, for the international community generally, and for China in particular, cross-border data flows would best be regulated under the WTO-centered multilateral trade law system.
We propose a new functional single-index model, which we call the dynamic single-index model for functional data, or DSIM, to efficiently model non-linear and dynamic relationships between a functional predictor and a functional response. The proposed model naturally allows for curvature not captured by the ordinary functional linear model. Using the proposed two-step estimating algorithm, we develop estimates of both the link function and the regression coefficient function, and then provide predictions of new response trajectories. Besides the asymptotic properties of the estimates of the unknown functions, we also establish the consistency of the predictions of new response trajectories under mild conditions. Finally, we show through extensive simulation studies and a real data example that the proposed DSIM can substantially outperform existing functional regression methods in most settings.
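For orientation, one common way to write the contrast the abstract draws is sketched below in LaTeX. The exact DSIM specification is not given in the abstract, so the single-index form here is an assumed illustrative form, not the paper's definition.

```latex
% Ordinary functional linear model (no curvature in the link):
\[ Y(t) = \alpha(t) + \int X(s)\,\beta(s,t)\,ds + \varepsilon(t). \]
% A single-index extension of the kind DSIM targets (form assumed here for
% illustration; the paper's exact specification may differ):
\[ Y(t) = g\!\Big(\int X(s)\,\theta(s)\,ds,\; t\Big) + \varepsilon(t), \]
% where g(\cdot,\cdot) is an unknown link function and \theta(s) is the index
% (regression coefficient) function, both estimated by a two-step algorithm.
```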
Informative dropout often arises in longitudinal data. In this paper we propose a mixture model in which the responses follow a semiparametric varying-coefficient random effects model and some of the regression coefficients depend on the dropout time in a non-parametric way. The local linear version of the profile-kernel method is used to estimate the parameters of the model. The proposed estimators are shown to be consistent and asymptotically normal, and the finite-sample performance of the estimators is evaluated by numerical simulation.
In the unstructured finite volume method, loops over different mesh components, such as cells, faces, and nodes, are widely used to traverse data. Mesh loops result in direct or indirect data access that significantly affects data locality, and many threads accessing the same data within a mesh loop lead to data dependence. Both data locality and data dependence play an important part in the performance of GPU simulations. To optimize a GPU-accelerated unstructured finite volume Computational Fluid Dynamics (CFD) program, the performance of hot spots under different loops over cells, faces, and nodes is evaluated on Nvidia Tesla V100 and K80 GPUs. Numerical tests at different mesh scales show that the mesh loop modes affect data locality and data dependence differently. Specifically, the face loop yields the best data locality as long as face data are accessed in the kernels. The cell loop incurs the smallest overhead from non-coalesced data access when both cell and node data are used in computation without face data, and it performs best when only indirect access to cell data exists in kernels. Atomic operations reduce kernel performance substantially on the K80, an effect that is not obvious on the V100. With the suitable mesh loop mode in all kernels, the overall performance of the GPU simulations can be increased by 15%-20%. Finally, the program on a single V100 GPU achieves a maximum speedup of 21.7 and an average speedup of 14.1 compared with 28 MPI tasks on two Intel Xeon Gold 6132 CPUs.
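The toy NumPy sketch below contrasts the two access patterns discussed above on a 1-D mesh: a face loop that scatters fluxes into adjacent cells (the scatter is what requires atomic adds on a GPU) and a cell loop that gathers its faces with redundant indirect reads but no write conflicts. It is purely illustrative and is not the paper's CFD kernels.

```python
import numpy as np

# Toy 1-D "mesh": n_cells cells with faces between neighbouring cells.
n_cells = 8
faces = np.array([(i, i + 1) for i in range(n_cells - 1)])     # (owner, neighbour)
cell_value = np.arange(n_cells, dtype=float)
face_flux = cell_value[faces[:, 1]] - cell_value[faces[:, 0]]  # per-face flux

# Face loop: one pass over faces, scattering flux into both adjacent cells.
# On a GPU this scatter needs atomic adds (the data-dependence issue above).
residual_face = np.zeros(n_cells)
np.add.at(residual_face, faces[:, 0],  face_flux)
np.add.at(residual_face, faces[:, 1], -face_flux)

# Cell loop: each cell gathers its own faces; no write conflicts, but the face
# data are read once per adjacent cell, i.e. redundant indirect access.
residual_cell = np.zeros(n_cells)
for c in range(n_cells):
    owned = np.where(faces[:, 0] == c)[0]
    neigh = np.where(faces[:, 1] == c)[0]
    residual_cell[c] = face_flux[owned].sum() - face_flux[neigh].sum()

assert np.allclose(residual_face, residual_cell)
print(residual_face)
```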
Local extreme rain usually results in disasters such as flash floods and landslides. To this day, it remains one of the most difficult tasks for operational weather forecast centers to predict such events accurately. In this paper, we simulate an extreme precipitation event with ensemble Kalman filter (EnKF) assimilation of Doppler radial-velocity observations and analyze the uncertainties of the assimilation. The results demonstrate that, without assimilating radar data, neither a single deterministic forecast nor an ensemble forecast with added perturbations or multiple physical parameterizations can predict the location of the strong precipitation. However, the forecast was significantly improved by assimilating radar data, especially with respect to the location of the precipitation. The direct cause of the improvement is the buildup of a deep mesoscale convection system through EnKF assimilation of the radar data. Under a large-scale background favorable for mesoscale convection, efficient perturbations of the upstream mid-to-low-level meridional wind and moisture are key factors for the assimilation and forecast. Uncertainty still exists in the forecast of this case because of its limited predictability: both differences in the large-scale initial fields and differences in the analyses obtained from EnKF assimilation, due to the small amplitude of the initial perturbations, can critically influence the event's prediction. The forecast could be improved through more cycles of EnKF assimilation. Sensitivity tests also suggest that more accurate forecasts can be expected by improving numerical models and observations.
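For readers unfamiliar with the EnKF analysis step, here is a minimal stochastic (perturbed-observation) EnKF update on a toy three-variable state. The observation operator, error levels, and ensemble size are arbitrary; the study itself assimilates Doppler radial velocities into a full numerical weather prediction model, which is far beyond this sketch.

```python
import numpy as np

def enkf_update(ensemble, obs, obs_operator, obs_err_std, rng):
    """One stochastic EnKF analysis step with perturbed observations.
    ensemble: (n_members, n_state); obs: (n_obs,); obs_operator: (n_obs, n_state)."""
    n_members, _ = ensemble.shape
    Hx = ensemble @ obs_operator.T                       # ensemble in observation space
    X = ensemble - ensemble.mean(axis=0)                 # state anomalies
    Y = Hx - Hx.mean(axis=0)                             # observation-space anomalies
    P_hy = X.T @ Y / (n_members - 1)                     # cross covariance
    P_yy = Y.T @ Y / (n_members - 1) + np.eye(len(obs)) * obs_err_std ** 2
    K = P_hy @ np.linalg.inv(P_yy)                       # Kalman gain
    obs_pert = obs + rng.normal(0.0, obs_err_std, (n_members, len(obs)))
    return ensemble + (obs_pert - Hx) @ K.T              # analysis ensemble

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.array([1.0, -2.0, 0.5])
    ens = truth + rng.normal(0.0, 1.0, (40, 3))          # prior ensemble
    H = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])     # observe first two variables
    y = H @ truth + rng.normal(0.0, 0.1, 2)
    analysis = enkf_update(ens, y, H, 0.1, rng)
    print("prior mean:   ", ens.mean(axis=0).round(2))
    print("analysis mean:", analysis.mean(axis=0).round(2))
```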