Abstract: The analysis of spatially correlated binary data observed on lattices attracts scholars from many scientific fields, including epidemiology, medicine, agriculture, biology, geology and geography. To overcome the difficulties encountered when fitting the autologistic regression model to such data via Bayesian and/or Markov chain Monte Carlo (MCMC) techniques, the Gaussian latent variable model has been introduced into the methodology. However, assuming a normal distribution for the latent random variable may not be realistic, and a wrong normality assumption might bias the parameter estimates and affect the accuracy of results and inferences. More flexible prior distributions for the latent variable are therefore needed in spatial models. A review of the recent literature in spatial statistics shows an increasing tendency to present models that involve skew distributions, especially skew-normal ones. In this study, a skew-normal latent variable model was developed for the Bayesian analysis of spatially correlated binary data acquired on uncorrelated lattices. The proposed methodology was applied to inspect the spatial dependency and related factors of tooth caries occurrence in a sample of students of Yasuj University of Medical Sciences, Yasuj, Iran. The results indicated that the skew-normal latent variable model was valid and provided a decent fit to the caries data.
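For orientation, the standard (Azzalini-type) skew-normal density is written out here. The abstract does not state which parameterization the study uses, so the symbols (location \xi, scale \omega, shape \alpha) are assumptions made only for illustration:

```latex
f(z \mid \xi, \omega, \alpha)
  = \frac{2}{\omega}\,
    \phi\!\left(\frac{z-\xi}{\omega}\right)
    \Phi\!\left(\alpha\,\frac{z-\xi}{\omega}\right),
\qquad z \in \mathbb{R},\ \omega > 0,
```

where \phi and \Phi are the standard normal density and distribution function. Setting \alpha = 0 recovers the normal distribution N(\xi, \omega^2), which is why the Gaussian latent variable model can be viewed as a special case of this form.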
Funding: Funded by the Key Area Research and Development Program of Guangdong Province (2019B010137005) and the National Natural Science Foundation of China (61906209).
Abstract: MapReduce is a widely used programming model for large-scale data processing. However, it still suffers from the skew problem, in which load is imbalanced among tasks. This problem can cause a small number of tasks to consume much more time than the others, thereby prolonging the total job completion time. Existing solutions commonly predict the loads of tasks and then rebalance the load among them. However, solutions of this kind often incur high performance overhead due to the load prediction and rebalancing. Moreover, existing solutions target the partitioning skew of reduce tasks, but cannot mitigate the computational skew of map tasks. Accordingly, in this paper, we present DynamicAdjust, a run-time dynamic resource adjustment technique for mitigating skew. Rather than rebalancing the load among tasks, DynamicAdjust monitors the run-time execution of tasks and dynamically increases resources for those tasks that require more computation. In so doing, DynamicAdjust can not only eliminate the overhead incurred by load prediction and rebalancing, but also mitigate both the partitioning skew and the computational skew. Experiments are conducted on a real 21-node cluster using real-world datasets. The results show that DynamicAdjust can mitigate the negative impact of the skew and shorten the job completion time by up to 40.85%.
Funding: Supported by the Foundation of High Technology Project of Jiangsu (BG2004034) and the Foundation of Graduate Creative Program of Jiangsu (xm04-36).
Abstract: This paper focuses on the parallel aggregation processing of data streams based on the shared-nothing architecture. A novel granularity-aware parallel aggregating model is proposed. It employs parallel sampling and linear regression to describe the characteristics of the data quantity in the query window in order to determine the partition granularity of tuples, and utilizes an equal-depth histogram to implement partitioning. This method can avoid data skew and reduce communication cost. Experimental results on both synthetic and real data show that the proposed method is efficient, practical and suitable for processing time-varying data streams.
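The equal-depth histogram partitioning mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `equal_depth_boundaries` and `assign_partition` are hypothetical, and the sketch assumes that a sample of tuple keys is already available and that keys are comparable.

```python
from bisect import bisect_right


def equal_depth_boundaries(sample, num_partitions):
    """Return num_partitions - 1 cut points that split the sampled keys
    into buckets holding roughly the same number of keys."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    ordered = sorted(sample)
    if not ordered:
        raise ValueError("sample must not be empty")
    n = len(ordered)
    # The i-th cut point is the sampled key at rank i * n / num_partitions.
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def assign_partition(key, boundaries):
    """Map a key to a partition index in 0..len(boundaries) by binary search."""
    return bisect_right(boundaries, key)


# Hypothetical skewed sample: most keys are small, a few are very large.
sample = [1, 2, 3, 4, 5, 6, 100, 1000]
boundaries = equal_depth_boundaries(sample, 4)
counts = [0, 0, 0, 0]
for key in sample:
    counts[assign_partition(key, boundaries)] += 1
```

Because the cut points follow the ranks of the sampled keys rather than their value range, dense key regions get narrow buckets and sparse regions get wide ones, which is what keeps the partitions balanced under skew. Heavily duplicated keys can still unbalance the buckets in this simple form.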
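Abstract: Variable selection for finite mixture of regression (FMR) models is frequently used in statistical modeling. Current research on FMR models focuses mainly on the case in which the regression errors follow a normal distribution, an assumption that is unsuitable for asymmetric data. For skewed data, the mode is more representative than the mean. Based on mixed skew-normal data, this paper introduces a variable selection method for the mode regression model and proves the consistency of the variable selection method and the Oracle property of the parameter estimates. To estimate the model parameters, a modified EM (Expectation-Maximization) algorithm is proposed. Simulation studies and a real-data analysis further illustrate the effectiveness of the proposed model and variable selection method.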
Abstract: Pneumonia is an acute lung infection that has caused many fatalities globally. Radiologists often employ chest X-rays to identify pneumonia since they are presently the most effective imaging method for this purpose. Computer-aided diagnosis of pneumonia using deep learning techniques is widely used due to its effectiveness and performance. In the proposed method, the Synthetic Minority Oversampling Technique (SMOTE) is used to eliminate the class imbalance in the X-ray dataset. To compensate for the paucity of accessible data, pre-trained transfer learning is used, and an ensemble Convolutional Neural Network (CNN) model is developed. The ensemble model consists of all possible combinations of the MobileNetV2, Visual Geometry Group (VGG16), and DenseNet169 models. MobileNetV2 and DenseNet169 performed well as single classifiers, with an accuracy of 94%, while the ensemble model (MobileNetV2+DenseNet169) achieved an accuracy of 96.9%. Using the data synchronous parallel model in Distributed TensorFlow, the training process accelerated performance by 98.6% and outperformed other conventional approaches.
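The core idea of SMOTE named in the abstract above, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched in plain Python. This is an illustrative sketch only: the function name `smote_sample`, its parameters, and the toy two-dimensional points are assumptions, and the abstract does not say how the technique was applied to the X-ray images.

```python
import math
import random


def smote_sample(minority, k=3, n_new=4, seed=0):
    """Generate n_new synthetic points, each lying on the segment between a
    randomly chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority point, nearest first.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sample(minority, k=2, n_new=10, seed=1)
```

Each synthetic point stays inside the convex hull of the minority class, so oversampling fills in the minority region instead of repeating existing samples. In practice a maintained implementation such as the one in the imbalanced-learn package would normally be preferred over a hand-written version.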