Abstract: MODIS (Moderate Resolution Imaging Spectroradiometer) is a key instrument aboard the Terra (EOS AM) and Aqua (EOS PM) satellites. Linear spectral mixture models are applied to MODIS data for the sub-pixel classification of land cover. Shaoxing County in Zhejiang Province, China, was chosen as the study site, and early rice was selected as the study crop. The land-cover proportions derived from MODIS pixels using linear spectral mixture models were compared with an unsupervised classification derived from TM data acquired on the same day. The comparison implies that MODIS data could be used as a satellite data source for estimating rice cultivation area, and possibly for rice growth monitoring and yield forecasting, at the regional scale.
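As a rough illustration of the linear spectral mixture idea in this abstract, the sketch below unmixes one pixel into two land-cover fractions. It is a minimal example, not the study's procedure: the two-endmember restriction, the function name `unmix_two_endmembers`, and all reflectance values are assumptions made for illustration.

```python
def unmix_two_endmembers(pixel, endmember_a, endmember_b):
    """Return (fraction_a, fraction_b) for a pixel modelled as
    pixel = f * endmember_a + (1 - f) * endmember_b, with 0 <= f <= 1."""
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmembers must differ in at least one band")
    # Least-squares solution of the one-parameter mixing equation.
    f = sum((p - b) * d for p, b, d in zip(pixel, endmember_b, diff)) / denom
    # Clip so the fractions stay physically meaningful.
    f = min(1.0, max(0.0, f))
    return f, 1.0 - f


# Hypothetical three-band reflectances (not taken from the study).
rice = [0.05, 0.08, 0.40]
water = [0.03, 0.02, 0.01]
mixed = [0.6 * r + 0.4 * w for r, w in zip(rice, water)]
fractions = unmix_two_endmembers(mixed, rice, water)
```

With more than two land-cover classes, the same model becomes a constrained least-squares problem over all endmember fractions.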
Abstract: Classical survival analysis assumes that all subjects will experience the event of interest, but in some cases a portion of the population may never encounter the event. These survival methods further assume independent survival times, which is not valid for honey bees, which live in nests. The study introduces a semi-parametric marginal proportional hazards mixture cure (PHMC) model with an exchangeable correlation structure, using generalized estimating equations for survival data analysis. The model was tested on clustered, right-censored bee survival data with a cured fraction, in which two bee species were exposed to different entomopathogens to test the effect of the entomopathogens on the survival of each species. The Expectation-Solution algorithm is used to estimate the parameters. The study notes a weak positive association between cure statuses (ρ1 = 0.0007) and between survival times of uncured bees (ρ2 = 0.0890), emphasizing their importance. The odds of being uncured are higher for A. mellifera than for M. ferruginea. A. mellifera is more susceptible to the entomopathogens icipe 7, icipe 20, and icipe 69. The Cox-Snell residuals show that the proposed semi-parametric PH model generally fits the data well compared with a model that assumes an independent correlation structure. Thus, the semi-parametric marginal proportional hazards mixture cure model is a parsimonious model for correlated bee survival data.
Abstract: Four empirical models are tested for fitting the T-y-x equilibrium data of the ethanol-water mixture by minimizing the root mean square (RMS) deviation between the equilibrium data and the theoretical points. The total pressure of the corresponding data is 101.3 kPa. The parameters of all models are also identified. The study suggests that the NRTL model fits the equilibrium data best, with RMS = 0.4%.
Funding: The US National Science Foundation (No. CMMI-0408390, CMMI-0644552); the American Chemical Society Petroleum Research Foundation (No. PRF-44468-G9); the Research Fellowship for International Young Scientists (No. 51050110143); the Fok Ying-Tong Education Foundation (No. 114024); the Natural Science Foundation of Jiangsu Province (No. BK2009015); the Postdoctoral Science Foundation of Jiangsu Province (No. 0901005C).
Abstract: Based on Gaussian mixture models (GMM), speed, flow, and occupancy are used together in the cluster analysis of traffic flow data. Compared with other clustering and sorting techniques, the GMM, as a structural model, is suitable for various kinds of traffic flow parameters. Gap statistics and domain knowledge of traffic flow are used to determine a proper number of clusters. The expectation-maximization (E-M) algorithm is used to estimate the parameters of the GMM. The clustered traffic flow patterns are then analyzed statistically and used to design maximum likelihood classifiers for grouping real-time traffic flow data when new observations become available. Clustering analysis and pattern recognition can also be used to cluster and classify dynamic traffic flow patterns for freeway on-ramp and off-ramp weaving sections, as well as for other facilities involving the concept of level of service, such as airports, parking lots, intersections, and interrupted-flow pedestrian facilities.
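The E-M estimation step mentioned in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: it fits a two-component mixture to one variable (speed) rather than to speed, flow, and occupancy jointly, and the function name `fit_gmm_1d`, the initialisation, and the sample speeds are hypothetical rather than taken from the paper.

```python
import math


def fit_gmm_1d(data, n_iter=50, var_floor=1e-6):
    """Fit a two-component one-dimensional Gaussian mixture with EM."""
    n = len(data)
    mean_all = sum(data) / n
    var_all = max(var_floor, sum((x - mean_all) ** 2 for x in data) / n)
    weights = [0.5, 0.5]
    means = [min(data), max(data)]  # crude initialisation at the extremes
    variances = [var_all, var_all]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_floor, var_k)
    return weights, means, variances


# Hypothetical speeds in km/h: a congested group and a free-flow group.
speeds = [28.0, 30.0, 32.0, 98.0, 100.0, 102.0]
weights, means, variances = fit_gmm_1d(speeds)
```

A new observation could then be assigned to the component with the larger weighted density, which is the maximum likelihood classification idea the abstract describes.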
Funding: The National Natural Science Foundation of China (61771367); the Science and Technology on Communication Networks Laboratory (HHS19641X003).
Abstract: Because the computational cost of the joint probabilistic data association (JPDA) algorithm explodes as the number of targets increases, a multi-target tracking algorithm based on Gaussian mixture model (GMM) clustering is proposed. The algorithm clusters the measurements, and the association matrix between measurements and tracks is constructed from the posterior probabilities. Compared with the traditional data association algorithm, the proposed algorithm has better tracking performance and lower computational complexity. Simulation results demonstrate the effectiveness of the proposed algorithm.
Funding: Supported by the National Key R&D Program of China (2021YFB2401904); the Joint Fund Project of the National Natural Science Foundation of China (U21A20485); the National Natural Science Foundation of China (61976175); the Key Laboratory Project of Shaanxi Provincial Education Department Scientific Research Projects (20JS109).
Abstract: State of charge (SOC) estimation for lithium-ion batteries is an important function of the battery management system (BMS) in electric vehicles. The long short-term memory (LSTM) model, which is capable of estimating the future states of a nonlinear system, can be employed for SOC estimation. Because the BMS usually works under complicated operating conditions, the real measurement data used for model training may be corrupted by non-Gaussian noise, and the performance of the original LSTM with the mean square error (MSE) loss may therefore deteriorate. A novel LSTM with a mixture kernel mean p-power error (MKMPE) loss, called MKMPE-LSTM, is developed by replacing the MSE with the MKMPE loss as the learning criterion in the LSTM framework. Because the MKMPE contains the p-order moments of the error distribution, the model can achieve robust SOC estimation when the measurement data are contaminated with non-Gaussian noise (or outliers). In addition, a meta-heuristic algorithm called the heap-based optimizer (HBO) is employed to optimize the hyper-parameters of the proposed MKMPE-LSTM model (mainly the learning rate, the number of hidden-layer neurons, and the value of p in the MKMPE) to further improve its flexibility and generalization performance, and a novel hybrid model (HBO-MKMPE-LSTM) is established for SOC estimation under non-Gaussian noise. Finally, several tests are performed on a benchmark under various cases to evaluate the performance of the proposed HBO-MKMPE-LSTM model. The results demonstrate that the proposed hybrid method provides good robustness and accuracy under different non-Gaussian measurement noises: at 25 ℃, the mean square error (MSE), root MSE (RMSE), and mean absolute relative error (MARE) of the SOC estimates are less than 0.05%, 3%, and 3%, respectively, and the coefficient of determination R² is above 99.8%.
Abstract: It is difficult to measure the size of illegal drug user populations directly with surveys because of the many "hidden drug addicts" and the difficulty of obtaining truthful responses. In this study, systematic and routine information on the treatment episodes of drug users is used to estimate the population size. Mixture models of zero-truncated Poisson distributions, fitted by nonparametric maximum likelihood estimation (NPMLE) to capture-recapture repeated count data, were used to project the number of drug users. The method was applied to surveillance data on drug users identified by treatment episodes in over 1140 health treatment centers in Thailand, obtained from the Bureau of Health Service System Development, Ministry of Public Health. We show how this mixture model can be used to construct the unobserved frequency of drug users with no treatment episode, and we estimate the total population size of drug users in the country from 2005 to 2007. Simulation results confirmed that the mixture model is suitable when the population is large. The mixture-model estimates of the number of drug users fitted the data with excellent goodness-of-fit values and were also compared with the conventional Chao estimates. The NPMLEs of the total number of drug users in Thailand in 2005, 2006, and 2007 were 184,045 (95% CI: 181,297-186,793), 230,665 (95% CI: 226,611-234,719), and 299,670 (95% CI: 294,217-305,123), respectively. The corresponding estimates were 125,265 (95% CI: 123,092-127,142), 166,287 (95% CI: 163,222-169,352), and 228,898 (95% CI: 224,766-233,030) for methamphetamine (Yaba) users, and 11,559 (95% CI: 10,234-12,884), 11,333 (95% CI: 9276-13,390), and 8953 (95% CI: 7878-10,028) for heroin users. The numbers of marijuana, kratom-plant, opium, and inhalant users were underestimated because their symptoms were mild and not severe enough to be treated in health treatment centers, which led to a smaller estimate of the total number of drug users. The estimated numbers of heroin and methamphetamine addicts are highly reliable because they are based on clearly evident counts of users with severe addiction problems who attended health treatment centers. Estimation by means of mixture models can be recommended for routinely monitoring drug demand trends and drug health services; it is easy to calculate with the program MIXTP, which is available on request.
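To make the capture-recapture idea concrete, the sketch below computes two simple population-size estimates from repeated-count frequencies: Chao's lower bound and a single-component zero-truncated Poisson estimate. It is an illustrative simplification, not the NPMLE mixture fitted in the study or the MIXTP program; the function names and the frequency counts are assumptions.

```python
import math


def chao_lower_bound(freq):
    """Chao's estimate n + f1^2 / (2 * f2), where freq[k] is the number
    of individuals observed exactly k times (k >= 1)."""
    n = sum(freq.values())
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 == 0:
        raise ValueError("needs at least one individual observed twice")
    return n + f1 * f1 / (2 * f2)


def zero_truncated_poisson_size(freq, tol=1e-12):
    """Homogeneous zero-truncated Poisson estimate of population size.

    Solves lam / (1 - exp(-lam)) = observed mean count by bisection,
    then returns (n / (1 - exp(-lam)), lam)."""
    n = sum(freq.values())
    mean = sum(k * f for k, f in freq.items()) / n
    if mean <= 1:
        raise ValueError("mean observed count must exceed 1")
    lo, hi = 1e-9, mean  # the root lies below the observed mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return n / (1 - math.exp(-lam)), lam


# Hypothetical counts: 100 users seen once, 50 twice, 10 three times.
episodes = {1: 100, 2: 50, 3: 10}
chao = chao_lower_bound(episodes)
ztp_size, ztp_lambda = zero_truncated_poisson_size(episodes)
```

Both estimates add an inferred number of never-observed individuals to the observed total; a mixture of several Poisson components relaxes the assumption that every user has the same treatment rate.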
Abstract: Cyber losses, measured as the number of records breached in cyber incidents, commonly feature a significant proportion of zeros together with distinct characteristics for mid-range and large losses, which makes it hard to model the whole range of losses with a standard loss distribution. We tackle this modeling problem by proposing a three-component spliced regression model that can simultaneously model zero, moderate, and large losses and account for heterogeneous effects in the mixture components. To apply the proposed model to the Privacy Rights Clearinghouse (PRC) data breach chronology, we segment geographical groups using unsupervised cluster analysis and use a covariate-dependent probability to model zero losses, finite mixture distributions for the moderate body, and an extreme value distribution for large losses, capturing the heavy-tailed nature of the loss data. Parameters and coefficients are estimated using the Expectation-Maximization (EM) algorithm. Combined with our frequency model for data breaches (a generalized linear mixed model), aggregate loss distributions are investigated, and applications to cyber insurance pricing and risk management are discussed.
Abstract: In this paper, we provide a quantile-based method for estimating the parameters of a finite mixture of Fréchet distributions from a large sample of strongly dependent data. This situation arises when dealing with environmental data, for which such a method was needed. We validate our approach by means of estimation and goodness-of-fit testing on simulated data, which show accurate performance.
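The quantile-based idea can be illustrated on a single Fréchet component, whose quantile function has a closed form. The sketch below is an assumption-laden simplification, not the paper's mixture estimator: it uses the two-parameter Fréchet distribution with CDF F(x) = exp(-(x/s)^(-α)) and recovers the shape α and scale s from two quantiles; the function names are hypothetical.

```python
import math


def frechet_quantile(p, alpha, scale):
    """Quantile function of F(x) = exp(-(x / scale) ** (-alpha))."""
    return scale * (-math.log(p)) ** (-1.0 / alpha)


def frechet_from_two_quantiles(p1, q1, p2, q2):
    """Recover (alpha, scale) from quantiles q1 = Q(p1) and q2 = Q(p2).

    Uses ln Q(p) = ln(scale) - ln(-ln p) / alpha, which is linear in
    ln(-ln p), so two points determine both parameters."""
    alpha = (math.log(-math.log(p1)) - math.log(-math.log(p2))) / (
        math.log(q2) - math.log(q1)
    )
    scale = q1 * (-math.log(p1)) ** (1.0 / alpha)
    return alpha, scale


# Round trip with hypothetical parameters alpha = 3, scale = 2.
q25 = frechet_quantile(0.25, 3.0, 2.0)
q75 = frechet_quantile(0.75, 3.0, 2.0)
alpha_hat, scale_hat = frechet_from_two_quantiles(0.25, q25, 0.75, q75)
```

In practice the theoretical quantiles would be replaced by empirical sample quantiles, and a mixture requires matching more quantiles than there are parameters.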
Funding: Supported by the National Natural Science Foundation of China (61202473); the Fundamental Research Funds for Central Universities (JUSRP111A49); the "111 Project" (B12018); the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Abstract: For the fault detection and diagnosis problem in large-scale industrial systems, there are two important issues: missing data samples and the non-Gaussian property of the data. However, most existing data-driven methods cannot handle both. Thus, a new fault detection and diagnosis method based on a Bayesian network classifier is proposed. First, a non-imputation method is presented to handle incomplete data samples: using the properties of the proposed Bayesian network classifier, the missing values can be marginalized in an elegant manner. Furthermore, the Gaussian mixture model is used to approximate the non-Gaussian data with a linear combination of a finite number of Gaussian components, so that the Bayesian network can process the non-Gaussian data effectively. The entire fault detection and diagnosis method can therefore deal with high-dimensional incomplete process samples in an efficient and robust way. The diagnosis results are expressed as probabilities with reliability scores. The proposed approach is evaluated on a benchmark problem, the Tennessee Eastman process. The simulation results show the effectiveness and robustness of the proposed method in fault detection and diagnosis for large-scale systems with missing measurements.
Funding: National Natural Science Foundation of China (No. 61374140); the Youth Foundation of National Natural Science Foundation of China (No. 61403072).
Abstract: Complex industrial processes often need multiple operation modes to meet changes in production conditions. Within the same mode, there are discrete samples belonging to that mode. Therefore, it is important to consider the samples that are sparse in the mode. To solve this issue, a new approach called density-based support vector data description (DBSVDD) is proposed. In this article, an algorithm using the Gaussian mixture model (GMM) with the DBSVDD technique is proposed for process monitoring. The GMM method is used to obtain the center of each mode and to determine the number of modes. Considering the complexity of the data distribution and the discrete samples in the monitoring process, the DBSVDD is used for process monitoring. Finally, the validity and effectiveness of the DBSVDD method are illustrated through the Tennessee Eastman (TE) process.
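Abstract: As the dimensionality and scale of data continue to grow, anomaly detection methods based on deep learning have achieved excellent detection performance, and deep support vector data description (Deep SVDD) in particular has been widely applied. However, mitigating the hypersphere collapse problem requires imposing constraints on various parameters of the mapping network in Deep SVDD. To further improve the feature-learning ability of the mapping network in Deep SVDD while also solving the hypersphere collapse problem, a Deep Multiple-Sphere Support Vector Data Description Based on Variational Autoencoder with Mixture-of-Gaussians Prior (DMSVDD-VAE-MoG) is proposed. First, the network parameters and the multiple hypersphere centers are initialized through pre-training. Second, the mapping network is used to obtain the latent features of the training data, and the VAE loss, the average radius of the multiple hyperspheres, and the average distance from the latent features to their corresponding hypersphere centers are jointly optimized to obtain the optimal network connection weights and multiple minimal hyperspheres. Experimental results show that the proposed DMSVDD-VAE-MoG achieves better detection performance than 8 other related methods on MNIST, Fashion-MNIST, and CIFAR-10.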