Based on Gaussian mixture models (GMM), speed, flow, and occupancy are used together in the cluster analysis of traffic flow data. Compared with other clustering and sorting techniques, the GMM, as a structural model, is suitable for various kinds of traffic flow parameters. Gap statistics and domain knowledge of traffic flow are used to determine a proper number of clusters, and the expectation-maximization (EM) algorithm is used to estimate the GMM parameters. The clustered traffic flow patterns are then analyzed statistically and used to design maximum likelihood classifiers for grouping real-time traffic flow data as new observations become available. Clustering analysis and pattern recognition can also be used to cluster and classify dynamic traffic flow patterns for freeway on-ramp and off-ramp weaving sections, as well as for other facilities involving the concept of level of service, such as airports, parking lots, intersections, and interrupted-flow pedestrian facilities.
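As a rough illustration of this pipeline (not the authors' code), the sketch below clusters synthetic speed/flow/occupancy vectors with a GMM fitted by EM and then groups new observations by maximum likelihood; the use of scikit-learn, the synthetic data, and the fixed number of clusters (in place of the gap statistic) are assumptions.

```python
# Hypothetical sketch: GMM clustering of (speed, flow, occupancy) and
# maximum-likelihood assignment of new observations (scikit-learn assumed).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in detector data: columns are speed (km/h), flow (veh/h), occupancy (%).
free_flow = rng.normal([95, 900, 8], [8, 150, 2], size=(300, 3))
congested = rng.normal([35, 1500, 35], [10, 200, 8], size=(300, 3))
X = np.vstack([free_flow, congested])

# In the paper the number of clusters comes from the gap statistic plus domain
# knowledge; here it is simply fixed to 2 for illustration.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)  # parameters estimated by EM

# New real-time observations are grouped by maximum posterior/likelihood.
new_obs = np.array([[90.0, 950.0, 9.0], [30.0, 1400.0, 40.0]])
print(gmm.predict(new_obs))        # hard cluster labels
print(gmm.predict_proba(new_obs))  # soft (posterior) memberships
```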
Aiming at the problems that the classical Gaussian mixture model cannot detect the complete moving object and is sensitive to scenes with sudden illumination changes, an improved algorithm for moving object detection is proposed based on the Gaussian mixture model and the three-frame difference method. In the process of extracting the moving region, the improved three-frame difference method uses a dynamic segmentation threshold and edge detection, and is applied first to handle problems such as illumination mutation and discontinuity of the target edge. Then, a new adaptive strategy for selecting the number of Gaussian distributions is introduced to reduce processing time and improve detection accuracy. Finally, the HSV color space is used to remove shadow regions, so that the whole moving object is detected. Experimental results show that the proposed algorithm can detect moving objects effectively in various situations.
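A minimal OpenCV sketch of the two main ingredients (not the paper's implementation) is given below: a mixture-of-Gaussians background subtractor combined with a three-frame difference that uses an Otsu (dynamic) threshold and Canny edges. The input file name, the thresholds, and the mask-fusion rule are assumptions; MOG2's built-in shadow flag stands in for the paper's HSV-based shadow removal, and the adaptive selection of the number of Gaussians is not shown.

```python
# Hypothetical sketch combining a GMM background model with a three-frame
# difference mask (OpenCV assumed); thresholds and fusion rule are illustrative.
import cv2

cap = cv2.VideoCapture("traffic.avi")          # assumed input video
mog = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

ok, f1 = cap.read()
ok, f2 = cap.read()
ok, f3 = cap.read()
while ok:
    g1, g2, g3 = (cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (f1, f2, f3))

    # Three-frame difference with a dynamic (Otsu) threshold.
    d12 = cv2.absdiff(g1, g2)
    d23 = cv2.absdiff(g2, g3)
    _, b12 = cv2.threshold(d12, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, b23 = cv2.threshold(d23, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    diff_mask = cv2.bitwise_and(b12, b23)
    edges = cv2.Canny(g2, 50, 150)             # edge cue for broken contours
    diff_mask = cv2.bitwise_or(diff_mask, cv2.bitwise_and(edges, b12))

    # GMM foreground; MOG2 marks shadows as 127, so keep only confident pixels.
    fg = mog.apply(f2)
    _, fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)

    moving = cv2.bitwise_or(fg, diff_mask)     # fused moving-object mask
    cv2.imshow("moving object", moving)
    if cv2.waitKey(30) == 27:
        break
    f1, f2 = f2, f3
    ok, f3 = cap.read()
```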
Intrusion detection is the process of examining information about system activities or data to detect malicious behavior or unauthorized activity. Most intrusion detection systems (IDS) implement the K-means clustering technique because of its linear complexity and fast computation. Nonetheless, its naïve use of the mean as the cluster center is a major drawback: two clusters of different radius can end up centered at nearly the same mean, a situation K-means cannot resolve because the cluster means are too similar, and the algorithm also fails when the clusters are not spherical. To overcome this issue, a new integrated hybrid model is proposed that combines expectation-maximization (EM) clustering using a Gaussian mixture model (GMM) with a naïve Bayes classifier. In this model, the GMM offers more flexibility than K-means in terms of cluster covariance; it performs probabilistic, soft clustering, so a single observation can belong to multiple clusters. Each GMM cluster is defined by two parameters, the mean and the covariance, so a cluster can take any elliptical shape. EM-GMM is used to cluster the data by activity into the corresponding categories.
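A minimal sketch of such a hybrid is shown below (assuming scikit-learn and a generic synthetic feature matrix, not a specific IDS dataset): EM-GMM produces soft cluster labels, which are then used as pseudo-labels to train a Gaussian naïve Bayes classifier for new records.

```python
# Hypothetical EM-GMM + naive Bayes hybrid on a generic feature matrix
# (scikit-learn assumed; the real model targets IDS activity data).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (500, 8)),    # stand-in "normal" traffic
               rng.normal(3, 2, (100, 8))])   # stand-in "anomalous" traffic

X_train, X_test = train_test_split(X, test_size=0.3, random_state=1)

# Step 1: EM-GMM soft clustering into activity categories.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=1)
cluster_labels = gmm.fit_predict(X_train)

# Step 2: the cluster assignments become pseudo-labels for a naive Bayes model
# that classifies new records quickly at detection time.
nb = GaussianNB().fit(X_train, cluster_labels)
print(nb.predict(X_test[:5]))
print(gmm.predict_proba(X_test[:5]).round(3))  # soft memberships
```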
Gaussian mixture models are classical but still popular machine learning models. An appealing feature of Gaussian mixture models is their tractability: they can be learned efficiently and exactly from data, and they also support efficient exact inference queries such as soft clustering of data points. Although only seemingly simple, Gaussian mixture models can be hard to understand. There are at least four aspects to understanding them: the whole distribution, its individual parts (mixture components), the relationships between the parts, and the interplay of the whole and its parts. In a structured literature review of applications of Gaussian mixture models, we found the need to support all four aspects. To identify candidate visualizations that effectively address these user needs, we structure the available design space along three different representations of Gaussian mixture models, namely as functions, as sets of parameters, and as sampling processes. From the design space, we implemented three design concepts that visualize the overall distribution together with its components. Finally, we assessed the practical usefulness of the design concepts with respect to the different user needs in expert interviews and an insight-based user study.
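The three representations mentioned (function, parameters, sampling process) can be made concrete with a small plot. The sketch below, using NumPy, Matplotlib, and SciPy (an assumption, not the paper's tooling), draws a 1D two-component mixture as its overall density and component densities, from its parameters, together with a sample rug.

```python
# Hypothetical illustration of one GMM viewed three ways: as a density function
# (overall and per component), as parameters, and as a sampling process.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

weights = np.array([0.4, 0.6])        # parameter view: weights, means, sds
means = np.array([-1.0, 2.0])
sds = np.array([0.6, 1.0])

x = np.linspace(-4, 6, 500)
comps = np.array([w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, sds)])
overall = comps.sum(axis=0)           # function view: mixture density

rng = np.random.default_rng(0)        # sampling-process view
k = rng.choice(2, size=300, p=weights)
samples = rng.normal(means[k], sds[k])

plt.plot(x, overall, "k-", label="mixture")
for i, c in enumerate(comps):
    plt.plot(x, c, "--", label=f"component {i}")
plt.plot(samples, np.full_like(samples, -0.01), "|", alpha=0.3, label="samples")
plt.legend()
plt.show()
```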
The diameter distribution function (DDF) is a crucial tool for accurately predicting stand carbon storage (CS). The current key issue, however, is how to construct a high-precision DDF based on stand factors, site quality, and an aridity index to predict stand CS in multi-species mixed forests with complex structures. This study used data from 70 survey plots of mixed broadleaf Populus davidiana and Betula platyphylla forests in the Mulan Rangeland State Forest, Hebei Province, China, to construct the DDF based on maximum likelihood estimation (MLE) and a finite mixture model (FMM). Ordinary least squares (OLS), linear seemingly unrelated regression (LSUR), and a back-propagation neural network (BPNN) were used to investigate the influences of stand factors, site quality, and the aridity index on the shape and scale parameters of the DDF and to predict the stand CS of mixed broadleaf forests. The results showed that the FMM accurately described the stand-level diameter distribution of the mixed P. davidiana and B. platyphylla forests, whereas the Weibull function constructed by MLE was more accurate in describing the species-level diameter distribution. The combined variable of quadratic mean diameter (Dq), stand basal area (BA), and site quality improved the accuracy of the shape parameter models of the FMM; the combined variable of Dq, BA, and the De Martonne aridity index improved the accuracy of the scale parameter models. Compared to OLS and LSUR, the BPNN had higher accuracy in the re-parameterization process of the FMM. OLS, LSUR, and BPNN overestimated the CS of P. davidiana but underestimated the CS of B. platyphylla in the large diameter classes (DBH ≥ 18 cm). The BPNN accurately estimated stand- and species-level CS, but was more suitable for estimating stand-level CS than species-level CS, thereby providing a scientific basis for optimizing stand structure and assessing carbon sequestration capacity in mixed broadleaf forests.
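For the species-level step (fitting a Weibull diameter distribution by maximum likelihood), a minimal SciPy sketch is shown below; the simulated DBH values and the two-parameter form with the location fixed at zero are assumptions, not the study's data or exact parameterization.

```python
# Hypothetical sketch: maximum likelihood fit of a Weibull diameter
# distribution for one species (SciPy assumed; DBH values are simulated).
from scipy.stats import weibull_min

dbh = weibull_min.rvs(2.3, scale=14.0, size=400, random_state=0)  # DBH in cm

# Two-parameter Weibull: location fixed at 0, shape and scale estimated by MLE.
shape, loc, scale = weibull_min.fit(dbh, floc=0)
print(f"shape = {shape:.2f}, scale = {scale:.2f} cm")

# Share of stems in the large diameter classes (DBH >= 18 cm) under the fit.
print(f"P(DBH >= 18 cm) = {weibull_min.sf(18, shape, loc, scale):.3f}")
```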
Classical survival analysis assumes all subjects will experience the event of interest, but in some cases a portion of the population may never encounter the event. These survival methods further assume independent survival times, which is not valid for honey bees, which live in nests. This study introduces a semi-parametric marginal proportional hazards mixture cure (PHMC) model with an exchangeable correlation structure, using generalized estimating equations for survival data analysis. The model was tested on clustered right-censored bee survival data with a cured fraction, in which two bee species were subjected to different entomopathogens to test the effect of the entomopathogens on the survival of the bee species. The Expectation-Solution algorithm is used to estimate the parameters. The study notes a weak positive association between cure statuses (ρ1 = 0.0007) and between the survival times of uncured bees (ρ2 = 0.0890), emphasizing their importance. The odds of being uncured are higher for A. mellifera than for M. ferruginea, and A. mellifera is more susceptible to the entomopathogens icipe 7, icipe 20, and icipe 69. The Cox-Snell residuals show that the proposed semi-parametric PH model generally fits the data well compared with the model that assumes an independent correlation structure. Thus, the semi-parametric marginal proportional hazards mixture cure model is a parsimonious model for correlated bee survival data.
Background: This paper presents a retrospective study to classify patients into treatment subtypes according to baseline and longitudinally observed values, considering heterogeneity in migraine prognosis. In classical prospective clinical studies, participants are classified with respect to baseline status and followed within a certain time period. However, the latent growth mixture model is the most suitable method here, because it accounts for population heterogeneity and is not affected by drop-outs if they are missing at random. Hence, we planned this comprehensive study to identify prognostic factors in migraine.
Normal mixture regression models are among the most important statistical tools for analyzing data from a heterogeneous population. When the data set under consideration involves asymmetric outcomes, the skew-normal distribution, which over the last two decades has been shown to be beneficial in dealing with asymmetric data in various theoretical and applied problems, is a natural choice. In this paper, we propose and study a novel class of models: a skew-normal mixture of joint location, scale and skewness models to analyze heteroscedastic skew-normal data coming from a heterogeneous population. The issues of maximum likelihood estimation are addressed. In particular, an Expectation-Maximization (EM) algorithm for estimating the model parameters is developed. Properties of the estimators of the regression coefficients are evaluated through Monte Carlo experiments. Results from the analysis of a real Body Mass Index (BMI) data set are presented.
Soil-rock mixture (SRM) is a unique type of geomaterial characterized by a heterogeneous composition and a complicated structure. Continuum-based soil and rock mechanics theories struggle to accurately characterize and predict the SRM's mechanical properties. This study reports a novel numerical method incorporating microfocus computed tomography and PFC3D codes to probe the deformation and failure processes of SRM. Three-dimensional (3D) PFC models that represent the SRM's complex structures were built. By simulating the entire failure process in PFC3D, the SRM's strength, elastic modulus, and crack growth were obtained. The influence of rock ratios on the SRM's strength, deformation, and failure processes, as well as its internal mesoscale mechanism, was analyzed. Comparison of the simulation results with experimental data verified that the 3D PFC models agreed well with the SRM's real structure, and that the SRM's compression process, deformation, and failure patterns, along with its intrinsic mesomechanism, can be effectively analyzed based on such 3D PFC models.
In this paper, we consider the risk assessment problem under multiple levels and multiple mixture subpopulations. Our result is a generalization of the results of [1-5]. 1 Finite Mixture Normal Models: In dose-response studies, a class of phenomena that frequently occurs is that experimental subjects (e.g., mice) may have different responses such as 'none, mild, severe' after a toxicant experiment, or 'getting worse, no change, getting better' after a medical treatment. These phenomena have attracted the attention of many researchers in recent years. Finite ...
The large blast furnace is essential equipment in the iron and steel manufacturing process. Due to the complex operation process and frequent fluctuations of variables, conventional monitoring methods often produce false alarms. To address this problem, an ensemble of greedy dynamic principal component analysis-Gaussian mixture model (EGDPCA-GMM) is proposed in this paper. First, PCA-GMM is introduced to deal with the collinearity and non-Gaussian distribution of blast furnace data. Second, to capture the dynamics of the data, a greedy algorithm is used to determine the extended variables and their corresponding time lags, avoiding the introduction of unnecessary noise. Then a bagging ensemble is combined with the greedy extension to eliminate the randomness introduced by the greedy algorithm and further reduce the false alarm rate (FAR) of the monitoring results. Finally, the algorithm is applied to the blast furnace of a large iron and steel group in South China to verify its performance. Compared with the basic algorithms, the proposed method achieves the lowest FAR while keeping the missed alarm rate (MAR) stable.
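The PCA-GMM building block can be sketched roughly as follows (a simplified stand-in for EGDPCA-GMM, without the greedy dynamic extension or the bagging ensemble): project training data with PCA, fit a GMM to the scores, and flag new samples whose negative log-likelihood exceeds a control limit taken from a training-set quantile. scikit-learn, the synthetic data, and the 99% limit are assumptions.

```python
# Hypothetical PCA-GMM monitoring sketch (scikit-learn assumed); the full
# method adds greedy dynamic variable extension and a bagging ensemble.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 20))                     # normal operation
X_test = np.vstack([rng.normal(size=(50, 20)),
                    rng.normal(2.5, 1, size=(5, 20))])    # last rows: faults

pca = PCA(n_components=5).fit(X_train)
scores_train = pca.transform(X_train)

gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=0).fit(scores_train)

# Monitoring statistic: negative log-likelihood of each sample under the GMM.
nll_train = -gmm.score_samples(scores_train)
limit = np.quantile(nll_train, 0.99)                      # assumed control limit

nll_test = -gmm.score_samples(pca.transform(X_test))
alarms = nll_test > limit
print(f"alarm rate on test data: {alarms.mean():.2%}")
```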
The increasing penetration of highly intermittent wind generation could seriously jeopardize the operational reliability of power systems and increase the risk of electricity outages. To this end, this paper proposes a novel data-driven method for operation risk assessment of wind-integrated power systems. First, a new approach is presented to model the uncertainty of wind power over the lead time. The proposed approach employs k-means clustering and mixture models (MMs) to construct time-dependent probability distributions of wind power, and it can capture complicated statistical features of wind power such as multimodality. Then, a non-sequential Monte Carlo simulation (NSMCS) technique is adopted to evaluate the operation risk indices. To improve the computational performance of NSMCS, a cross-entropy based importance sampling (CE-IS) technique is applied; the CE-IS technique is modified to include the proposed model of wind power. The method is validated on a modified IEEE 24-bus reliability test system (RTS) and a modified IEEE 3-area RTS using historical wind generation data. The simulation results verify the importance of accurately modeling the short-term uncertainty of wind power for operation risk assessment. Further case studies analyze the impact of transmission systems on the operation risk indices, and the computational performance of the framework is also examined.
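The uncertainty-modeling step can be sketched roughly as below (not the paper's implementation): historical wind power observations are conditioned with k-means, a mixture model is fitted within each cluster to capture multimodality, and the fitted mixtures are then sampled inside a Monte Carlo loop. scikit-learn, the Gaussian choice of mixture, and the toy data are assumptions.

```python
# Hypothetical sketch: k-means conditioning + per-cluster mixture models of
# wind power, then sampling for Monte Carlo risk evaluation (scikit-learn assumed).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
forecast = rng.uniform(0, 1, 2000)                         # forecast power (p.u.)
actual = np.clip(forecast + rng.normal(0, 0.12, 2000)
                 + 0.2 * (rng.random(2000) < 0.1), 0, 1)   # multimodal errors

# Condition on the forecast level with k-means, then fit a mixture per cluster.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(forecast.reshape(-1, 1))
mixtures = {c: GaussianMixture(n_components=2, random_state=0)
               .fit(actual[km.labels_ == c].reshape(-1, 1))
            for c in range(4)}

# Monte Carlo sampling of wind power for a given forecast in the lead time.
def sample_wind(forecast_value, n):
    c = km.predict([[forecast_value]])[0]
    draws, _ = mixtures[c].sample(n)
    return np.clip(draws.ravel(), 0, 1)

print(sample_wind(0.7, 5).round(3))
```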
Multivariate mixtures are encountered when the data are repeated or clustered measurements in the presence of heterogeneity among the observations with unknown proportions. In such situations, the main interest may lie not only in estimating the component parameters but also in obtaining reliable estimates of the mixing proportions. In this paper, we propose an empirical likelihood approach combined with a novel dimension reduction procedure for estimating the parameters of a two-component multivariate mixture model. The performance of the new method is compared with the fully parametric and almost nonparametric methods used in the literature.
Purpose - For large-scale power grid monitoring system equipment, the working environment is increasingly complex and the probability of faults or failures in the monitoring system is gradually increasing. This paper proposes a fault classification algorithm based on the Gaussian mixture model (GMM), which can automatically classify faults and eliminate fault sources in the monitoring system. Design/methodology/approach - The algorithm first defines the GMM and obtains the detection value for fault classification through a method based on the causal Mason Young Tracy (MYT) decomposition under each normal distribution in the GMM. Then, the GMM weights are used to compute a weighted classification value for fault detection and separation, and the fault classification results are obtained by comparing the actual control limits with the classification result of the GMM. Findings - Experiments on a defined non-thermostatic continuous stirred-tank reactor model show that the proposed algorithm outperforms the traditional algorithm based on the causal MYT decomposition in fault detection and fault separation. Originality/value - The proposed algorithm fundamentally addresses fault detection and fault separation in large-scale systems and provides support for troubleshooting and identifying fault sources.
Improvement of surface finish and material removal has been quite a challenge in finishing operations such as abrasive flow machining (AFM). The factors that affect surface finish and material removal in the AFM process are media viscosity, extrusion pressure, piston velocity, and particle size. Performing experiments over all the parameters and accurately obtaining an optimized parameter set in a short time is difficult because the operation requires a precise finish. Computational fluid dynamics (CFD) simulation was therefore employed to determine the optimum parameters. In the current work, a 2D model was designed, and the flow analysis, force calculation, and material removal prediction were performed and compared with the available experimental data. Another 3D model, of a swaging die finished using AFM, was simulated at different media viscosities to study their effects on the controlling parameters. The CFD simulation was performed using the commercially available ANSYS FLUENT. Two phases were considered for the flow analysis, and a multiphase mixture model was adopted. The fluid was treated as Newtonian and the flow as laminar with no wall slip.
Mixture of Experts (MoE) regression models are widely studied in statistics and machine learning for modeling heterogeneity in data for regression, clustering, and classification. The Laplace distribution is one of the most important statistical tools for analyzing heavy-tailed data, and Laplace Mixture of Linear Experts (LMoLE) regression models, which are based on the Laplace distribution, are therefore more robust. In analogy with modeling a variance parameter in a homogeneous population, we propose and study a new class of models in this paper: heteroscedastic Laplace mixture of experts regression models for analyzing heteroscedastic data coming from a heterogeneous population. The issues of maximum likelihood estimation are addressed. In particular, a Minorization-Maximization (MM) algorithm for estimating the regression parameters is developed. Properties of the estimators of the regression coefficients are evaluated through Monte Carlo simulations. Results from the analysis of two real data sets are presented.
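The role of an MM step can be illustrated on the simplest building block: for a single Laplace-error (least absolute deviations) regression, an MM surrogate reduces each iteration to a weighted least-squares solve. The sketch below shows only that single-expert update on simulated data; the full LMoLE adds gating weights and per-expert responsibilities, which are omitted here, and this illustrative reduction is an assumption rather than the paper's algorithm.

```python
# Hypothetical sketch: MM (majorize-minimize) update for Laplace / least
# absolute deviations regression, the single-expert core of an LMoLE model.
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.laplace(0, 0.5, n)   # Laplace noise

beta = np.zeros(2)
for _ in range(100):
    r = y - X @ beta
    w = 1.0 / np.maximum(np.abs(r), 1e-6)   # weights from the quadratic majorizer
    WX = X * w[:, None]
    beta_new = np.linalg.solve(X.T @ WX, WX.T @ y)       # weighted LS solve
    if np.max(np.abs(beta_new - beta)) < 1e-8:
        beta = beta_new
        break
    beta = beta_new
print(beta.round(3))   # should be close to the true coefficients [1, 2]
```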
Although there are many papers on variable selection methods based on the mean model in finite mixtures of regression models, little work has been done on how to select significant explanatory variables when modeling the variance parameter. In this paper, we propose and study a novel class of models: a skew-normal mixture of joint location and scale models to analyze heteroscedastic skew-normal data coming from a heterogeneous population. The problem of variable selection for the proposed models is considered. In particular, a modified Expectation-Maximization (EM) algorithm for estimating the model parameters is developed. The consistency and the oracle property of the penalized estimators are established. Simulation studies are conducted to investigate the finite-sample performance of the proposed methodologies, and an example illustrates their use.
Background: Workplace violence (WV) towards psychiatric staff has commonly been associated with posttraumatic stress disorder (PTSD). However, prospective studies have shown that not all psychiatric staff who experience workplace violence develop post-traumatic stress. Purpose: We examine the longitudinal trajectories of PTSD in this population to identify possible subgroups that might be more at risk, and we investigate whether certain risk factors of PTSD predict membership in these subgroups. Method: In a sample of psychiatric staff from 18 psychiatric wards in Denmark who had reported an incident of WV, we used Latent Growth Mixture Modelling (LGMM) followed by logistic regression analysis. Results: We found three separate PTSD trajectories: a recovering, a delayed-onset, and a moderate-stable trajectory. Higher social support and negative cognitive appraisals about oneself, the world, and self-blame predicted membership in the delayed-onset trajectory, while higher social support and lower acceptance coping predicted membership in the delayed-onset trajectory. Conclusion: Although most psychiatric staff go through a natural recovery, it is important to be aware of and identify staff members who might be struggling long-term. More focus on the factors that might predict these groups should be an important task for psychiatric departments to prevent posttraumatic symptomatology from work.
A cascaded projection of the Gaussian mixture model algorithm is proposed. First, the marginal distribution of the Gaussian mixture model is computed for different feature dimensions, and a number of sub-classifiers are generated from the marginal distribution model, each based on a different feature set. A cascaded structure is adopted to fuse the sub-classifiers dynamically and achieve sample-adaptation ability. Secondly, the effectiveness of the proposed algorithm is verified on electrocardiogram emotional signals and speech emotional signals. Emotional data covering fidgetiness, happiness, and sadness were collected in induction experiments. Finally, the emotion feature extraction method is discussed, including heart rate variability, chaotic electrocardiogram features, and utterance-level static features. Emotional feature reduction methods are studied, including principal component analysis, sequential forward selection, the Fisher discriminant ratio, and the maximal information coefficient. The experimental results show that the proposed classification algorithm can effectively improve recognition accuracy in the two scenarios.
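A convenient property that such marginal sub-classifiers rely on is that the marginal of a Gaussian mixture over a subset of feature dimensions is again a Gaussian mixture with the same weights and the corresponding sub-blocks of the means and covariances. The sketch below (NumPy, SciPy, and scikit-learn with synthetic data are assumptions) scores samples under the marginal of a fitted GMM restricted to chosen feature dimensions, which is the building block of a per-feature-set sub-classifier.

```python
# Hypothetical sketch: marginal log-likelihood of a fitted GMM on a subset of
# feature dimensions, as used by per-feature-set sub-classifiers.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                   # stand-in feature vectors
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=0).fit(X)

def marginal_loglik(gmm, X_sub, dims):
    """Log-likelihood of X_sub under the GMM marginalized to `dims`."""
    dims = np.asarray(dims)
    comp = np.stack([
        multivariate_normal.logpdf(X_sub,
                                   mean=gmm.means_[k][dims],
                                   cov=gmm.covariances_[k][np.ix_(dims, dims)])
        for k in range(gmm.n_components)
    ], axis=1)                                  # shape (n_samples, n_components)
    return logsumexp(comp + np.log(gmm.weights_), axis=1)

dims = [0, 2, 5]                                # one sub-classifier's feature set
print(marginal_loglik(gmm, X[:5, dims], dims))
```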
A new two-step framework is proposed for image segmentation. In the first step, the gray-value distribution of the given image is reshaped to have larger inter-class variance and smaller intra-class variance. In the second step, discriminant-based or clustering-based methods are applied to the reshaped distribution. The focus is on typical clustering methods, the Gaussian mixture model (GMM) and its variant, to demonstrate the feasibility of the framework. Because the first step is independent of the second, the framework can be integrated into pixel-based and histogram-based methods to improve their segmentation quality. Experiments on artificial and real images show that the framework achieves effective and robust segmentation results.
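A rough sketch of such a two-step pipeline is given below (not the authors' reshaping transform): plain histogram equalization serves as a simple stand-in for the distribution-reshaping step, after which a two-component GMM on the pixel intensities provides the segmentation. NumPy, scikit-learn, and the synthetic image are assumptions.

```python
# Hypothetical two-step sketch: reshape the gray-value distribution (here via
# histogram equalization as a stand-in), then segment with a 2-component GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 64x64 image: dark background with a brighter square object.
img = rng.normal(80, 12, (64, 64))
img[20:44, 20:44] = rng.normal(150, 12, (24, 24))
img = np.clip(img, 0, 255)

# Step 1: reshape the gray-value distribution (histogram equalization).
hist, bin_edges = np.histogram(img, bins=256, range=(0, 255))
cdf = hist.cumsum() / hist.sum()
reshaped = np.interp(img.ravel(), bin_edges[:-1], cdf * 255.0)

# Step 2: cluster the reshaped gray values with a GMM (object vs. background).
gmm = GaussianMixture(n_components=2, random_state=0).fit(reshaped.reshape(-1, 1))
labels = gmm.predict(reshaped.reshape(-1, 1)).reshape(img.shape)
print("object pixel fraction:", (labels == labels[32, 32]).mean().round(3))
```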