Speed, flow, and occupancy are used together in a cluster analysis of traffic flow data based on Gaussian mixture models (GMMs). Compared with other clustering and sorting techniques, the GMM, as a structural model, is suitable for various kinds of traffic flow parameters. Gap statistics and domain knowledge of traffic flow are used to determine a proper number of clusters. The expectation-maximization (EM) algorithm is used to estimate the parameters of the GMM. The clustered traffic flow patterns are then analyzed statistically and used to design maximum likelihood classifiers for grouping real-time traffic flow data as new observations become available. Clustering analysis and pattern recognition can also be used to cluster and classify dynamic traffic flow patterns for freeway on-ramp and off-ramp weaving sections, as well as for other facilities involving the concept of level of service, such as airports, parking lots, intersections, and interrupted-flow pedestrian facilities.
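The EM fitting and maximum likelihood classification described in the traffic flow abstract can be sketched as follows. This is a minimal one-dimensional illustration in plain Python, not the authors' implementation: the paper clusters speed, flow, and occupancy jointly, whereas this sketch uses speed only, and the speed values, the initial means, and the function names (`fit_gmm_1d`, `classify`) are all hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    n, k = len(data), len(init_means)
    grand_mean = sum(data) / n
    pooled_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [pooled_var] * k  # start wide so every point has nonzero density
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsed component
    return weights, means, variances

def classify(x, means, variances):
    """Maximum likelihood rule: pick the component with the largest density at x."""
    scores = [normal_pdf(x, m, v) for m, v in zip(means, variances)]
    return scores.index(max(scores))

# Hypothetical speeds in km/h: a congested regime and a free-flow regime.
speeds = [20.0, 22.0, 25.0, 23.0, 21.0, 95.0, 100.0, 98.0, 102.0, 97.0]
weights, means, variances = fit_gmm_1d(speeds, init_means=[30.0, 90.0])
print(means, classify(30.0, means, variances), classify(90.0, means, variances))
```

The classifier ignores the mixture weights, which makes it a maximum likelihood rule; multiplying each density by its weight would give a maximum a posteriori rule instead.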
Because emergency medicine distribution is vital to a quick response to urgent demand when an epidemic occurs, an optimal vaccine distribution approach is explored according to the epidemic diffusion rule and the different urgency degrees of the affected areas, against the background of an epidemic outbreak in a given region. First, the SIQR (susceptible, infected, quarantined, recovered) epidemic model with pulse vaccination is introduced to describe the epidemic diffusion rule and obtain the vaccine demand in each pulse. Based on the SIQR model, the affected areas are clustered using a self-organizing map (SOM) neural network to qualify the results. Then, a dynamic vaccine distribution model is formulated that incorporates the clustering of the affected areas, with the goals of both reducing the transportation cost and decreasing the unsatisfied demand in the emergency logistics network. A numerical study with twenty affected areas and four distribution centers is carried out. The numerical results indicate that the proposed approach contributes substantially to controlling the affected areas with a relatively high degree of urgency, and the comparison shows that the clustering method outperforms the non-clustering method in controlling epidemic diffusion.
Reduced order models (ROMs) based on snapshots of high-fidelity CFD simulations have received great attention recently because of their capability to capture the features of complex geometries and flow configurations. To improve the efficiency and precision of ROMs, it is indispensable to add extra sampling points to the initial snapshots, since the number of sampling points needed to achieve an adequately accurate ROM is generally unknown a priori, while a large number of initial sampling points reduces the parsimony of the ROMs. A fuzzy-clustering-based adding-point strategy is proposed, in which the fuzzy clustering acts as an indicator of the region where the precision of the ROMs is relatively low. The proposed method is applied to construct ROMs for benchmark mathematical examples and for a numerical example of hypersonic aerothermodynamics prediction for a typical control surface. The proposed method achieves a 34.5% improvement in efficiency over the estimated mean squared error prediction algorithm while showing the same level of prediction accuracy.
The cluster-based channel model is the mainstream approach in fifth-generation mobile communications, so the accuracy of the clustering algorithm is important. The traditional Gaussian mixture model (GMM) does not consider power information, which is important for channel multipath clustering. In this paper, a normalized power-weighted GMM (PGMM) is introduced to model the channel multipath components (MPCs). With MPC power as a weighting factor, the PGMM can fit the MPCs in accordance with cluster-based channel models. First, the expectation-maximization (EM) algorithm is employed to optimize the PGMM parameters. Then, to further increase the searching ability of EM and to choose the optimal number of components without resorting to cross-validation, variational Bayesian (VB) inference is employed. Finally, 28 GHz indoor channel measurement data are used to demonstrate the effectiveness of the PGMM clustering algorithm.
A scheme for automatic road surface modeling from a noisy point cloud is presented. The normal vectors of the point cloud are estimated by distance-weighted fitting of a local plane. Then, the road surface is automatically distinguished from noise based on fuzzy clustering of the normal vectors; the mean of the clustered normals is calculated and the projection plane of the point cloud is created to obtain the geometric model accordingly. Based on fuzzy clustering of the intensity attributed to each point, different objects on the road surface are assigned different colors to represent their varied appearances. This unsupervised method is demonstrated experimentally and proves highly effective in reconstructing and rendering the road surface.
Most real application processes are complex nonlinear systems with incomplete information. It is difficult to estimate a model by assuming that the data set is governed by a global model. Moreover, in real processes, the available data set usually contains missing values. To overcome the shortcomings of global modeling and missing data values, a new modeling method is proposed. First, an incomplete data set with missing values is partitioned into several clusters by a K-means with soft constraints (KSC) algorithm, which incorporates soft constraints to enable clustering with missing values. Then a local model for each group is developed using a support vector regression (SVR) algorithm, which adopts a missing value insensitive (MVI) kernel to address the missing value estimation problem. The valid region of each local model is obtained as well. Simulation results demonstrate the effectiveness of the local model and the estimation algorithm.
Mean-variance portfolio optimization models are sensitive to uncertainty in risk-return estimates, which may result in poor out-of-sample performance. In particular, the estimates may suffer when the number of assets considered is high and the return time series is not sufficiently long. This is precisely the case in the cryptocurrency market, where there are hundreds of crypto assets that have been traded for only a few years. We propose enhancing the mean-variance (MV) model with a pre-selection stage that uses a prototype-based clustering algorithm to reduce the number of crypto assets considered at each investment period. In the pre-selection stage, we run a prototype-based clustering algorithm in which the assets are described by variables representing the profit-risk duality. The prototypes of the clustering partition are automatically examined, and the one that best suits our risk-aversion preference is selected. We then run the MV portfolio optimization with the crypto assets of the selected cluster. The proposed approach is tested over a period of 17 months on the whole cryptocurrency market and on two selections of the cryptocurrencies with the highest market capitalization (175 and 250 cryptos). We compare the results against three methods applied to the whole market: classic MV, risk parity, and hierarchical risk parity. We also compare our results with those from investing in the market index CCI30. The simulation results generally favor our proposal in terms of profit and risk-profit financial indicators. This result reaffirms the convenience of using machine learning methods to guide financial investments in complex and highly volatile environments such as the cryptocurrency market.
To improve the accuracy and efficiency of 3D model retrieval, a method based on the affinity propagation clustering algorithm is proposed. First, a projection ray-based method is proposed to improve the feature extraction efficiency for 3D models. Based on the relationship between a model and its projection, intersection in 3D space is transformed into intersection in 2D space, which reduces the number of intersections and improves the efficiency of the extraction algorithm. For feature extraction, the multi-layer spheres method is analyzed. The two-layer spheres method makes the feature vector more accurate and improves retrieval precision. Second, semi-supervised affinity propagation (S-AP) clustering is used because it can be applied to different cluster structures. The S-AP algorithm is adopted to find the center models, from which the center model collection is built. During retrieval, the collection is used to classify the query model into the corresponding model base, and the most similar model is then retrieved from that model base. Finally, 75 sample models from the Princeton library are selected for the experiment, and 36 models are used for the retrieval test. The results validate that the proposed method outperforms the original method and that the retrieval precision and recall ratios are improved effectively.
To overcome the limitation of traditional clustering algorithms, which fail to produce meaningful clusters in high-dimensional, sparse, binary-valued data sets, a new method based on a hypergraph model is proposed. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph. A hyperedge represents the similarity of the attribute-value distribution between two points. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. The quality of the clustering result can be evaluated by applying the intra-cluster singularity value. Analysis and experimental results demonstrate that this approach is applicable and effective in a wide range of settings.
To make the quantitative results of nuclear magnetic resonance (NMR) transverse relaxation (T2) spectra reflect the type and pore structure of a reservoir more directly, an unsupervised clustering method based on the Gaussian mixture model (GMM) was developed to obtain quantitative pore structure information from NMR T2 spectra. First, principal component analysis was conducted on the T2 spectra to reduce the dimension of the data and the dependence among the original variables. Second, the dimension-reduced data were fitted using the GMM probability density function, and the model parameters and optimal number of clusters were obtained according to the expectation-maximization algorithm and the change in the Akaike information criterion. Finally, the T2 spectrum features and pore structure types of the different clustering groups were analyzed and compared with the T2 geometric mean and T2 arithmetic mean. The effectiveness of the algorithm has been verified by numerical simulation and field NMR logging data. The research shows that the clustering results based on the GMM method correlate well with the shape and distribution of the T2 spectrum, pore structure, and petroleum productivity, providing a new means for quantitative identification of pore structure, reservoir grading, and oil and gas productivity evaluation.
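For reference, the Akaike information criterion used in the NMR abstract to choose the number of clusters has the standard form below. The parameter count p(K) assumes a K-component mixture in d dimensions (after PCA) with full covariance matrices; the abstract does not state the covariance structure, so that part is an assumption. Here \hat{L}_K is the maximized likelihood of the K-component fit.

```latex
\mathrm{AIC}(K) = 2\,p(K) - 2\ln\hat{L}_K,
\qquad
p(K) = \underbrace{(K-1)}_{\text{weights}}
     + \underbrace{Kd}_{\text{means}}
     + \underbrace{K\,\tfrac{d(d+1)}{2}}_{\text{covariances}}
```

A common reading of "the change of the AIC" is to increase K until the criterion stops decreasing appreciably, which balances goodness of fit against the number of free parameters.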
The complexity of large-scale network systems made up of many nonlinearly interconnected components restricts their modeling and analysis. In this paper, we propose a framework for hierarchical modeling of a complex network system, based on a recursive unsupervised spectral clustering method. The hierarchical model facilitates the management of complexity in the analysis of real-world critical infrastructures. We exemplify this with the reliability analysis of the 380 kV Italian Power Transmission Network (IPTN). In this analysis, the classical component Importance Measures (IMs) of reliability theory are extended to make them applicable to a complex distributed network system. Using these extended IMs, the reliability properties of the IPTN system can be evaluated within the hierarchical system model, with the aim of providing risk managers with information on the risk/safety significance of system structures and components.
The present multi-harmonic shell clustering of a nucleus is a direct consequence of the fermionic nature of nucleons and their average sizes. The most probable form and the average size of each proton or neutron shell are represented here by a specific equilibrium polyhedron of definite size. All such polyhedral shells are closely packed, leading to a shell clustering of the nucleus. A harmonic oscillator potential is employed for each shell. All magic and semi-magic numbers, the ground-state single-particle and total binding energies, and the proton, neutron, and mass radii of 40Ca, 48Ca, 54Fe, 90Zr, 108Sn, 114Te, 142Nd, and 208Pb are very successfully predicted.
Two mixed linear models are proposed for grouping populations by a dissimilarity coefficient that has two parameters: the squared difference of marginal means and the variance component of interaction. Cluster trees can be constructed by the mixed linear model approaches for experimental data with sampling errors within populations or with some missing values. The unweighted pair-group method (UPGM) is suggested as the fusion method. Sampling variances of the estimated dissimilarity coefficients can be obtained by the jackknife procedure. A one-tailed t-test is applicable for detecting the significance of dissimilarity among populations within a specific group. Unbiasedness and efficiency of the estimated dissimilarity coefficients are demonstrated by Monte Carlo simulations. A worked example using cotton yield data is given to demonstrate the use of these cluster methods.
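The fusion step named in the mixed linear model abstract can be illustrated with average-linkage agglomeration, commonly known as UPGMA. The sketch below assumes that this is what the abstract's "unweighted pair-group method" refers to, takes the pairwise dissimilarity coefficients as already estimated, and uses made-up population labels and values; `upgma` and `average_dissimilarity` are hypothetical helper names.

```python
from itertools import combinations

def average_dissimilarity(a, b, dissim):
    """Mean of all pairwise dissimilarities between members of clusters a and b."""
    total = sum(dissim[frozenset((i, j))] for i in a for j in b)
    return total / (len(a) * len(b))

def upgma(items, dissim):
    """Repeatedly fuse the two clusters with the smallest average dissimilarity.

    Returns the merge history as (cluster_a, cluster_b, fusion_height) tuples,
    which is enough to draw a cluster tree.
    """
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_dissimilarity(pair[0], pair[1], dissim))
        height = average_dissimilarity(a, b, dissim)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges

# Hypothetical dissimilarity coefficients between four populations.
dissim = {
    frozenset(("A", "B")): 1.0, frozenset(("C", "D")): 2.0,
    frozenset(("A", "C")): 6.0, frozenset(("A", "D")): 8.0,
    frozenset(("B", "C")): 7.0, frozenset(("B", "D")): 9.0,
}
merges = upgma(["A", "B", "C", "D"], dissim)
for a, b, height in merges:
    print(a, b, height)
```

Recomputing the averages from the original coefficients at every step is quadratic per merge, which is fine for the handful of populations typical of a field trial but would need a distance-update formula for large inputs.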
A city cluster is an effective platform for encouraging regionally coordinated development. Coordinated reduction of carbon emissions within a city cluster via the spatial association network between cities can help coordinate regional carbon emission management, realize sustainable development, and assist China in achieving its carbon peaking and carbon neutrality goals. This paper applies an improved gravity model and social network analysis (SNA) to the study of the spatial correlation of carbon emissions in city clusters, and analyzes the structural characteristics of the spatial correlation network of carbon emissions in the Yangtze River Delta (YRD) city cluster in China and its influencing factors. The results demonstrate that: 1) The spatial association of carbon emissions in the YRD city cluster exhibits a typical and complex multi-threaded network structure. The network association number and density show an upward trend, indicating closer spatial association between cities, but their values remain generally low. Meanwhile, the network hierarchy and network efficiency show a downward trend but remain high. 2) The spatial association network of carbon emissions in the YRD city cluster shows an obvious 'core-edge' distribution pattern. The network is centered on Shanghai, Suzhou, and Wuxi, all of which act as 'bridges', while cities such as Zhoushan, Ma'anshan, and Tongling, characterized by remote location, a single transportation mode, or a lower economic level, are positioned at the edge of the network. 3) Geographic proximity, varying levels of economic development, different industrial structures, degrees of urbanization, levels of technological innovation, energy intensities, and environmental regulation are important influencing factors on the spatial association of carbon emissions within the YRD city cluster. Finally, policy implications are provided from four aspects: government macro-control and market mechanism guidance, structural characteristics of the 'core-edge' network, reconfiguration and optimization of the spatial layout of the YRD city cluster, and the application of advanced technologies.
Statistical prediction is often required in reservoir simulation to quantify production uncertainty or assess potential risks. Most existing uncertainty quantification procedures aim to decompose the input random field into independent random variables, and may suffer from the curse of dimensionality if the correlation scale is small compared with the domain size. In this work, we develop and test a new approach, K-means clustering assisted empirical modeling, for efficiently estimating waterflooding performance for multiple geological realizations. This method performs single-phase flow simulations on a large number of realizations and uses K-means clustering to select only a few representatives, on which the two-phase flow simulations are run. Empirical models are then adopted to describe the relation between the single-phase solutions and the two-phase solutions using these representatives. Finally, the two-phase solutions in all realizations can be readily predicted using the empirical models. The method is applied to both 2D and 3D synthetic models and is shown to perform well for the P10, P50, and P90 of production rates, as well as for the probability distributions as illustrated by cumulative distribution functions. It is able to capture the ensemble statistics of the Monte Carlo simulation results with a large number of realizations, and the computational cost is significantly reduced.
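The representative-selection step in the reservoir abstract can be sketched as follows: cluster the realizations on a cheap single-phase response, then keep the realization closest to each centroid for the expensive two-phase run. This is an illustrative assumption about the workflow, not the authors' code; the scalar response values, the initial centers, and the function names are hypothetical, and a real study would cluster on a multi-dimensional feature vector.

```python
def kmeans_1d(values, init_centers, n_iter=20):
    """Plain Lloyd iterations on scalar data, starting from the given centers."""
    centers = list(init_centers)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def pick_representatives(values, centers):
    """Index of the realization closest to each cluster center."""
    return [min(range(len(values)), key=lambda i: abs(values[i] - c)) for c in centers]

# Hypothetical single-phase responses (e.g. a time-of-flight summary) for
# nine geological realizations.
single_phase = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.8, 10.2, 10.1]
centers = kmeans_1d(single_phase, init_centers=[0.0, 5.0, 10.0])
representatives = pick_representatives(single_phase, centers)
print(representatives)  # indices of the realizations to simulate with two-phase flow
```

Only the selected realizations would then be simulated with the full two-phase model, and an empirical model fitted on them would be used to predict the rest.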
The category-based statistical language model is an important method for solving the sparse data problem, but it has two bottlenecks: 1) word clustering, for which it is hard to find a suitable method with good performance and low computational cost; and 2) the loss of predictive ability when class-based methods are adapted to text in different domains. To solve these problems, a definition of word similarity based on mutual information is presented. Based on word similarity, a definition of word set similarity is given. Experiments show that the similarity-based word clustering algorithm is better than the conventional greedy clustering method in both speed and performance, with the perplexity reduced from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the perplexity of the vari-gram model is reduced from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
The pair-wise forces in the SPH momentum equation guarantee the conservation of momentum, but they cannot prevent particle clustering and wall penetration. Particle clustering may occur for several reasons. A fundamental issue is the tensile instability, which is caused by negative numerical pressures. Clustering may also occur due to certain properties of the kernel gradient. Discontinuities in the pressure and its gradient, due to surface tension and gravity, may cause particle instabilities near the interface between two fluids. Wall penetration is also a form of particle clustering. In this paper the particle collision concept is introduced to suppress particle clustering. Here, the use of kinematic conditions (motion) rather than dynamic conditions (forces) is explored. These kinematic conditions are obtained from kinetic collision theory. Conservation of momentum is maintained, and under elastic conditions conservation of energy as well. The particle collision model only becomes active when needed. It may be seen as a particle shifting method, in the sense that the velocities are changed and, as a consequence, the particle positions change. It is demonstrated in several case studies that the particle collision model allows for realistic (low) viscosities. It was also found to stabilise the interface between two fluids up to high, realistic density ratios (1000:1) in typical liquid-gas applications. As such it can be used as a multi-fluid model. The concept allows for real wave speed ratios (and far beyond), which, as well as real viscosities, are essential in the modelling of heat transfer applications. The collisions with walls allow for no-slip conditions at real viscosities while wall penetration is suppressed. In summary, the particle collision model makes SPH more robust for engineering.
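The kinematic idea in the SPH abstract, changing velocities through a collision rule rather than adding forces, can be illustrated with the textbook two-body collision along the line of centers. This is only a one-dimensional sketch under stated assumptions, not the paper's SPH formulation: the restitution coefficient `e`, the masses, and the function name `collide_1d` are hypothetical, and the approach check stands in for "only becomes active when needed".

```python
def collide_1d(m1, v1, m2, v2, e=1.0):
    """Post-collision velocities of two particles on a line.

    Particle 1 is assumed to sit to the left of particle 2, so the pair is
    approaching only when v1 > v2. e = 1 is perfectly elastic, e = 0 is
    perfectly inelastic. Momentum is conserved for any e.
    """
    if v1 <= v2:
        # Particles are separating or moving together: leave them untouched.
        return v1, v2
    total_momentum = m1 * v1 + m2 * v2
    closing_speed = v1 - v2
    new_v1 = (total_momentum - m2 * e * closing_speed) / (m1 + m2)
    new_v2 = (total_momentum + m1 * e * closing_speed) / (m1 + m2)
    return new_v1, new_v2

# A light (gas-like) particle hitting a heavy (liquid-like) one, mass ratio 1:1000.
m_gas, m_liquid = 1.0, 1000.0
u1, u2 = collide_1d(m_gas, 2.0, m_liquid, 0.0, e=1.0)
momentum_after = m_gas * u1 + m_liquid * u2
energy_after = 0.5 * m_gas * u1 ** 2 + 0.5 * m_liquid * u2 ** 2
print(u1, u2, momentum_after, energy_after)
```

With e = 1 both momentum and kinetic energy match their pre-collision values (2.0 each in these units), which mirrors the abstract's statement that energy is conserved under elastic conditions.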
The user model, which is the representation of information about a user, is the heart of adaptive systems. It helps adaptive systems perform adaptation tasks. There are two kinds of adaptation: 1) individual adaptation, which targets each user; and 2) group adaptation, which focuses on groups of users. To support group adaptation, the basic problem to be solved is how to create user groups. This calls for clustering techniques applied to user models, because a group is considered a cluster of similar user models. In this paper we discuss two clustering algorithms, k-means and k-medoids, and propose dissimilarity and similarity measures that apply to different structures (forms) of user models, such as vector, overlay, and Bayesian network models.
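A minimal k-medoids sketch for vector-form user models is given below. It assumes Euclidean distance as the dissimilarity measure, which is a stand-in and not the measure proposed in the paper; the user vectors, the initial medoid indices, and the function names are hypothetical. Unlike k-means, each cluster is represented by an actual user model (the medoid), so the same loop works for overlay or Bayesian network models once a suitable `dissimilarity` function is supplied.

```python
def dissimilarity(u, v):
    """Euclidean distance between two vector user models (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def assign(points, medoids):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: dissimilarity(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, init_medoids, n_iter=10):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(init_medoids)
    for _ in range(n_iter):
        clusters = assign(points, medoids)
        # New medoid: the member with the smallest total dissimilarity to the rest.
        new_medoids = [
            min(members, key=lambda c: sum(dissimilarity(points[c], points[o]) for o in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            return medoids, clusters
        medoids = new_medoids
    return medoids, assign(points, medoids)

# Hypothetical user models: (knowledge score, activity score) per user.
users = [(8, 8), (9, 9), (10, 10), (0, 0), (1, 1), (3, 3)]
medoids, groups = k_medoids(users, init_medoids=[0, 3])
print(medoids, groups)
```

The sketch assumes the initial medoids are distinct points, so that every cluster keeps at least its own medoid as a member.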
Funding (GMM clustering of traffic flow data): the US National Science Foundation (Nos. CMMI-0408390, CMMI-0644552); the American Chemical Society Petroleum Research Foundation (No. PRF-44468-G9); the Research Fellowship for International Young Scientists (No. 51050110143); the Fok Ying-Tong Education Foundation (No. 114024); the Natural Science Foundation of Jiangsu Province (No. BK2009015); the Postdoctoral Science Foundation of Jiangsu Province (No. 0901005C).
Funding (emergency vaccine distribution with SOM clustering): the National Natural Science Foundation of China (No. 70671021).
Funding (fuzzy-clustering-based adding-point strategy for ROMs): the National Natural Science Foundation of China (Grant No. 11372036).
Funding (power-weighted GMM for channel multipath clustering): the National Science and Technology Major Program of the Ministry of Science and Technology (No. 2018ZX03001031); the Key Program of the Beijing Municipal Natural Science Foundation (No. L172030); the Beijing Municipal Science & Technology Commission Project (No. Z171100005217001); the Key Project of the State Key Lab of Networking and Switching Technology (NST20170205); the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2012BAF14B01).
Funding (road surface modeling from a noisy point cloud): the National Natural Science Foundation of China (No. 40471089) and the Key Laboratory of Geo-informatics of the State Bureau of Surveying and Mapping.
Funding (local modeling with missing values using KSC and SVR): the Key Discipline Construction Program of the Beijing Municipal Commission of Education (XK10008043).
Funding: Supported by the European Union's H2020 Coordination and Support Actions CA19130 under Grant Agreement Period 2.
Abstract: Mean-variance portfolio optimization models are sensitive to uncertainty in risk-return estimates, which may result in poor out-of-sample performance. In particular, the estimates may suffer when the number of assets considered is high and the return time series is not sufficiently long. This is precisely the case in the cryptocurrency market, where there are hundreds of crypto assets that have been traded for only a few years. We propose enhancing the mean-variance (MV) model with a pre-selection stage that uses a prototype-based clustering algorithm to reduce the number of crypto assets considered at each investment period. In the pre-selection stage, we run a prototype-based clustering algorithm where the assets are described by variables representing the profit-risk duality. The prototypes of the clustering partition are automatically examined and the one that best suits our risk-aversion preference is selected. We then run the MV portfolio optimization with the crypto assets of the selected cluster. The proposed approach is tested for a period of 17 months in the whole cryptocurrency market and in two selections of the cryptocurrencies with the highest market capitalization (175 and 250 cryptos). We compare the results against three methods applied to the whole market: classic MV, risk parity, and hierarchical risk parity. We also compare our results with those from investing in the market index CCI30. The simulation results generally favor our proposal in terms of profit and risk-profit financial indicators. This result reaffirms the convenience of using machine learning methods to guide financial investments in complex and highly volatile environments such as the cryptocurrency market.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 51075083).
Abstract: To improve the accuracy and efficiency of 3D model retrieval, a method based on the affinity propagation clustering algorithm is proposed. First, a projection ray-based method is proposed to improve the feature extraction efficiency of 3D models. Based on the relationship between a model and its projection, intersection in 3D space is transformed into intersection in 2D space, which reduces the number of intersections and improves the efficiency of the extraction algorithm. For feature extraction, the multi-layer spheres method is analyzed. The two-layer spheres method makes the feature vector more accurate and improves retrieval precision. Second, semi-supervised affinity propagation (S-AP) clustering is utilized because it can be applied to different cluster structures. The S-AP algorithm is adopted to find the center models, and then the center model collection is built. During retrieval, the collection is used to classify the query model into the corresponding model base, and then the most similar model is retrieved from that model base. Finally, 75 sample models from the Princeton library are selected for the experiment, and 36 models are used for the retrieval test. The results show that the proposed method outperforms the original method and that the retrieval precision and recall ratios are effectively improved.
Abstract: To overcome the limitation of traditional clustering algorithms, which fail to produce meaningful clusters in high-dimensional, sparse, binary-valued data sets, a new method based on a hypergraph model is proposed. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph. A hyperedge represents the similarity of the attribute-value distribution between two points. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. The quality of the clustering result can be evaluated by applying the intra-cluster singularity value. Analysis and experimental results demonstrate that this approach is applicable and effective across a wide range of schemes.
Funding: Supported by the National Natural Science Foundation of China (42174142); National Science and Technology Major Project (2017ZX05039-002); Operation Fund of China National Petroleum Corporation Logging Key Laboratory (2021DQ20210107-11); Fundamental Research Funds for Central Universities (19CX02006A); Major Science and Technology Project of China National Petroleum Corporation (ZD2019-183-006).
Abstract: To make the quantitative results of nuclear magnetic resonance (NMR) transverse relaxation (T2) spectra reflect the type and pore structure of a reservoir more directly, an unsupervised clustering method based on the Gaussian mixture model (GMM) was developed to obtain quantitative pore structure information from NMR T2 spectra. First, principal component analysis was conducted on the T2 spectra to reduce the data dimension and the dependence among the original variables. Second, the dimension-reduced data were fitted using the GMM probability density function, and the model parameters and the optimal number of clusters were obtained from the expectation-maximization algorithm and the change in the Akaike information criterion. Finally, the T2 spectrum features and pore structure types of the different clustering groups were analyzed and compared with the T2 geometric mean and T2 arithmetic mean. The effectiveness of the algorithm has been verified by numerical simulation and field NMR logging data. The research shows that the clustering results based on the GMM method correlate well with the shape and distribution of the T2 spectrum, the pore structure, and petroleum productivity, providing a new means for quantitative identification of pore structure, reservoir grading, and oil and gas productivity evaluation.
Abstract: The complexity of large-scale network systems made of a large number of nonlinearly interconnected components restricts their modeling and analysis. In this paper, we propose a framework for hierarchical modeling of a complex network system, based on a recursive unsupervised spectral clustering method. The hierarchical model serves the purpose of facilitating the management of complexity in the analysis of real-world critical infrastructures. We exemplify this by referring to the reliability analysis of the 380 kV Italian Power Transmission Network (IPTN). In this analysis, the classical component Importance Measures (IMs) of reliability theory have been extended to render them compatible with and applicable to a complex distributed network system. By utilizing these extended IMs, the reliability properties of the IPTN system can be evaluated in the framework of the hierarchical system model, with the aim of providing risk managers with information on the risk/safety significance of system structures and components.
Abstract: The present multi-harmonic shell clustering of a nucleus is a direct consequence of the fermionic nature of nucleons and their average sizes. The most probable form and the average size of each proton or neutron shell are here represented by a specific equilibrium polyhedron of definite size. All such polyhedral shells are closely packed, leading to a shell clustering of a nucleus. A harmonic oscillator potential is employed for each shell. All magic and semi-magic numbers, g.s. single particle and total binding energies, and proton, neutron and mass radii of 40Ca, 48Ca, 54Fe, 90Zr, 108Sn, 114Te, 142Nd, and 208Pb are very successfully predicted.
Abstract: Two mixed linear models are proposed for grouping populations by a dissimilarity coefficient which has two parameters, for the squared difference of marginal means and the variance component of interaction. Cluster trees can be constructed by the mixed linear model approaches for experimental data with sampling errors within populations or with some missing values. The unweighted pair-group method (UPGM) is suggested as the fusion method. Sampling variances of the estimated dissimilarity coefficient can be obtained by the jackknife procedure. A one-tail t-test is applicable for detecting the significance of dissimilarity of populations within a specific group. Unbiasedness and efficiency of the estimation of dissimilarity coefficients are shown by Monte Carlo simulations. A worked example from cotton yield data is given to demonstrate the use of these cluster methods.
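The jackknife step mentioned in the abstract above can be illustrated with a generic delete-one jackknife. This is only a minimal sketch, not the authors' mixed-model procedure: the function name `jackknife`, the paired-yield data, and the use of a squared mean difference as a stand-in dissimilarity are all assumptions made for illustration.

```python
import math
from statistics import fmean

def jackknife(estimator, samples):
    """Delete-one jackknife: return (estimate, standard error).

    `estimator` maps a list of observations to a number. Pseudo-values
    are n*theta_full - (n-1)*theta_without_i.
    """
    n = len(samples)
    full = estimator(samples)
    leave_one_out = [estimator(samples[:i] + samples[i + 1:]) for i in range(n)]
    pseudo = [n * full - (n - 1) * est for est in leave_one_out]
    estimate = fmean(pseudo)
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n * (n - 1))
    return estimate, math.sqrt(variance)

# Hypothetical paired yields for two populations (illustrative numbers only).
pairs = [(5.1, 4.2), (4.8, 4.0), (5.5, 4.9), (5.0, 4.1), (5.3, 4.6)]

def squared_mean_difference(obs):
    # Stand-in dissimilarity: squared difference of the two population means.
    return (fmean(a for a, _ in obs) - fmean(b for _, b in obs)) ** 2

d_hat, d_se = jackknife(squared_mean_difference, pairs)
t_statistic = d_hat / d_se  # compare with a one-tail t critical value, n - 1 d.f.
```

When the estimator is the plain mean, the pseudo-values reduce to the observations themselves, so the jackknife standard error equals the usual s/sqrt(n); this makes a convenient sanity check.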
Funding: Under the auspices of the National Natural Science Foundation of China (No. 72273151).
Abstract: The city cluster is an effective platform for encouraging regionally coordinated development. Coordinated reduction of carbon emissions within a city cluster via the spatial association network between cities can help coordinate regional carbon emission management, realize sustainable development, and assist China in achieving its carbon peaking and carbon neutrality goals. This paper applies the improved gravity model and social network analysis (SNA) to the study of the spatial correlation of carbon emissions in city clusters and analyzes the structural characteristics of the spatial correlation network of carbon emissions in the Yangtze River Delta (YRD) city cluster in China and its influencing factors. The results demonstrate that: 1) The spatial association of carbon emissions in the YRD city cluster exhibits a typical and complex multi-threaded network structure. The network association number and density show an upward trend, indicating closer spatial association between cities, but their values remain generally low. Meanwhile, the network hierarchy and network efficiency show a downward trend but remain high. 2) The spatial association network of carbon emissions in the YRD city cluster shows an obvious 'core-edge' distribution pattern. The network is centered around Shanghai, Suzhou and Wuxi, all of which play the role of 'bridges', while cities such as Zhoushan, Ma'anshan and Tongling, characterized by remote location, a single transportation mode or a lower economic level, are positioned at the edge of the network. 3) Geographic proximity, varying levels of economic development, different industrial structures, degrees of urbanization, levels of technological innovation, energy intensities and environmental regulation are important factors influencing the spatial association of carbon emissions within the YRD city cluster. Finally, policy implications are provided from four aspects: government macro-control and market mechanism guidance, structural characteristics of the 'core-edge' network, reconfiguration and optimization of the spatial layout of the YRD city cluster, and the application of advanced technologies.
Funding: Supported by the Beijing Natural Science Foundation (Grant No. 3222037), the PetroChina Innovation Foundation (Grant No. 2020D-5007-0203), and the Science Foundation of China University of Petroleum, Beijing (Nos. 2462021YXZZ010, 2462018QZDX13, and 2462020YXZZ028).
Abstract: Statistical prediction is often required in reservoir simulation to quantify production uncertainty or assess potential risks. Most existing uncertainty quantification procedures aim to decompose the input random field into independent random variables, and may suffer from the curse of dimensionality if the correlation scale is small compared to the domain size. In this work, we develop and test a new approach, K-means clustering assisted empirical modeling, for efficiently estimating waterflooding performance for multiple geological realizations. This method performs single-phase flow simulations on a large number of realizations and uses K-means clustering to select only a few representatives, on which the two-phase flow simulations are run. Empirical models are then fitted on these representatives to describe the relation between the single-phase solutions and the two-phase solutions. Finally, the two-phase solutions in all realizations can be readily predicted using the empirical models. The method is applied to both 2D and 3D synthetic models and is shown to perform well for the P10, P50 and P90 of production rates, as well as for the probability distributions as illustrated by cumulative distribution functions. It is able to capture the ensemble statistics of Monte Carlo simulation results with a large number of realizations, and the computational cost is significantly reduced.
Funding: Project (60763001) supported by the National Natural Science Foundation of China; Project (2010GZS0072) supported by the Natural Science Foundation of Jiangxi Province, China; Project (GJJ12271) supported by the Science and Technology Foundation of Provincial Education Department of Jiangxi Province, China.
Abstract: The category-based statistical language model is an important method for solving the problem of sparse data. But there are two bottlenecks: 1) The problem of word clustering. It is hard to find a suitable clustering method with good performance and low computational cost. 2) The class-based method always loses the prediction ability needed to adapt to text in different domains. To solve these problems, a definition of word similarity based on mutual information is presented. Based on word similarity, a definition of word set similarity is given. Experiments show that the word clustering algorithm based on similarity is better than the conventional greedy clustering method in speed and performance, and the perplexity is reduced from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good prediction ability. Compared with the category-based model, the perplexity of the vari-gram model is reduced from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
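The perplexity figures quoted in the abstract above follow the standard definition: two raised to the negative average log2 probability that the model assigns to each test word. The sketch below shows only that definition on toy numbers; the function name `perplexity` and the probabilities are assumptions for illustration and have no connection to the corpora or models in the paper.

```python
import math

def perplexity(word_probabilities):
    """Perplexity from the probability a model assigned to each test word."""
    n = len(word_probabilities)
    average_log2 = sum(math.log2(p) for p in word_probabilities) / n
    return 2 ** (-average_log2)

# A model that spreads probability evenly over four choices at every step
# is exactly as uncertain as a fair four-way guess.
uniform_four = perplexity([0.25] * 8)

# A sharper (hypothetical) model assigns higher probability to the observed words.
sharper = perplexity([0.5, 0.25, 0.5, 0.5])
```

A lower value means the model is, on average, less surprised by the test text, which is why the reductions reported above indicate better prediction.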
Abstract: The pair-wise forces in the SPH momentum equation guarantee the conservation of momentum, but they cannot prevent particle clustering and wall penetration. Particle clustering may occur for several reasons. A fundamental issue is the tensile instability, which is caused by negative numerical pressures. Clustering may also occur due to certain properties of the kernel gradient. Discontinuities in the pressure and its gradient, due to surface tension and gravity, may cause particle instabilities near the interface between two fluids. Wall penetration is also a form of particle clustering. In this paper the particle collision concept is introduced to suppress particle clustering. Here, the use of kinematic conditions (motion) rather than dynamic conditions (forces) is explored. These kinematic conditions are obtained from kinetic collision theory. Conservation of momentum is maintained and, under elastic conditions, conservation of energy as well. The particle collision model only becomes active when needed. It may be seen as a particle shifting method, in the sense that the velocities are changed and, as a consequence, the particle positions change. It is demonstrated in several case studies that the particle collision model allows for realistic (low) viscosities. It was also found to stabilise the interface between two fluids up to high, realistic density ratios (1000:1) in typical liquid-gas applications. As such it can be used as a multi-fluid model. The concept allows for real wave speed ratios (and far beyond), which, like real viscosities, are essential in the modelling of heat transfer applications. The collisions with walls allow for no-slip conditions at real viscosities while wall penetration is suppressed. In summary, the particle collision model makes SPH more robust for engineering.
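The kinematic idea in the abstract above, changing velocities by a collision rule rather than adding forces, can be sketched with the textbook two-body collision along the line of centres. This is an illustration under stated assumptions, not the paper's SPH collision model: the function `collide`, its `restitution` parameter, and the activation test (particles must be approaching) are chosen here for the example, and the two positions are assumed to be distinct.

```python
import math

def collide(m1, x1, v1, m2, x2, v2, restitution=1.0):
    """Return post-collision velocities of two particles.

    Velocities change only along the unit vector from particle 1 to
    particle 2, and only if the particles are approaching each other.
    restitution=1.0 gives an elastic collision.
    """
    dx = [b - a for a, b in zip(x1, x2)]
    distance = math.sqrt(sum(d * d for d in dx))
    normal = [d / distance for d in dx]
    closing_speed = sum((a - b) * n for a, b, n in zip(v1, v2, normal))
    if closing_speed <= 0.0:
        return list(v1), list(v2)  # separating: the rule stays inactive
    impulse = (1.0 + restitution) * closing_speed / (1.0 / m1 + 1.0 / m2)
    new_v1 = [a - impulse / m1 * n for a, n in zip(v1, normal)]
    new_v2 = [b + impulse / m2 * n for b, n in zip(v2, normal)]
    return new_v1, new_v2

# Equal masses, head-on: the velocities are exchanged.
a, b = collide(1.0, (0.0, 0.0), (1.0, 0.0), 1.0, (1.0, 0.0), (-1.0, 0.0))

# Mass ratio 1000:1, echoing the liquid-gas density ratio mentioned above.
heavy, light = collide(1000.0, (0.0, 0.0), (1.0, 0.0), 1.0, (1.0, 0.0), (0.0, 0.0))
```

Because the same impulse is applied with opposite signs to the two particles, total momentum is unchanged for any restitution value, and kinetic energy is also unchanged when restitution is 1.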
Abstract: The user model, which is the representation of information about a user, is the heart of adaptive systems. It helps adaptive systems perform adaptation tasks. There are two kinds of adaptation: 1) individual adaptation, regarding each user; 2) group adaptation, focusing on a group of users. To support group adaptation, the basic problem that needs to be solved is how to create user groups. This relates to clustering techniques for user models, because a group is considered a cluster of similar user models. In this paper we discuss two clustering algorithms, k-means and k-medoids, and also propose dissimilarity and similarity measures that are applied to different structures (forms) of user models such as vector, overlay, and Bayesian network.
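For the vector form of user model mentioned in the abstract above, k-medoids can be sketched in a few lines. This is a minimal alternating variant under stated assumptions: Euclidean distance stands in for the dissimilarity measures proposed in the paper (which the abstract does not specify), the first k users seed the medoids, and the names `k_medoids`, `assign`, and the sample vectors are hypothetical.

```python
import math

def assign(items, medoids, dissim):
    """Map each medoid index to the indices of the items closest to it."""
    clusters = {m: [] for m in medoids}
    for i, item in enumerate(items):
        nearest = min(medoids, key=lambda m: dissim(item, items[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(items, k, dissim, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(range(k))  # deterministic seed: the first k items
    for _ in range(max_iter):
        clusters = assign(items, medoids, dissim)
        new_medoids = [
            min(members, key=lambda c: sum(dissim(items[c], items[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(items, medoids, dissim)

# Hypothetical user models as two-dimensional vectors (e.g. two skill scores).
users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, groups = k_medoids(users, 2, math.dist)
```

Unlike k-means, each group is represented by an actual user model rather than an averaged one, so the same loop works with any dissimilarity function, including ones defined on non-vector structures.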