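Abstract: Biclustering is an important research direction in the analysis of gene expression data. Its goal is to discover which genes have similar expression levels, or are closely related, under which experimental conditions. Many biclustering algorithms have been proposed to mine different types of biclusters, but most of them are inefficient. To address this, a novel mining algorithm, MRCluster, is proposed for mining maximal row-constant bicluster patterns from raw gene expression data. For efficiency, it adopts a gene-extension, depth-first mining strategy based on the Apriori principle and introduces several novel pruning techniques during mining. MRCluster is compared with RAP (range support pattern), an algorithm for mining row-constant bicluster patterns. The experimental results show that MRCluster mines maximal row-constant bicluster patterns from raw gene expression data more efficiently than RAP. MRCluster can therefore mine maximal row-constant biclusters from raw gene expression data effectively.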
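To make the notion of a row-constant bicluster concrete, the following is a minimal Python sketch. It is not the MRCluster algorithm: the function names, the absolute tolerance `tol`, and the toy matrix `expr` are all assumptions made for illustration. It only shows the anti-monotone property that Apriori-style pruning relies on: adding conditions to a bicluster can only shrink the set of genes that stay constant.

```python
def is_row_constant(data, rows, cols, tol=0.1):
    """Return True if every selected row varies by at most `tol`
    across the selected columns (hypothetical tolerance rule)."""
    for r in rows:
        values = [data[r][c] for c in cols]
        if max(values) - min(values) > tol:
            return False
    return True


def rows_constant_on(data, cols, tol=0.1):
    """List the genes (rows) that are constant on the given conditions (columns)."""
    return [r for r in range(len(data)) if is_row_constant(data, [r], cols, tol)]


# Toy expression matrix: 3 genes x 4 conditions (made-up values).
expr = [
    [1.0, 1.05, 3.0, 1.02],
    [2.0, 2.0, 2.05, 2.01],
    [0.5, 0.9, 0.5, 0.52],
]

small = rows_constant_on(expr, [0, 3])     # fewer conditions, more genes
large = rows_constant_on(expr, [0, 1, 3])  # more conditions, fewer genes

# Anti-monotonicity: genes supporting the larger column set are a subset
# of genes supporting any of its subsets, so a failed set can be pruned.
assert set(large) <= set(small)
```

A depth-first miner can use this property to stop extending a candidate as soon as its supporting gene set becomes too small.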
Abstract: The design of helicopter control systems, flight simulators, and real-time control simulation systems depends mainly on a mathematical model of the helicopter. However, it is difficult to establish a helicopter flight dynamics model that is fast, reliable, and precise, because some sophisticated helicopter phenomena have no unique and precise expression. In this paper, a fuzzy helicopter flight model is constructed from flight experimental data. The fuzzy model, which is identified by fuzzy inference, is fast to compute and highly precise. To guarantee the precision of the identified fuzzy model, a new method is adopted to handle conflicting fuzzy rules. In addition, fuzzy clustering effectively reduces the number of rules in the fuzzy model, that is, the order of the fuzzy model. The simulation results indicate that the proposed method is effective and feasible.
Funding: the National Natural Science Foundation of China (Nos. 71731001, 61573181, 71971114) and the Fundamental Research Funds for the Central Universities (No. NS2020045).
Abstract: Air traffic controllers face challenging initiatives due to uncertainty in air traffic. One way to support their initiatives is to identify similar operation scenes. Based on the operation characteristics of typical busy area control airspace, a complexity measurement indicator system is established. We find that operation in an area sector is characterized by aggregation and continuity, and that reducing dimensionality and information redundancy is feasible for dynamic operation data based on principal components. Using principal components, discrete features and time series features are constructed. Based on a Gaussian kernel function, Euclidean distance and dynamic time warping (DTW) are used to measure the similarity of the features. The similarity matrices are then input to spectral clustering. The clustering results show that, based on the indicator system, the identification of similar trend scenes is not ideal, whereas the identification of similar mode scenes is good. Finally, actual vertical operation decisions for the area sector are compared with the identification results, which are visualized with metric multidimensional scaling (MDS) plots. We find that the identification results reflect operation at peak hours well, but that controllers make different decisions under similar conditions before dawn. The compliance rate between the busy operation mode and division decisions at peak hours is 96.7%. The results also show the subjectivity of actual operation and the objectivity of identification. In most scenes, we observe that similar air traffic activities provide regularity for initiatives, validating the potential of this approach for initiatives and other artificial intelligence support.
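The similarity step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the bandwidth `sigma`, and the toy series are assumptions. It computes a DTW distance between time-series features and maps it through a Gaussian kernel to obtain a similarity matrix of the kind a spectral clustering routine takes as input.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def gaussian_similarity(distance, sigma=1.0):
    """Map a distance to a similarity in (0, 1]; `sigma` is an assumed bandwidth."""
    return math.exp(-distance ** 2 / (2 * sigma ** 2))


def similarity_matrix(series, sigma=1.0):
    """Pairwise Gaussian-kernel similarities built on DTW distances."""
    return [[gaussian_similarity(dtw_distance(s, t), sigma) for t in series]
            for s in series]


# Toy time-series features (made-up values).
scenes = [[1, 2, 3], [1, 2, 2, 3], [5, 5, 6]]
sim = similarity_matrix(scenes, sigma=2.0)
```

Replacing `dtw_distance` with a Euclidean distance gives the corresponding matrix for the discrete features.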
Funding: Projects (60873265, 60903222) supported by the National Natural Science Foundation of China; Project (IRT0661) supported by the Program for Changjiang Scholars and Innovative Research Team in University of China.
Abstract: The Circle algorithm is proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from other vertices. The algorithm makes use of the connection between clustering aggregation and the problem of correlation clustering. The best deterministic approximation algorithm is provided for this variation of the correlation clustering problem, and it is shown how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation demonstrates the usefulness of the problem and the solutions. The results show that the method reduces the running time by more than 50% without sacrificing clustering quality.
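The link between clustering aggregation and correlation clustering can be illustrated with a short Python sketch. It is not the Circle algorithm itself: the function names and toy labelings are assumptions. It computes the pairwise distance commonly used in clustering aggregation (the fraction of input clusterings that separate two objects) and counts how many pairwise disagreements a candidate aggregate incurs.

```python
from itertools import combinations


def pair_distance(clusterings, u, v):
    """Fraction of input clusterings that put objects u and v in different clusters."""
    return sum(c[u] != c[v] for c in clusterings) / len(clusterings)


def disagreements(candidate, clusterings):
    """Total number of object pairs on which `candidate` disagrees
    with each input clustering (lower is better)."""
    total = 0
    for u, v in combinations(range(len(candidate)), 2):
        together = candidate[u] == candidate[v]
        for c in clusterings:
            if together != (c[u] == c[v]):
                total += 1
    return total


# Three toy clusterings of four objects, given as label lists (made-up data).
inputs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

best = min(inputs, key=lambda c: disagreements(c, inputs))
```

Objects with small pairwise distance are "close" and should end up together, which is the intuition behind grouping vertices that are near each other and far from the rest.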
Abstract: The food industry is evolving towards new, much more complex forms of organization characterized by a greater degree of coordination, whether in the form of vertical integration or of explicit or implicit contracts between players at different levels of the industry. The aim of this work is therefore to search for mechanisms that can add value to the production phase and so increase the competitiveness of the sector. For the first time, discussion of food chains can refer to a recognized legal entity: the integrated food chain project, which results from agricultural policy actions at community, national, and regional levels. The methodology consists of two steps: the administration of questionnaires to the three companies participating in food chain partnerships that proposed a draft integrated food chain design in response to the Apulia region's call for integrated food chain projects; and a cluster analysis of the wine sector of the Italian regions. The results show, through network analysis, the importance to chain development of relationships formed by market relations and cooperation relations (formal and informal), and the need for more actions to enhance products through research and development activities.
Abstract: The paper studies an improved K-means algorithm and establishes indicators for classifying customers according to the RFM model. Experimental results show that the new algorithm has good convergence and stability and clusters better than FKP algorithms used on their own. Finally, the paper studies the application of clustering to customer segmentation in a mobile communication enterprise. It discusses the basic theory, the customer segmentation methods and steps, and a customer segmentation model based on consumption behavior psychology; the segmentation model is successfully applied to marketing decision support.
Funding: This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 60573139 and the National Science & Technology Pillar Program of China under Grant No. 2008BAH221303.
Abstract: We present a novel framework for automatic behavior clustering and unsupervised anomaly detection in a large video set. The framework consists of the following key components: 1) Drawing from natural language processing, we introduce a compact and effective behavior representation as a stochastic sequence of spatiotemporal events, in which the global structural information of behaviors is analyzed using their local action statistics. 2) The natural grouping of behavior patterns is discovered through a novel clustering algorithm. 3) A run-time accumulative anomaly measure is introduced to detect abnormal behavior, whereas normal behavior patterns are recognized once sufficient visual evidence has become available, based on an online Likelihood Ratio Test (LRT) method. This ensures robust and reliable anomaly detection and normal behavior recognition in the shortest possible time. Experimental results on noisy and sparse data sets collected from a real surveillance scenario demonstrate the effectiveness and robustness of our approach.
Abstract: Owing to their potential for intercell cochannel interference mitigation and significant spectral efficiency improvement, coordinated transmission techniques using multiple radio access points have recently attracted a lot of attention. In this paper, the system structure and a mathematical signal model based on a clustered structure are presented for multipoint coordinated downlink transmission, clustered supercell configurations with static and dynamic approaches are discussed, and an optimal precoding design is provided for an acceptable level of scheduling complexity and reduced signaling overhead. Simulation results are given to evaluate the performance of different cell-clustering approaches and to show that a clustered supercell size of 7 is a reasonable choice for clustered coordination with the given transmit power and the reduced feedback.
Abstract: This paper provides an overview of the main recommendations and approaches of a methodology for developing parallel computation applications for hybrid structures. The methodology was developed within the master's thesis project "Optimization of complex tasks' computation on hybrid distributed computational structures" carried out by Orekhov, whose main research objective was to determine patterns in the behavior of scaling efficiency and other parameters that define the performance of different algorithm implementations executed on hybrid distributed computational structures. The major outcomes and dependencies obtained in the thesis project were formed into a methodology that covers the problems of applications based on parallel computations and describes their development process in detail, offering easy ways to avoid potentially crucial problems. The paper is backed by real-life examples, such as clustering algorithms, rather than artificial benchmarks.
Abstract: To overcome the memory-capacity and CPU-speed bottlenecks that the traditional K-Medoids clustering algorithm faces when handling massive data, the paper proposes a parallel K-Medoids algorithm based on the MapReduce programming model. The algorithm increases the computation granularity and reduces the communication cost ratio under the MapReduce model. The experimental results show that, compared with other algorithms, the improved parallel algorithm achieves much better speedup and operating efficiency.
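One way to picture how a K-Medoids iteration maps onto MapReduce is the single-process Python sketch below. It is an assumption-laden illustration, not the paper's algorithm: `map_assign`, `reduce_update`, and `kmedoids_iteration` are hypothetical names, and the shuffle phase is simulated with a dictionary rather than a real MapReduce framework.

```python
from collections import defaultdict


def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def map_assign(point, medoids):
    """Map step: emit (index of nearest medoid, point)."""
    idx = min(range(len(medoids)), key=lambda i: distance(point, medoids[i]))
    return idx, point


def reduce_update(points):
    """Reduce step: pick the member minimizing total distance to its cluster."""
    return min(points, key=lambda cand: sum(distance(cand, p) for p in points))


def kmedoids_iteration(points, medoids):
    """One map/shuffle/reduce round, simulated in a single process."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_assign(point, medoids)
        groups[key].append(value)  # stands in for the shuffle phase
    # Keep the old medoid if a cluster received no points.
    return [reduce_update(groups[i]) if groups[i] else medoids[i]
            for i in range(len(medoids))]


# Toy data: two well-separated groups (made-up values).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
new_medoids = kmedoids_iteration(data, [(0, 1), (11, 10)])
```

Only the current medoids need to be broadcast to the mappers in each round, which is where a low communication-to-computation ratio would come from in a real deployment.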
Funding: supported by the National Basic Research Program (973) of China (No. 2014CB340303), the National Natural Science Foundation of China (Nos. 61502509 and 61222205), the Program for New Century Excellent Talents in University, and the Fok Ying-Tong Education Foundation (No. 141066).
Abstract: The density peak (DP) algorithm has been widely used in scientific research due to its novel and effective peak density-based clustering approach. However, the DP algorithm uses each pair of data points several times when determining cluster centers, which yields high computational complexity. In this paper, we focus on accelerating the time-consuming density peaks algorithm with a graphics processing unit (GPU). We analyze the principle of the algorithm to locate its computational bottlenecks and evaluate its potential for parallelism. In light of our analysis, we propose an efficient parallel DP algorithm targeting a GPU architecture and implement it with compute unified device architecture (CUDA); the implementation is called the 'CUDA-DP platform'. Specifically, we use shared memory to improve data locality, which reduces the number of global memory accesses. To exploit the coalesced memory access mechanism of the GPU, we convert the data structure of the CUDA-DP program from an array of structures to a structure of arrays. In addition, we introduce a binary search-and-sampling method to avoid sorting a large array. The experimental results show that CUDA-DP achieves a 45-fold speedup over the central processing unit (CPU) based density peaks implementation.
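The pairwise work that makes density peaks expensive can be seen in a small CPU reference sketch. This is plain Python written for illustration, not the CUDA-DP code: the function name, the cutoff `d_c`, and the toy points are assumptions, and ties in density are handled by a simplification noted in the comments.

```python
def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def density_peaks(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   = number of other points closer than the cutoff d_c
    delta[i] = distance to the nearest point of strictly higher density;
               points with no denser neighbor get their largest distance
               (a simplification: tied maxima all receive this value).
    """
    n = len(points)
    # O(n^2) pairwise distances: every pair is touched here and reused below.
    dist = [[distance(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Toy data: two groups of three points (made-up values).
pts = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (12, 10)]
rho, delta = density_peaks(pts, d_c=1.5)

# Cluster centers are the points where both rho and delta are large.
centers = [i for i in range(len(pts)) if rho[i] >= 2 and delta[i] > 5]
```

Each row of the distance matrix is independent of the others, which is what makes the computation a natural fit for one-thread-per-point parallelization.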
Funding: supported by the National Natural Science Foundation of China (Grant No. 51176030) and the Jiangsu Science and Technology Department (Grant No. BY2015070-17).
Abstract: The furnace process is very important in boiler operation, and furnace pressure is an important parameter of that process. There is therefore a need to analyze and monitor the furnace pressure signal. However, little work has been done on the relationship between the pressure sequence and the boiler's load under different working conditions. Because the pressure sequence contains complex information, it calls for feature extraction methods that consider multiple aspects. In this paper, a fuzzy c-means analysis method based on a weighted validity index (VFCM) is proposed for working condition classification based on feature extraction. To deal with the fluctuating and time-varying pressure sequence, feature extraction is performed as nonlinear analysis based on entropy theory. Three kinds of entropy values, extracted from the pressure sequence in the time-frequency domain, are studied as the clustering objects for working condition classification. The weighted validity index, which takes both compactness and separation into consideration, is calculated from the Silhouette index and the Krzanowski-Lai index to obtain the optimal number of clusters. Each time FCM runs, the weighted validity index evaluates the clustering result, and the optimal number of clusters is the one at which the index reaches its maximum. Four datasets from the UCI Machine Learning Repository are used to verify the effectiveness of VFCM. Pressure sequences from a 300 MW boiler are then used as a case study. The case study, with an error rate of 0.5332%, reveals valuable information about the boiler's load and the furnace pressure sequence. A relationship between the boiler's load and the entropy values extracted from the pressure sequence is proposed. Moreover, the method can serve as a reference for data mining in other fluctuating and time-varying sequences.
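The FCM step that the validity index wraps around can be sketched with the standard fuzzy c-means update equations. This is a generic textbook formulation in Python, not the VFCM method: the function names, the fuzzifier `m = 2`, and the toy one-dimensional data are assumptions, and the weighted validity index itself is not implemented here.

```python
def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [distance(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # Point coincides with a center: give it full membership there.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Standard FCM center update: weighted mean with weights u_ik^m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)))
    return centers


# Toy one-dimensional features (made-up values) and two initial centers.
feats = [(0.0,), (1.0,), (9.0,), (10.0,)]
u = fcm_memberships(feats, [(0.5,), (9.5,)])
new_centers = fcm_centers(feats, u)
```

Running these two updates to convergence for each candidate number of clusters, and scoring each result with a validity index, gives the selection loop that the abstract describes.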