7,772 articles found
1. Shrinkage Estimation of Semiparametric Model with Missing Responses for Cluster Data
Authors: Mingxing Zhang, Jiannan Qiao, Huawei Yang, Zixin Liu. Open Journal of Statistics, 2015, No. 7, pp. 768-776 (9 pages)
This paper simultaneously investigates variable selection and imputation estimation for the semiparametric partially linear varying-coefficient model when responses are missing in cluster data. As is well known, a commonly used approach to missing data is complete-case analysis. The complete-case idea is combined with shrinkage estimation carried out on each cluster. To avoid biased results and improve estimation efficiency, the article introduces the Group Least Absolute Shrinkage and Selection Operator (Group Lasso) into the semiparametric model. That is, the method combines local polynomial smoothing with the Least Absolute Shrinkage and Selection Operator, so that nonparametric estimation and variable selection can be conducted in a computationally efficient manner. The parametric estimators are obtained under the same criterion. For each cluster, the nonparametric and parametric estimators are derived, and a weighted average of the per-cluster estimates is taken as the final estimator. The large-sample properties of the estimators are also derived.
Keywords: semiparametric partially linear varying-coefficient model; missing responses; cluster data; Group Lasso
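The Group Lasso penalty named in this abstract shrinks a whole group of coefficients toward zero together. The following is a minimal sketch of the group soft-thresholding step (the proximal operator of the penalty lam * ||beta_g||_2) in plain Python. The function name `group_soft_threshold` and the example values are illustrative assumptions; this is not the authors' estimator, which also involves local polynomial smoothing and handling of missing responses.

```python
import math


def group_soft_threshold(beta_g, lam):
    """Proximal operator of lam * ||beta_g||_2 for one coefficient group.

    Returns the zero vector when the group norm is at most lam (the whole
    group is dropped); otherwise scales the group toward zero.
    """
    norm = math.sqrt(sum(b * b for b in beta_g))
    if norm <= lam:
        return [0.0] * len(beta_g)
    scale = 1.0 - lam / norm
    return [scale * b for b in beta_g]


# Hypothetical coefficient groups: a weak group is removed entirely,
# a strong group is kept but shrunk.
weak = group_soft_threshold([0.3, -0.4], lam=1.0)   # group norm 0.5
strong = group_soft_threshold([3.0, 4.0], lam=1.0)  # group norm 5.0
```

Dropping or keeping a group as a unit is what lets the penalty perform variable selection on grouped coefficients, such as the basis coefficients of one varying-coefficient function.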
2. Power Incomplete Data Clustering Based on Fuzzy Fusion Algorithm
Authors: Yutian Hong, Yuping Yan. Energy Engineering (indexed: EI), 2023, No. 1, pp. 245-261 (17 pages)
With the rapid development of the economy, the scale of the power grid is expanding. The grid now comprises a very large number of equipment items, so the state data of power equipment are growing explosively. These multi-source heterogeneous data differ from one another, which leads to data variation during transmission and storage and thus produces incomplete data. Research on data integrity has therefore become an urgent task. This paper is based on the characteristics of random chance and the spatio-temporal differences of the system. According to the characteristics and sources of the massive data generated by power equipment, a fuzzy mining model of power equipment data is established, and the data are divided into numerical and non-numerical data. Text data on power equipment defects are taken as the mining material, and an array-based Apriori algorithm is then used for deep mining. The strong association rules in incomplete power equipment data are obtained and analyzed. Judging from the trends in NRMSE and classification accuracy, most filling methods combined with the two frameworks in this method show a relatively stable filling trend and do not fluctuate greatly as the missing rate grows. The experimental results show that the proposed model can effectively improve the filling effect of existing filling methods on most data sets, and the filling effect fluctuates greatly with the increase of the missing rate; that is, as the missing rate increases, the model's improvement over the existing filling methods is higher than 4.3%. Through the incomplete data clustering technology studied in this paper, a more innovative state assessment of smart grid reliability operation is carried out, which has good research value and reference significance.
Keywords: power system; equipment parameter; incomplete data; fuzzy analysis; data clustering
3. Unsupervised Functional Data Clustering Based on Adaptive Weights
Authors: Yutong Gao, Shuang Chen. Open Journal of Statistics, 2023, No. 2, pp. 212-221 (10 pages)
In recent years, functional data have been widely used in finance, medicine, biology and other fields. Current clustering analysis can solve problems in finite-dimensional space, but it is difficult to apply directly to the clustering of functional data. In this paper, we propose a new unsupervised clustering algorithm based on adaptive weights. In the absence of initialization parameters, we use entropy-type penalty terms and a fuzzy partition matrix to find the optimal number of clusters. At the same time, we introduce a measure based on adaptive weights to reflect the difference in information content between different clustering metrics. Simulation experiments show that the proposed algorithm achieves higher purity than some existing algorithms.
Keywords: functional data; unsupervised learning; clustering; functional principal component analysis; adaptive weight
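The abstract evaluates the algorithm by purity. As a hedged illustration, the sketch below computes the usual definition of clustering purity (the fraction of objects that belong to the majority true class of their cluster). The function name `purity` and the toy labels are assumptions for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of objects that fall in the majority true class of their cluster."""
    if len(true_labels) != len(cluster_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    members = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        members.setdefault(cluster, []).append(truth)
    # For each cluster, count how many members share its most common true class.
    majority_total = sum(
        Counter(truths).most_common(1)[0][1] for truths in members.values()
    )
    return majority_total / len(true_labels)


# Hypothetical labels: cluster 1 mixes one "a" with two "b" objects.
score = purity(["a", "a", "a", "b", "b", "c"], [0, 0, 1, 1, 1, 2])
```

Purity rewards many small clusters, so it is usually read alongside the number of clusters found.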
4. Joint Design of Clustering and In-cluster Data Route for Heterogeneous Wireless Sensor Networks (Cited by: 1)
Authors: Liang Xue, Ying Liu, Zhi-Qun Gu, Zhi-Hua Li, Xin-Ping Guan. International Journal of Automation and Computing (indexed: EI, CSCD), 2017, No. 6, pp. 637-649 (13 pages)
A heterogeneous wireless sensor network comprises a number of inexpensive, energy-constrained wireless sensor nodes that collect data from the sensing environment and transmit them toward the improved cluster head in a coordinated way. Employing clustering techniques in such networks can balance the energy consumption of member nodes and prolong the network lifetime. In classical clustering techniques, clustering and in-cluster data routing are usually treated as independent operations. Although considering these two issues separately simplifies the system design, it often yields a non-optimal lifetime expectancy for wireless sensor networks. This paper proposes an integral framework that treats these two correlated items as an interacting whole. To that end, we formulate the clustering problem using nonlinear programming. The evolution of the clustering process is shown in simulations. Results show that our joint-design proposal reaches a near-optimal match between member nodes and cluster heads.
Keywords: heterogeneous wireless sensor networks; clustering technique; in-cluster data routes; integral framework; network lifetimes
5. CABOSFV algorithm for high dimensional sparse data clustering (Cited by: 7)
Authors: Sen Wu, Xuedong Gao (Management School, University of Science and Technology Beijing, Beijing 100083, China). Journal of University of Science and Technology Beijing (indexed: CSCD), 2004, No. 3, pp. 283-288 (6 pages)
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high dimensional clustering of binary sparse data. The algorithm compresses the data effectively by using a tool called the 'Sparse Feature Vector', thus reducing the data scale enormously, and can obtain the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV has low computational complexity. The algorithm finds clusters in high dimensional large datasets efficiently and handles noise effectively.
Keywords: clustering; data mining; sparse; high dimensionality
6. Clustering Structure Analysis in Time-Series Data With Density-Based Clusterability Measure (Cited by: 6)
Authors: Juho Jokinen, Tomi Raty, Timo Lintonen. IEEE/CAA Journal of Automatica Sinica (indexed: SCIE, EI, CSCD), 2019, No. 6, pp. 1332-1343 (12 pages)
Clustering is used to gain an intuition of the structures in the data. Most current clustering algorithms produce a clustering structure even on data that do not possess such structure. In these cases, the algorithms force a structure onto the data instead of discovering one. To avoid false structures in the relations of data, a novel clusterability assessment method called the density-based clusterability measure is proposed in this paper. It measures the prominence of clustering structure in the data to evaluate whether a cluster analysis could produce meaningful insight into the relationships in the data. This is especially useful for time-series data, since visualizing the structure in time-series data is hard. The performance of the clusterability measure is evaluated against several synthetic data sets and time-series data sets, which illustrate that the density-based clusterability measure can successfully indicate the clustering structure of time-series data.
Keywords: clustering; exploratory data analysis; time-series; unsupervised learning
7. A new clustering algorithm for large datasets (Cited by: 1)
Authors: 李清峰, 彭文峰. Journal of Central South University (indexed: SCIE, EI, CAS), 2011, No. 3, pp. 823-829 (7 pages)
The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from other vertices. The algorithm makes use of the connection between clustering aggregation and the problem of correlation clustering. The best deterministic approximation algorithm was provided for a variation of the correlation clustering problem, and it was shown how sampling can be used to scale the algorithms to large datasets. An extensive empirical evaluation was given of the usefulness of the problem and the solutions. The results show that the method achieves more than a 50% reduction in running time without sacrificing the quality of the clustering.
Keywords: clustering algorithm; dataset; approximation algorithm; clustering problem; running time; vertex; cluster
8. A Direct Data-Cluster Analysis Method Based on Neutrosophic Set Implication (Cited by: 1)
Authors: Sudan Jha, Gyanendra Prasad Joshi, Lewis Nkenyereya, Dae Wan Kim, Florentin Smarandache. Computers, Materials & Continua (indexed: SCIE, EI), 2020, No. 11, pp. 1203-1220 (18 pages)
Raw data are classified using clustering techniques in a reasonable manner to create disjoint clusters. Many clustering algorithms based on specific parameters have been proposed to handle high volumes of data. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm with a threshold-based clustering technique. The algorithm addresses the shortcomings of the k-means clustering algorithm by overcoming the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine Learning Repository) along with the k-means and threshold-based clustering algorithms. The proposed method results in more segregated datasets with compact clusters, thus achieving higher validity indices. The method also eliminates the limitations of the threshold-based clustering algorithm and validates measures and respective indices along with the k-means and threshold-based clustering algorithms.
Keywords: data clustering; data mining; neutrosophic set; k-means; validity measures; cluster-based classification; hierarchical clustering
9. Energy-balanced clustering protocol for data gathering in wireless sensor networks with unbalanced traffic load (Cited by: 1)
Authors: 奎晓燕, 王建新, 张士庚. Journal of Central South University (indexed: SCIE, EI, CAS), 2012, No. 11, pp. 3180-3187 (8 pages)
Energy-efficient data gathering in multi-hop wireless sensor networks was studied, considering that different nodes produce different amounts of data in realistic environments. A novel dominating set based clustering protocol (DSCP) was proposed to solve the data gathering problem in this scenario. In DSCP, a node evaluates the potential lifetime of the network (from its local point of view) assuming that it acts as the cluster head, and claims to be a tentative cluster head if it maximizes the potential lifetime. When evaluating the potential lifetime of the network, a node considers not only its remaining energy but also other factors, including its traffic load, the number of its neighbors, and the traffic loads of its neighbors. A tentative cluster head becomes a final cluster head with a probability inversely proportional to the number of tentative cluster heads that cover its neighbors. The protocol can terminate in O(n/lg n) steps, and its total message complexity is O(n^2/lg n). Simulation results show that DSCP can effectively prolong the lifetime of the network in multi-hop networks with unbalanced traffic load. Compared with EECT, the network lifetime is prolonged by 56.6% on average.
Keywords: wireless sensor networks; unbalanced load; data gathering; data traffic; protocol; energy balancing; lifetime extension; cluster
10. Fuzzy Clustering Validity for Spatial Data (Cited by: 1)
Authors: HU Chunchun, MENG Lingkui, SHI Wenzhong. Geo-Spatial Information Science, 2008, No. 3, pp. 191-196 (6 pages)
The validity measurement of fuzzy clustering is a key problem. Once a clustering has been formed, some mechanism is needed to verify its validity. To make mining more accountable and comprehensible, and to yield a usable spatial pattern, it is necessary to detect whether the data set has a clustered structure before clustering. This paper discusses a detection method for clustered patterns and a fuzzy clustering algorithm, and studies the validity function of the result produced by fuzzy clustering from two aspects, namely the uncertainty of classification during fuzzy partition and the spatial location features of spatial data. It then proposes a new validity function of fuzzy clustering for spatial data. The experimental results indicate that the new validity function can accurately measure the validity of fuzzy clustering results. In particular, for fuzzy clustering of spatial data, it is robust and its classification result is better than that of other indices.
Keywords: spatial data; fuzzy clustering; validity; earth science
11. Scaling up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique (Cited by: 9)
Authors: Guan Ji hong 1, Zhou Shui geng 2, Bian Fu ling 3, He Yan xiang 1 (1. School of Computer, Wuhan University, Wuhan 430072, China; 2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China; 3. College of Remote Sensin). Wuhan University Journal of Natural Sciences (indexed: CAS), 2001, No. Z1, pp. 467-473 (7 pages)
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces sampling inside DBSCAN, and the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
Keywords: spatial databases; data mining; clustering; sampling; DBSCAN algorithm
12. Linear manifold clustering for high dimensional data based on line manifold searching and fusing (Cited by: 1)
Authors: 黎刚果, 王正志, 王晓敏, 倪青山, 强波. Journal of Central South University (indexed: SCIE, EI, CAS), 2010, No. 5, pp. 1058-1069 (12 pages)
High dimensional data clustering, with the inherent sparsity of data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets, and then fuse some of the line manifold clusters to construct higher dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifold and to optimize line manifolds during the line manifold cluster searching procedure. The results obtained from experiments on real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method is able to obtain high clustering accuracy for data sets with different sizes, manifold dimensions and noise ratios, which confirms its anti-noise capability and high clustering accuracy for high dimensional data.
Keywords: linear manifold; high dimensional data; data clustering; line search; datasets; clustering algorithm; anti-noise capability; inherent noise
13. Adaptive Density-Based Spatial Clustering of Applications with Noise (ADBSCAN) for Clusters of Different Densities (Cited by: 2)
Authors: Ahmed Fahim. Computers, Materials & Continua (indexed: SCIE, EI), 2023, No. 5, pp. 3695-3712 (18 pages)
Finding clusters based on density represents a significant class of clustering algorithms. These methods can discover clusters of various shapes and sizes. The most studied algorithm in this class is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). It identifies clusters by grouping the densely connected objects into one group and discarding the noise objects. It requires two input parameters: epsilon (a fixed neighborhood radius) and MinPts (the lowest number of objects in epsilon). However, it cannot handle clusters of various densities, since it uses a global value for epsilon. This article proposes an adaptation of the DBSCAN method so that it can discover clusters of varied densities while reducing the required number of input parameters to one. The only user input in the proposed method is MinPts. Epsilon, on the other hand, is computed automatically from statistical information about the dataset. The proposed method finds the core distance for each object in the dataset, takes the average of these distances as the first value of epsilon, and finds the clusters satisfying this density level. The remaining unclustered objects are then clustered using a new value of epsilon that equals the average core distance of the unclustered objects. This process continues until all objects have been clustered or the remaining unclustered objects are less than 0.006 of the dataset's size. Benchmark datasets were used to evaluate the effectiveness of the proposed method, which produced promising results. Practical experiments demonstrate the outstanding ability of the proposed method to detect clusters of different densities even if there is no separation between them. The accuracy of the method ranges from 92% to 100% for the experimented datasets.
Keywords: Adaptive DBSCAN (ADBSCAN); density-based clustering; data clustering; varied density clusters
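The multi-pass procedure described in this abstract can be sketched roughly as follows. This is a simplified plain-Python illustration under several assumptions that the abstract does not confirm: core distances are computed once over the full dataset, each pass only considers objects that are still unclustered, a pass that clusters nothing ends the loop, and the names `adbscan_sketch`, `stop_fraction` and `eps_history` are hypothetical. It is not the author's implementation.

```python
import math


def adbscan_sketch(points, min_pts, stop_fraction=0.006):
    """Multi-pass density clustering with a data-driven epsilon per pass.

    Returns (labels, eps_history); label -1 marks objects left unclustered.
    """
    n = len(points)
    if not 1 <= min_pts < n:
        raise ValueError("min_pts must be between 1 and len(points) - 1")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Core distance: distance from an object to its min_pts-th nearest other object.
    core = [sorted(row[:i] + row[i + 1:])[min_pts - 1] for i, row in enumerate(dist)]

    labels = [-1] * n
    unclustered = set(range(n))
    eps_history = []
    next_label = 0
    while unclustered and len(unclustered) >= stop_fraction * n:
        # Epsilon for this pass: mean core distance of the still-unclustered objects.
        eps = sum(core[i] for i in unclustered) / len(unclustered)
        eps_history.append(eps)
        candidates = sorted(unclustered)

        def neighbors(p):
            return [q for q in candidates if q != p and dist[p][q] <= eps]

        for seed in candidates:
            if labels[seed] != -1 or len(neighbors(seed)) < min_pts:
                continue
            labels[seed] = next_label
            stack = [seed]
            while stack:
                p = stack.pop()
                hood = neighbors(p)
                if len(hood) < min_pts:
                    continue  # border object: keeps its label but does not expand
                for q in hood:
                    if labels[q] == -1:
                        labels[q] = next_label
                        stack.append(q)
            next_label += 1

        newly_clustered = {i for i in candidates if labels[i] != -1}
        if not newly_clustered:
            break  # assumption: stop when a pass makes no progress
        unclustered -= newly_clustered
    return labels, eps_history


# Hypothetical data: one tight group and one loose group.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
sparse = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0), (10.5, 10.5)]
labels, eps_history = adbscan_sketch(dense + sparse, min_pts=2)
```

In this toy run the first, smaller epsilon captures only the tight group, and the second pass uses a larger epsilon derived from the loose group alone, which is the behavior the abstract attributes to the adaptive scheme.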
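14. Scaling up Kernel Grower Clustering Method for Large Data Sets via Core-sets (Cited by: 2)
Authors: CHANG Liang, DENG Xiao-Ming, ZHENG Sui-Wu, WANG Yong-Qing. Acta Automatica Sinica (自动化学报) (indexed: EI, CSCD, PKU Core), 2008, No. 3, pp. 376-382 (7 pages)
Kernel grower is a novel kernel clustering method recently proposed by Camastra and Verri. It shows good performance on a variety of data sets and compares favorably with popular clustering algorithms. However, the main drawback of the method is its weak scalability when dealing with large data sets, which greatly restricts its application. In this paper, we propose a scaled-up kernel grower method using core-sets, which is significantly faster than the original method for clustering large data. At the same time, it can handle very large data sets. Numerical experiments on benchmark data sets as well as synthetic data sets show the efficiency of the proposed method. The method is also applied to real image segmentation to illustrate its performance.
Keywords: large data sets; image segmentation; pattern recognition; core-sets; kernel clustering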
15. Local and global approaches of affinity propagation clustering for large scale data (Cited by: 15)
Authors: Ding-yin XIA, Fei WU, Xu-qing ZHAN, Yue-ting ZHUANG. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (indexed: SCIE, EI, CAS, CSCD), 2008, No. 10, pp. 1373-1381 (9 pages)
Recently a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clusters sparsely related data by passing messages between data points. However, we want to cluster large scale data where the similarities are not sparse in many cases. This paper presents two variants of AP for grouping large scale data with a dense similarity matrix. The local approach is partition affinity propagation (PAP) and the global method is landmark affinity propagation (LAP). PAP passes messages within subsets of the data first and then merges them as the initial step of the iterations; it can effectively reduce the number of clustering iterations. LAP passes messages between the landmark data points first and then clusters the non-landmark data points; it is a global approximation method for speeding up clustering. Experiments are conducted on many datasets, such as random data points, manifold subspaces, images of faces and Chinese calligraphy, and the results demonstrate that the two approaches are feasible and practicable.
Keywords: clustering; large scale data; propagation method; computer technology
16. Data Stream Subspace Clustering for Anomalous Network Packet Detection (Cited by: 1)
Authors: Zachary Miller, Wei Hu. Journal of Information Security, 2012, No. 3, pp. 215-223 (9 pages)
As the Internet offers increased connectivity between human beings, it has fallen prey to malicious users who exploit its resources to gain illegal access to critical information. In an effort to protect computer networks from external attacks, two common types of Intrusion Detection Systems (IDSs) are often deployed. The first type is signature-based IDSs, which can detect intrusions efficiently by scanning network packets and comparing them with human-generated signatures describing previously observed attacks. The second type is anomaly-based IDSs, which are able to detect new attacks by modeling normal network traffic without the need for a human expert. Despite this advantage, anomaly-based IDSs are limited by a high false-alarm rate and by difficulty detecting network attacks that attempt to blend in with normal traffic. In this study, we propose StreamPreDeCon, an anomaly-based IDS. StreamPreDeCon is an extension of the preference subspace clustering algorithm PreDeCon, designed to resolve some of the challenges associated with anomalous packet detection. Using network packets extracted from the first week of the DARPA '99 intrusion detection evaluation dataset combined with Generic Http, Shellcode and CLET attacks, our IDS achieved 94.4% sensitivity and 0.726% false positives in a best case scenario. To measure the overall effectiveness of the IDS, the average sensitivity and false positive rates were calculated for both the maximum sensitivity and the minimum false positive rate. With the maximum sensitivity, the IDS had 80% sensitivity and 9% false positives on average. The IDS also averaged 63% sensitivity with a 0.4% false positive rate when the minimal number of false positives is needed. These rates are an improvement on results found in a previous study, as the sensitivity rate in general increased while the false positive rate decreased.
Keywords: anomaly detection; intrusion detection system; network security; preference subspace clustering; stream data mining
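The two figures reported in this abstract, sensitivity and false positive rate, follow from a confusion matrix. The sketch below shows the standard definitions in plain Python; the function name `sensitivity_and_fpr` and the toy labels are illustrative assumptions and do not reproduce the DARPA '99 experiment.

```python
def sensitivity_and_fpr(actual, predicted):
    """Return (sensitivity, false positive rate) for binary labels (1 = attack)."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one attack and one normal packet")
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical packets: 4 attacks (3 detected) and 6 normal packets (1 false alarm).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, fpr = sensitivity_and_fpr(actual, predicted)
```

The trade-off the abstract reports (80% sensitivity at 9% false positives versus 63% at 0.4%) corresponds to moving the detection threshold along these two rates.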
17. Clustering method based on data division and partition (Cited by: 1)
Authors: 卢志茂, 刘晨, S. Massinanke, 张春祥, 王蕾. Journal of Central South University (indexed: SCIE, EI, CAS), 2014, No. 1, pp. 213-222 (10 pages)
Many classical clustering algorithms perform well under their own prerequisites but do not scale well when applied to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) was proposed to solve the problem. DP cut the source data set into data blocks and extracted the eigenvector of each data block to form the local feature set. The local feature set was used in a second round of the characteristics polymerization process for the source data to find the global eigenvector. Finally, according to the global eigenvector, the data set was assigned by the minimum distance criterion. The experimental results show that it is more robust than conventional clustering methods. Its insensitivity to data dimensions, distribution and the number of natural clusters gives it a wide range of applications in clustering VLDS.
Keywords: clustering method; partition; eigenvector; clustering algorithm; job opportunities; aggregation process; minimum distance; data set
18. Clustering Categorical Data: A Cluster Ensemble Approach
Authors: 何增友, Xu Xiaofei, Deng Shengchun. High Technology Letters (indexed: EI, CAS), 2003, No. 4, pp. 8-12 (5 pages)
Clustering categorical data, an integral part of data mining, has attracted much attention recently. In this paper, the authors formally define the categorical data clustering problem as an optimization problem from the viewpoint of cluster ensembles, and apply a cluster ensemble approach to clustering categorical data. Experimental results on real datasets show that better clustering accuracy can be obtained in comparison with existing categorical data clustering algorithms.
Keywords: cluster technology; categorical data; nonlinear dynamical system; computer technology
19. Incorporating heterogeneous biological data sources in clustering gene expression data
Authors: Gang-Guo Li, Zheng-Zhi Wang. Health, 2009, No. 1, pp. 17-23 (7 pages)
In this paper, a similarity measure between genes based on protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures of the protein-protein interaction data and the ChIP-chip data, a combined dissimilarity measure is defined. The combined distance measure is introduced into the K-means method, which can be considered an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset. The performance of these methods is assessed by a prediction accuracy analysis using known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that it is very helpful and meaningful to incorporate heterogeneous data sources in clustering gene expression data, and that the coefficients for genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
Keywords: statistical analysis; similarity/dissimilarity measure; gene expression data; clustering; data fusion
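The abstract does not give the formula for the combined dissimilarity measure. As a hedged illustration of the general idea only, the sketch below assumes a weighted average of per-source dissimilarities (one minus each similarity), with the weights playing the role of the tuning coefficients. The functions `pearson` and `combined_dissimilarity`, the weighting scheme and all numeric values are assumptions, not the authors' definition.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def combined_dissimilarity(similarities, weights):
    """Weighted average of (1 - similarity) over data sources (assumed form)."""
    if len(similarities) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one positive-sum weight per similarity")
    total = sum(w * (1.0 - s) for w, s in zip(weights, similarities))
    return total / sum(weights)


# Hypothetical gene pair: expression profiles, a ChIP-chip similarity and a
# protein-protein interaction similarity; the second source gets a larger weight.
expr_sim = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
d = combined_dissimilarity([expr_sim, 0.5, 0.0], weights=[1.0, 2.0, 1.0])
```

A distance of this form can be passed to any clustering routine that accepts a precomputed dissimilarity; raising the weight of one source makes that source dominate the grouping.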
20. D-IMPACT: A Data Preprocessing Algorithm to Improve the Performance of Clustering
Authors: Vu Anh Tran, Osamu Hirose, Thammakorn Saethang, Lan Anh T. Nguyen, Xuan Tho Dang, Tu Kien T. Le, Duc Luu Ngo, Gavrilov Sergey, Mamoru Kubo, Yoichi Yamada, Kenji Satou. Journal of Software Engineering and Applications, 2014, No. 8, pp. 639-654 (16 pages)
In this study, we propose a data preprocessing algorithm called D-IMPACT, inspired by the IMPACT clustering algorithm. D-IMPACT iteratively moves data points based on attraction and density to detect and remove noise and outliers and to separate clusters. Our experimental results on two-dimensional datasets and practical datasets show that this algorithm can produce new datasets on which the performance of the clustering algorithm is improved.
Keywords: attraction; clustering; data preprocessing; density; shrinking