Abstract: Most clustering algorithms describe the similarity of objects by a predefined distance function. Three distance functions widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results; comparing the results obtained under different distance functions can be used to detect the outliers of a cluster and gives information about cluster shape. In practice, it is suggested to apply different distance functions separately, compare the clustering results, and pick out the "swing points". Such points may reveal more information to data analysts.
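The comparison step suggested in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name its three distance functions, so Euclidean, Manhattan, and Chebyshev are assumed here, and a single nearest-center assignment stands in for a full k-means or hierarchical run. The function names and the sample data are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def assign(points, centers, dist):
    """Return, for each point, the index of its nearest center under dist."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points]

def swing_points(points, centers, metrics):
    """Indices of points whose assigned cluster changes with the metric."""
    labelings = [assign(points, centers, d) for d in metrics]
    return [i for i in range(len(points))
            if len({labels[i] for labels in labelings}) > 1]

# Hypothetical data: two fixed centers and three points.
centers = [(0.0, 0.0), (5.0, 2.0)]
points = [(0.5, 0.2), (5.0, 1.5), (3.0, 0.0)]
print(swing_points(points, centers, [euclidean, manhattan, chebyshev]))
```

In this toy data the third point is closer to the first center under Manhattan distance but closer to the second under Euclidean and Chebyshev distance, so it is the only index reported.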
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 41306086), the Technology Innovation Talent Special Foundation of Harbin (Grant No. 2014RFQXJ105), and the Fundamental Research Funds for the Central Universities (Grant Nos. HEUCFR1121, HEUCF100606).
Abstract: Based on the manifold characteristics of sonar image data, a sonar image detection method using a two-phase manifold partner clustering algorithm is proposed. First, K-means block clustering based on Euclidean distance is used to reduce the data set. Mean value, standard deviation, and minimum gray value are taken as three features, based on the relationship between the clustering model and the data structure. Then a K-means clustering algorithm based on manifold distance is applied to the reduced data set to improve detection efficiency. For this manifold-distance K-means step, the line segment length on the manifold is analyzed, and a new power-function line segment length is proposed to decrease the computational complexity. To calculate the manifold distance quickly, a new all-source shortest path algorithm is proposed as an efficient preprocessing step. On this basis, the spatial feature of the image block is added to the three features to obtain the final precise partner clustering algorithm. Comparison with other typical clustering algorithms shows that the proposed algorithm achieves good detection results, and experiments on different real sonar images show that it adapts better.
Abstract: The K-means clustering algorithm is an important unsupervised learning algorithm and plays an important role in big data processing, computer vision, and other research fields. However, because it is sensitive to the initial partition, outliers, noise, and other factors, its clustering results in data analysis, image segmentation, and other fields are unstable and lack robustness. Building on the fast global K-means clustering algorithm, this paper proposes an improved K-means clustering algorithm. Through a neighborhood filtering mechanism, points in the neighborhood of an already selected initial cluster center do not participate in the selection of the next initial cluster center, which effectively reduces the randomness of the initial partition and improves its efficiency. Mahalanobis distance is used in the clustering process to better account for the global nature of the data. Compared with the traditional clustering algorithm and other optimization algorithms, results on real data sets are significantly improved.
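As a reminder of what the Mahalanobis distance computes, here is a minimal two-dimensional sketch in plain Python. It is not the paper's implementation: the abstract does not say whether one global covariance or per-cluster covariances are used, so a single covariance passed in by the caller is assumed, and the helper names are hypothetical.

```python
import math

def covariance_2d(points):
    """Sample covariance of 2-D points, returned as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_2d(p, center, cov):
    """sqrt((p - c)^T S^{-1} (p - c)) for a symmetric 2x2 covariance S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = p[0] - center[0], p[1] - center[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)
```

With the identity covariance `(1.0, 0.0, 1.0)` this reduces to the Euclidean distance; a larger variance along one axis shrinks distances measured along that axis, which is how the metric adapts to the spread of the data.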
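Abstract: As one of the key steps in identifying attacks or anomalous behavior to protect network security, network intrusion detection is often combined with data mining or machine learning techniques. With the explosive growth of network data, traditional intrusion detection techniques now face the problem of detecting and processing massive amounts of data, and existing intrusion detection systems often struggle to meet real-time and effectiveness requirements at the same time. This paper introduces the concept of extension distance from Extenics into network intrusion detection research and proposes a feature transformation method based on extension distance. The method maps the original features of a data point into two parts, the out-of-cluster center distance and the in-cluster extension distance, generating new features from the original multidimensional features in order to reduce feature dimensionality and to meet both the real-time and effectiveness requirements of a network intrusion detection system. KDD CUP 99 is used as the simulation data set to test the proposed extension-distance method for feature transformation in network intrusion detection. Experimental results show that, compared with the traditional KNN algorithm, the extension-distance method markedly reduces detection time while the drop in detection rate can be kept within 1%, giving it a clear advantage in timeliness.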
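Abstract: To address the clustering of interval-valued data, the concept of extension distance is proposed, based on the definition of the dependent function in Extenics, to measure the distance between data. Following the idea of K nearest neighbors, classification is performed by voting on the target attribute of the data set according to the magnitude of the extension distance, and an Extension K Nearest Neighbor (EKNN) algorithm is designed. Finally, two UCI benchmark data sets, the Iris plant sample data and the diabetes database PIDD, are used for validation: an immune network reduction algorithm first performs minimal attribute reduction on the condition attributes, and the EKNN algorithm is then used to analyze and compare the classification accuracy under different minimal reduced attribute sets.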
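For orientation, the classical Extenics extension distance between a point x and an interval [a, b] is |x - (a + b)/2| - (b - a)/2. The sketch below implements only that classical point-to-interval form; the distance proposed in the paper for interval-valued data may be defined differently, and the function name is hypothetical.

```python
def extension_distance(x, a, b):
    """Classical Extenics distance between point x and interval [a, b].

    Negative inside the interval, zero on its endpoints, positive outside.
    """
    if a > b:
        raise ValueError("interval must satisfy a <= b")
    return abs(x - (a + b) / 2.0) - (b - a) / 2.0
```

Unlike an ordinary distance, the value is negative for points inside the interval, so it also encodes how deep inside the interval a point lies.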
Funding: This work was partly supported by the National Natural Science Foundation of China (Grant No. 61175045).
Abstract: Negative databases (NDBs) have recently been proposed for privacy protection. As with traditional databases, some basic operations can be performed over NDBs, such as select, intersection, update, and delete. However, neither classification nor clustering in negative databases has yet been studied. This paper therefore proposes two algorithms for NDBs: a k nearest neighbor (kNN) classification algorithm and a k-means clustering algorithm. The core of both algorithms is a novel method for estimating the Hamming distance between a binary string and an NDB. Experimental results demonstrate that classification and clustering in NDBs are promising.
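The paper's estimator for the Hamming distance between a binary string and an NDB is not described in the abstract and is not reproduced here. As background only, the sketch below shows the exact Hamming distance between two ordinary binary strings and a plain kNN majority vote built on it; the function names and sample data are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def knn_classify(query, labeled, k):
    """Majority label among the k strings closest to query in Hamming distance.

    labeled is a list of (binary_string, label) pairs.
    """
    nearest = sorted(labeled, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled strings.
labeled = [("0000", "a"), ("0001", "a"), ("1111", "b"), ("1110", "b"), ("0111", "b")]
print(knn_classify("0011", labeled, k=3))
```

In the NDB setting the stored strings are not directly available, which is why the paper replaces the exact distance with an estimate.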
Funding: Supported by the National Natural Science Foundation of China (71801182, 61703352), the China Scholarship Council (201907005017), and the Sichuan Provincial Science and Technology Program (2020YFH0035).
Abstract: Background: The deterrence effect of automated speed cameras (ASCs) is still inconclusive. Moreover, it has been pointed out that ASCs may have varying deterrence effects on different types of road users (e.g., taxis). Objective: This study investigates the distance halo effect of fixed ASCs (hereafter called ASCs) on taxis. Method: More than 1.34 million taxis' GPS trajectory data were collected. A novel indicator, the delta speed (defined as the difference between the traveling speed and the speed limit), was proposed to continuously describe variations in traveling speed. The upstream and downstream critical delta speeds during each time period on weekdays and weekends were obtained using the K-means clustering method. Based on the critical delta speeds, the ranges of the upstream and downstream distance halo effects of ASCs during different time periods on weekdays and weekends were determined separately and compared. Results: The downstream critical delta speed is smaller than the upstream one. The upstream and downstream distance halo effects of ASCs on taxis are within a range of 8-2180 m and an area of 10-580 m to the ASC location, respectively. There is no obvious difference in the ranges of the upstream and downstream distance halo effects of ASCs on taxis between different time periods or between weekdays and weekends. Conclusion: The study confirms that the upstream and downstream distance halo effects of ASCs on taxis have different ranges and are stable across time of day and day of week. Practical application: The findings can provide a basic reference for reasonably deploying ASCs within a region.
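The delta speed indicator and the clustering step can be illustrated with a small one-dimensional K-means sketch. The abstract does not state how the critical delta speed is read off the clusters, so the code stops at the cluster centers. The function names, the deterministic initialization, and the sample speeds are assumptions made for illustration.

```python
def delta_speed(speed, limit):
    """Traveling speed minus the speed limit, as defined in the abstract."""
    return speed - limit

def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm on scalars; returns the cluster centers in order.

    Centers are initialized at evenly spaced positions in the sorted data
    (an assumption chosen to keep the example deterministic). Requires k >= 2.
    """
    ordered = sorted(values)
    centers = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers

# Hypothetical speeds in km/h under a 60 km/h limit.
speeds = [48, 52, 50, 71, 69, 73]
deltas = [delta_speed(s, 60) for s in speeds]
print(kmeans_1d(deltas, k=2))
```

Negative delta speeds correspond to vehicles traveling below the limit, so the two centers separate compliant from speeding observations in this toy sample.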