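Abstract: Outlier detection aims to identify anomalous data whose feature attributes differ significantly from those of normal data. Most clustering-based outlier detection methods look for outliers from a global perspective and perform poorly on local outliers. To address this, this paper improves the K-means clustering algorithm by introducing the fast search and find of density peaks method, and proposes a local outlier detection method named KLOD (local outlier detection based on improved K-means and least-squares methods) to detect local outliers accurately. First, the fast search and find of density peaks method is used to compute the local density and relative distance of each data point, and the two are multiplied to obtain the γ value. Second, the γ values are sorted in descending order, and the elbow rule is used to select the k data points with the largest γ values as the initial cluster centers for K-means. Then, K-means partitions the data set into k clusters, and the objective function value of each data point in each dimension is computed and sorted in ascending order. Next, the degree of dispersion of each dimension is determined and a suitable fitting function and fitting points are chosen; the least-squares method is used to fit a function to the ascending-sorted objective function values of each dimension of each cluster, and the fitted function is differentiated to obtain the rate of change. Finally, combined with information entropy, each data point's objective function value in each dimension is weighted by the corresponding rate of change to obtain the final anomaly score, and the top-n data points with the highest anomaly scores are regarded as outliers. Simulation experiments on synthetic data sets and UCI data sets compare the accuracy of KLOD, LOF, and KNN. The results show that KLOD achieves higher accuracy than KNN and LOF. The proposed KLOD method effectively improves the clustering quality of K-means and offers good accuracy and performance in local outlier detection.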
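The initialization step described in this abstract (local density, relative distance, γ, then K-means seeded with the top-γ points) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Gaussian density kernel, the cutoff `d_c`, the function names, and the synthetic two-blob data are all assumptions, and `k` is fixed by hand rather than chosen with the elbow rule. The least-squares fitting and entropy weighting stages are not covered.

```python
import numpy as np

def density_peak_gamma(X, d_c):
    """Return local density rho, relative distance delta, and gamma = rho * delta."""
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Gaussian-kernel local density (an assumed choice); subtract 1 to drop the self-term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = rho > rho[i]
        # Distance to the nearest denser point; the densest point gets its largest distance.
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta, rho * delta

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)
        # Recompute each center as the mean of its cluster (keep the old one if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic data (hypothetical): two well-separated Gaussian blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2)),
])

k = 2  # fixed by hand here; the abstract selects k with the elbow rule on sorted gamma
rho, delta, gamma = density_peak_gamma(X, d_c=0.5)
init_idx = np.argsort(gamma)[::-1][:k]  # indices of the k largest gamma values
labels, centers = kmeans(X, X[init_idx])
```

Seeding from the highest-γ points tends to place one initial center in each dense region, which is the motivation the abstract gives for replacing random K-means initialization.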
Abstract: Polycrystalline bulk Ti3AlC2 with high purity and density was fabricated by hot pressing a powder mixture with starting stoichiometric mole ratios of 2.0TiC/1.0Ti/1.1Al/0.1Si at 1300-1500 ℃. X-ray diffraction patterns and scanning electron micrographs of the fully dense samples indicate that a proper addition of silicon favors the formation of Ti3AlC2 and consequently yields samples of high purity. The Ti3AlC2 grains hot pressed at 1300 ℃ and 1400 ℃ are plate-shaped, with sizes of 6-8 μm and 15-20 μm in the elongated dimension, respectively. The purity of the samples is measured by the K-value method, and the TiC content is given by a linear equation.
Abstract: Consider the regression model Y = Xβ + g(T) + e, where g is an unknown smooth function on [0, 1], β is a l-dimensional parameter to be estimated, and e is an unobserved error. When the data are randomly censored, the estimators βn* and gn* for β and g are obtained by using class K and the least squares methods. It is shown that βn* is asymptotically normal and that gn* achieves the convergence rate O(n^{-1/3}).
Abstract: The Tarq 1:100,000 geochemical sheet is located in Isfahan province and has been investigated by Iran's Geological and Explorations Organization using stream sediment analyses. The area lies in the Central Iran zone and has a stratigraphy of Precambrian to Quaternary rocks. Because the area shows signs of gold mineralization, its important mineralized zones need to be identified. This requires information on how gold, arsenic, and antimony vary relative to one another, in order to determine the extent of the geochemical halos and to estimate the grade. The present study therefore uses the well-known K-means method, a clustering method that minimizes the total Euclidean distance between each sample and the center of the class to which it is assigned. The clustering quality function and the utility rate of each sample in its cluster (S(i)) are used to determine the optimum number of clusters. Finally, based on the cluster centers and the results, equations are used to predict the amount of gold from four parameters: arsenic grade, antimony grade, and the length and width of the sampling points.
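Choosing the number of clusters from a per-sample quality value can be sketched as below. The sketch assumes that S(i) refers to the silhouette value, which the abstract does not state explicitly; the farthest-first initialization, the function names, and the synthetic three-blob data are likewise illustrative assumptions and are unrelated to the Tarq geochemical data.

```python
import numpy as np

def farthest_first_init(X, k):
    """Deterministic seeding (an assumed choice): start at X[0], then add farthest points."""
    idx = [0]
    d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())
        idx.append(nxt)
        # Track each point's distance to its nearest chosen seed.
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[idx]

def kmeans(X, centers, n_iter=100):
    """Lloyd iterations: minimize total distance of samples to their assigned centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def silhouette_scores(X, labels):
    """Per-sample silhouette S(i) = (b - a) / max(a, b)."""
    dist = np.linalg.norm(X[:, None] - X[None], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters get S(i) = 0 by convention
        a = dist[i, own].sum() / (n_own - 1)  # mean distance within own cluster, excluding self
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Synthetic data (hypothetical): three well-separated blobs of 40 points each.
rng = np.random.default_rng(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in blob_centers])

# Mean silhouette for each candidate number of clusters.
mean_s = {}
for k in range(2, 6):
    labels, _ = kmeans(X, farthest_first_init(X, k))
    mean_s[k] = silhouette_scores(X, labels).mean()
best_k = max(mean_s, key=mean_s.get)
```

The candidate k with the largest mean S(i) is taken as the optimum; for real geochemical samples the element grades would typically be standardized first so that no single element dominates the Euclidean distance.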
Funding: Supported by the National Natural Science Foundation of China.
Abstract: Based directly on the original definition of K-S entropy, a new algorithm for calculating K-S entropy from chaotic time series is developed by using some techniques of coding and code operation.