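Abstract: Outlier detection aims to identify anomalous data whose feature attributes differ significantly from those of normal data. Most clustering-based outlier detection methods detect outliers mainly from a global perspective and therefore perform poorly on local outliers. To address this, this paper improves the K-means clustering algorithm by introducing the fast search and find of density peaks method, and proposes a local outlier detection method named KLOD (local outlier detection based on improved K-means and least-squares methods) to detect local outliers accurately. First, the fast search and find of density peaks method computes the local density and relative distance of each data point, and the two are multiplied to obtain its γ value. Second, the γ values are sorted in descending order, and the elbow method is used to select the k data points with the largest γ values as the initial cluster centers for K-means. Then, K-means partitions the dataset into k clusters, and the objective function value of each data point in each dimension is computed and sorted in ascending order. Next, the dispersion of each dimension is determined, an appropriate fitting function and fitting points are selected, and the least-squares method fits a function to the ascending-sorted objective function values of each dimension in each cluster; differentiating the fitted function gives the rate of change. Finally, using information entropy, each data point's objective function value in each dimension is weighted by the corresponding rate of change to obtain its final outlier score, and the top-n data points with the highest scores are regarded as outliers. Simulation experiments on synthetic and UCI datasets compare the accuracy of KLOD, LOF and KNN. The results show that KLOD achieves higher accuracy than KNN and LOF. The proposed KLOD method effectively improves the clustering performance of K-means and achieves good precision and performance in local outlier detection.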
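The first two KLOD steps can be sketched as follows: rank points by γ = ρ·δ, then seed K-means with the top-k points. This is a minimal pure-Python illustration, not the authors' implementation. It assumes a cutoff-kernel local density with a hypothetical cutoff distance `dc`, takes k as given instead of choosing it with the elbow method, and omits the least-squares fitting and entropy-weighted scoring steps.

```python
import math

def density_peaks_gamma(points, dc):
    """Return gamma_i = rho_i * delta_i for each point (density-peaks style ranking)."""
    n = len(points)
    # Local density rho_i: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [math.dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]]
        if higher:
            # Relative distance: distance to the nearest point of higher density.
            delta.append(min(higher))
        else:
            # No denser point: use the largest distance to any other point.
            delta.append(max(math.dist(points[i], points[j]) for j in range(n) if j != i))
    return [r * d for r, d in zip(rho, delta)]

def kmeans(points, centers, iters=100):
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Update step: mean of the members (an empty cluster keeps its center).
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def dpc_seeded_kmeans(points, k, dc):
    gamma = density_peaks_gamma(points, dc)
    # Sort gamma in descending order; the top-k points become the initial centers.
    order = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)
    return kmeans(points, [points[i] for i in order[:k]])
```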
Funding: Supported by the National Natural Science Foundation of China (62371225, 62371227).
Abstract: Linear minimum mean square error (MMSE) detection has been shown to achieve near-optimal performance for massive multiple-input multiple-output (MIMO) systems, but it inevitably involves a complicated matrix inversion with high complexity. To avoid exact matrix inversion, many detection methods based on implicit or explicit approximate matrix inversion have been proposed. By combining the advantages of explicit and implicit matrix inversion, this paper introduces a new low-complexity signal detection algorithm. First, the relationship between the implicit and explicit techniques is analyzed. Then, an enhanced Newton iteration method is introduced to realize approximate MMSE detection for massive MIMO uplink systems. The proposed improved Newton iteration significantly reduces the complexity of conventional Newton iteration, but its complexity remains high at higher iteration counts, so it is applied only for the first two iterations. For subsequent iterations, we propose a low-complexity algorithm based on a novel trace iterative method (TIM), whose complexity is significantly lower than that of further Newton iterations. Convergence guarantees for the proposed detector are also provided. Numerical simulations verify that the proposed detector significantly outperforms recently reported iterative detectors and achieves close-to-MMSE performance while retaining its low-complexity advantage for systems with hundreds of antennas.
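For orientation, here is a sketch of the conventional Newton (Newton-Schulz) iteration X_{k+1} = X_k(2I - A X_k) for approximating A^{-1}, where A = H^T H + σ²I. It is not the paper's enhanced Newton iteration or its TIM stage. It assumes a real-valued channel matrix `H` and a diagonal initial guess, both chosen here only for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def newton_inverse(A, iters=4):
    """Approximate A^{-1} with the Newton(-Schulz) iteration X <- X (2I - A X)."""
    n = len(A)
    # Diagonal initial guess X0 = D^{-1}. The iteration converges when the spectral
    # radius of I - A X0 is below 1 (e.g. for strictly diagonally dominant A).
    X = [[1.0 / A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        AX = matmul(A, X)
        X = matmul(X, [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
                       for i in range(n)])
    return X

def mmse_detect(H, y, noise_var, iters=4):
    """Approximate (H^T H + noise_var I)^{-1} H^T y for a real-valued linear model."""
    Ht = transpose(H)
    A = matmul(Ht, H)
    for i in range(len(A)):
        A[i][i] += noise_var
    y_mf = [sum(h * v for h, v in zip(row, y)) for row in Ht]  # matched filter H^T y
    A_inv = newton_inverse(A, iters)
    return [sum(a * b for a, b in zip(row, y_mf)) for row in A_inv]
```

The residual I - A X_k is squared at every step, so convergence is quadratic. Each step costs two matrix products, which is why repeated Newton iterations become expensive.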
Abstract: Compositional data, which carry relative information, are important in machine learning and related fields. They are typically recorded in closed form, that is, their components sum to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when data are missing. Linear regression is a commonly used statistical modeling technique for finding relationships between variables of interest in many applications, and maximum likelihood estimation (MLE) is the method of choice for estimating its parameters, which are useful for prediction and for analyzing the partial effects of the independent variables. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. The expectation-maximization (EM) algorithm has been suggested as a solution in such situations. EM iteratively computes maximum likelihood or maximum a posteriori (MAP) estimates of the parameters of statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood, and the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study examines how well the EM algorithm performs on a simulated compositional dataset with missing observations, using both ordinary least squares regression and its robust version. The efficacy of the EM algorithm is compared with two alternative imputation techniques, k-nearest neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
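The E and M steps can be illustrated on a simpler problem than the study's. Take a bivariate normal sample (x, y) in which some y values are missing at random. The sketch below is an illustration under that assumption, not the study's compositional regression setup. It starts from mean imputation, then alternates the E-step (expected sufficient statistics for each missing y given x) with the M-step (maximum likelihood update). A helper for the Aitchison distance is included because the abstract uses it as an evaluation metric. It is computed as the Euclidean distance between centred log-ratio (clr) transforms.

```python
import math

def em_bivariate_normal(data, iters=200):
    """EM estimates of the mean and covariance of (x, y) when some y values are None."""
    n = len(data)
    xs = [x for x, _ in data]
    obs = [y for _, y in data if y is not None]
    # x is fully observed, so its mean and variance are fixed from the start.
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    # Start from observed-y moments with zero covariance (equivalent to mean imputation).
    my = sum(obs) / len(obs)
    syy = sum((y - my) ** 2 for y in obs) / len(obs)
    sxy = 0.0
    for _ in range(iters):
        # E-step: E[y | x] and E[y^2 | x] under the current parameters.
        beta = sxy / sxx
        cond_var = syy - beta * sxy  # Var(y | x)
        ey, ey2 = [], []
        for x, y in data:
            if y is None:
                m = my + beta * (x - mx)
                ey.append(m)
                ey2.append(m * m + cond_var)
            else:
                ey.append(y)
                ey2.append(y * y)
        # M-step: maximum likelihood updates from the expected sufficient statistics.
        my = sum(ey) / n
        sxy = sum(x * e for x, e in zip(xs, ey)) / n - mx * my
        syy = sum(ey2) / n - my * my
    return (mx, my), (sxx, sxy, syy)

def clr(v):
    """Centred log-ratio transform of a strictly positive composition."""
    logs = [math.log(c) for c in v]
    g = sum(logs) / len(logs)
    return [l - g for l in logs]

def aitchison_distance(p, q):
    return math.dist(clr(p), clr(q))
```

Because clr removes the overall scale, a composition and any positive multiple of it are at Aitchison distance zero. This matches the closed-data view of compositions.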
Funding: Supported by the Fundamental Research Funds for the Central Universities (Project CDJZR10170010).
Abstract: The mean shift tracker has difficulty tracking fast-moving targets and suffers from the accumulation of tracking errors. To overcome these limitations, a new approach that integrates the mean shift algorithm with frame-difference methods is proposed. The rough position of the moving target is first located by the direct frame-difference algorithm in static-camera scenes and by the three-frame-difference algorithm in moving-camera scenes. The mean shift algorithm is then used to track the target precisely. Several tracking experiments show that the proposed method can effectively track fast-moving targets and overcome the tracking-error accumulation problem.
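The coarse-to-fine idea can be sketched on tiny grayscale frames stored as nested lists. This is a simplified illustration under stated assumptions. The three-frame difference gives the rough position. Mean shift then refines it with a flat kernel on a hypothetical per-pixel weight map; the usual mean shift tracker instead derives its weights from color-histogram similarity.

```python
def three_frame_difference(f1, f2, f3, thresh):
    """Binary motion mask: pixels that changed both from f1 to f2 and from f2 to f3."""
    return [[int(abs(b - a) > thresh and abs(c - b) > thresh)
             for a, b, c in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(f1, f2, f3)]

def centroid(weights):
    """Weighted (row, col) centroid, or None if the map is empty."""
    total = sum(sum(r) for r in weights)
    if total == 0:
        return None
    cy = sum(i * v for i, r in enumerate(weights) for v in r) / total
    cx = sum(j * v for r in weights for j, v in enumerate(r)) / total
    return cy, cx

def mean_shift(weights, start, half, eps=1e-3, max_iter=50):
    """Flat-kernel mean shift on a weight map, starting from the rough position."""
    cy, cx = start
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        num_y = num_x = total = 0.0
        # Accumulate weighted coordinates inside the square window around (cy, cx).
        for i in range(max(0, int(cy - half)), min(rows, int(cy + half) + 1)):
            for j in range(max(0, int(cx - half)), min(cols, int(cx + half) + 1)):
                w = weights[i][j]
                num_y += i * w
                num_x += j * w
                total += w
        if total == 0:
            break
        ny, nx = num_y / total, num_x / total
        converged = abs(ny - cy) < eps and abs(nx - cx) < eps
        cy, cx = ny, nx
        if converged:
            break
    return cy, cx
```

Typical use: `mean_shift(weight_map, centroid(mask), half)`, where the frame-difference centroid replaces the previous frame's position as the starting point. This is what lets the search catch a target that moved farther than the kernel window.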
Funding: Support for this study was provided by the National Natural Science Foundation of China (No. 40776006), the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20060423009), and the Science and Technology Development Program of Shandong Province (Grant No. 2008GGB01099).
Abstract: Significant wave height is an important criterion in designing coastal and offshore structures. In this paper, the linear mean square estimation method, based on the orthogonality principle, is applied to calculate significant wave height. Twenty-eight years of wave time series collected from three ocean buoys near San Francisco along the California coast are analyzed. It is proved theoretically that using as much measured data as possible reduces the computation error of significant wave height. The significant wave height measured at one buoy location is compared with the value calculated from the data of the two other adjacent buoys. The results indicate that the linear mean square estimation method can be applied effectively to the calculation and prediction of significant wave height in coastal regions.
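Use two neighbouring buoys as predictors. The orthogonality principle says the estimation error is uncorrelated with the centred observations, and this leads to the normal equations C_xx w = c_xy. Below is a minimal sketch with sample moments in place of the true second-order statistics. It assumes hypothetical equal-length series `x1` and `x2` for the two adjacent buoys and `y` for the target buoy; no real buoy data are used.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divided by n) of two equal-length series."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def lmmse_weights(x1, x2, y):
    """Solve the 2x2 normal equations C_xx w = c_xy given by the orthogonality principle."""
    c11, c12, c22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    c1y, c2y = cov(x1, y), cov(x2, y)
    det = c11 * c22 - c12 * c12
    # Cramer's rule for the symmetric 2x2 system.
    w1 = (c1y * c22 - c12 * c2y) / det
    w2 = (c11 * c2y - c12 * c1y) / det
    return w1, w2

def lmmse_predict(x1, x2, y, new1, new2):
    """Linear MMSE estimate of y from new readings at the two neighbouring buoys."""
    w1, w2 = lmmse_weights(x1, x2, y)
    return mean(y) + w1 * (new1 - mean(x1)) + w2 * (new2 - mean(x2))
```

With more neighbouring buoys, the same idea becomes a larger linear system C_xx w = c_xy solved with a general linear solver.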
Funding: Supported by the National Engineering School of Tunis (No. 13039.1).
Abstract: To reduce computational costs, an improved form of the frequency-domain boundary element method (BEM) is proposed for two-dimensional acoustic radiation and propagation problems in a subsonic uniform flow with arbitrary orientation. The boundary integral equation (BIE) representation solves the two-dimensional convected Helmholtz equation (CHE), whose fundamental solution must satisfy a new Sommerfeld radiation condition (SRC) in the physical space. To facilitate conventional formulations, the variables of the advanced form are expressed only in terms of the acoustic pressure and its normal and tangential derivatives, and their multiplication operators are based on the convected Green's kernel and its modified derivative. The proposed approach significantly reduces CPU time compared with classical computational codes for modeling acoustic domains with arbitrary mean flow. It is validated against analytical solutions for sound radiation from monopole, dipole and quadrupole sources in a subsonic uniform flow with arbitrary orientation.
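For reference, a standard form of the 2D convected Helmholtz equation and its free-field fundamental solution is given below. It assumes e^{-iωt} time dependence, k = ω/c, Mach number M along the x axis and β = √(1 - M²); the paper's sign conventions and its modified SRC may differ. The substitution p = exp(-ikMx/β²) q reduces the equation to an ordinary Helmholtz equation with wavenumber k/β in the stretched coordinates (x/β, y). As a result, G satisfies the convected equation with right-hand side -δ(x)δ(y), where (x, y) is the receiver position relative to the source.

```latex
% Assumed: time dependence e^{-i\omega t}, k = \omega / c, Mach number M along +x,
% \beta = \sqrt{1 - M^2}; (x, y) is the receiver position relative to the source.
(1 - M^2)\frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2}
  + 2ikM\frac{\partial p}{\partial x} + k^2 p = 0

G(x, y) = \frac{i}{4\beta}\exp\!\left(-\frac{ikMx}{\beta^2}\right)
  H_0^{(1)}\!\left(\frac{k}{\beta^2}\sqrt{x^2 + \beta^2 y^2}\right)

% Flow at angle \theta to the x axis: evaluate G at the rotated coordinates
x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta
```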
Abstract: The Tarq 1:100,000 geochemical sheet, located in Isfahan province, was investigated by Iran's Geological and Explorations Organization through stream sediment analyses. The area lies in the Central Iran zone, and its stratigraphy spans Precambrian to Quaternary rocks. Given the signs of gold mineralization in the area, its important mineralized zones need to be identified. This requires information on the relationships among gold, arsenic and antimony and on how these elements vary relative to each other, in order to determine the extent of geochemical haloes and to estimate grade. The present study therefore uses the well-known K-means method to monitor these elements. K-means is a clustering method that minimizes the sum of squared Euclidean distances between each sample and the center of the cluster to which it is assigned. The clustering quality function and the degree to which each sample fits its assigned cluster, S(i), were used to determine the optimum number of clusters. Finally, based on the cluster centers and the results, equations were used to predict gold content from four parameters: arsenic grade, antimony grade, and the length and width coordinates of the sampling points.
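The choice of cluster number can be sketched with plain K-means and the mean silhouette value. The abstract does not define S(i), so treating it as the silhouette s(i) is an assumption. The deterministic farthest-first seeding and the hypothetical two-feature samples (for example, arsenic and antimony grades) are also illustrative choices.

```python
import math

def kmeans(points, k, iters=100):
    # Deterministic farthest-first seeding (an assumption; any seeding would do).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        # Assign each sample to its nearest center, then move centers to cluster means.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centers[c])
        if new == centers:
            break
        centers = new
    return labels

def silhouette(points, labels):
    """Mean of s(i) = (b_i - a_i) / max(a_i, b_i) over all samples."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        others = [sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
                  / labels.count(c)
                  for c in set(labels) if c != labels[i]]
        if not own or not others:
            scores.append(0.0)  # convention for singletons or a single cluster
            continue
        a, b = sum(own) / len(own), min(others)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def best_k(points, k_values):
    """Pick the cluster count with the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))
```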
Funding: This project is supported by the National Natural Science Foundation of China (No. 50275154).
Abstract: Empirical mode decomposition (EMD) is the signal decomposition method and principle underlying the Hilbert-Huang transform (HHT) in signal analysis, while directly-mean EMD is an improved EMD method proposed by N. E. Huang, the inventor of the HHT, to solve problems of the EMD principle. The directly-mean EMD method has remarkable advantages, and N. E. Huang gave a way to realize it. However, he did not find theoretical evidence for the method, so the feasibility of the idea and the correctness of its realization remain uncertain. To address this, the forming process of complex signals is studied in depth, and the stationary point principle and asymptotic stationary point principle involved are demonstrated. On this basis, theoretical evidence for the directly-mean EMD method and a correct way to realize it are presented for the first time. Simulation examples demonstrating the proposed idea are given.
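For orientation, the sketch below shows one sifting step of conventional EMD, the envelope-mean procedure that directly-mean EMD is meant to improve on. It is not the directly-mean method itself. It assumes piecewise-linear envelopes held constant beyond the end extrema, whereas standard EMD uses cubic-spline envelopes.

```python
def local_extrema(x):
    """Indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through x[idx], held constant beyond the end extrema."""
    env = []
    for t in range(len(x)):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            # Interpolate linearly between the two extrema that bracket t.
            k = next(j for j in range(len(idx) - 1) if idx[j] <= t <= idx[j + 1])
            a, b = idx[k], idx[k + 1]
            env.append(x[a] + (x[b] - x[a]) * (t - a) / (b - a))
    return env

def sift_once(x):
    """One sifting step: returns (x - mean envelope, mean envelope), or None if monotonic."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None  # no oscillation left: x is a residue, not an IMF candidate
    upper, lower = envelope(x, maxima), envelope(x, minima)
    mean_env = [(u + l) / 2 for u, l in zip(upper, lower)]
    return [xi - mi for xi, mi in zip(x, mean_env)], mean_env
```

Full EMD repeats this step until the result satisfies the IMF criteria, subtracts the IMF, and continues on the remainder.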
Abstract: The mean activity coefficient of 5,10,15,20-tetrakis(p-methoxyl-o-sulfophenyl)porphyrin sodium in dilute aqueous solution was determined by the freezing-point depression method over the molality range 0.00547 to 0.08871 mol·kg⁻¹ at 273.2 K. The resulting values of γ± range from 0.9945 to 0.7695, in close agreement with those obtained by the isopiestic method.
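The standard route from freezing-point data to γ± goes through the osmotic coefficient φ and the Gibbs-Duhem relation. The dilute-solution form below is a textbook sketch, not necessarily the authors' exact treatment. Here θ is the freezing-point depression, m the molality, ν the number of ions per formula unit, and λ the cryoscopic constant of water (about 1.86 K·kg·mol⁻¹).

```latex
% \theta: freezing-point depression, m: molality, \nu: ions per formula unit,
% \lambda: cryoscopic constant of water (about 1.86 K kg mol^{-1}), \phi: osmotic coefficient
\phi(m) \approx \frac{\theta}{\nu \lambda m}

\ln \gamma_{\pm}(m) = \bigl(\phi(m) - 1\bigr) + \int_0^{m} \frac{\phi(m') - 1}{m'}\,\mathrm{d}m'
```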
Abstract: In this paper, the random Euler method and the random second-order Runge-Kutta method are used to solve first-order random differential initial value problems. Conditions for the mean square convergence of the numerical solutions are studied, and the statistical properties of the numerical solutions are computed through numerical case studies.
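Below is a minimal sketch of both schemes applied sample path by sample path. It assumes a hypothetical test problem x' = -a x, x(0) = 1, with random coefficient a ~ Uniform(0, 1). The sample mean of the numerical solutions at t = 1 can be checked against the exact mean E[e^{-a}] = 1 - e^{-1}. The mean square convergence analysis itself is not reproduced.

```python
import math
import random

def random_euler(a, x0, t_end, steps):
    """Euler scheme on one realisation of the random ODE x' = -a x."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        x += h * (-a * x)
    return x

def random_rk2(a, x0, t_end, steps):
    """Second-order Runge-Kutta (Heun) scheme on the same realisation."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = -a * x
        k2 = -a * (x + h * k1)
        x += h * (k1 + k2) / 2
    return x

def sample_mean(method, n_samples=10000, t_end=1.0, steps=50, seed=0):
    """Monte Carlo estimate of E[x(t_end)] with a ~ Uniform(0, 1) and x(0) = 1."""
    rng = random.Random(seed)
    return sum(method(rng.random(), 1.0, t_end, steps) for _ in range(n_samples)) / n_samples

def exact_mean(t):
    # E[exp(-a t)] for a ~ Uniform(0, 1).
    return (1 - math.exp(-t)) / t
```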
Funding: Supported by the National Natural Science Foundation of China (10571036) and the Key Discipline Development Program of Beijing Municipal Commission (XK100080537).