Classification of imbalanced data, in which the representation of one class is overwhelmed by the other classes, is a well-explored problem in the data mining and machine learning community. Imbalanced distributions occur naturally in real-world datasets, so they must be handled carefully to extract meaningful insights. When a dataset is imbalanced, traditional classifiers lose performance and therefore produce misclassifications. This paper proposes a fuzzy weighted nearest neighbor approach to address this problem. We adapt the existing "algorithm modification" solution for learning from imbalanced datasets, which classifies data without altering its natural distribution, unlike the popular data-balancing methods. The K nearest neighbor (KNN) algorithm is a non-parametric classification method widely used in machine learning. Fuzzy nearest neighbor classification clarifies the degree to which an instance belongs to each class, and optimal weights combined with an improved nearest neighbor concept help classify imbalanced data correctly. The proposed hybrid approach accounts for the imbalanced nature of the data and reduces the inaccuracies that arise when the original, traditional classifiers are applied. Results show that it outperforms the existing fuzzy nearest neighbor and weighted nearest neighbor strategies for imbalanced learning.
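A minimal sketch of how fuzzy memberships and per-class weights can be combined in a nearest neighbor rule, assuming crisp training labels, Keller-style fuzzy distance weighting with fuzzifier m, and inverse class frequency as the imbalance weight. The inverse-frequency weight and the function name `fuzzy_weighted_knn` are hypothetical stand-ins; the abstract does not specify how its optimal weights are obtained.

```python
import numpy as np


def fuzzy_weighted_knn(X_train, y_train, x, k=5, m=2.0, eps=1e-9):
    """Classify one query point x; return (predicted_class, class_memberships)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes, counts = np.unique(y_train, return_counts=True)  # classes are sorted
    # Hypothetical imbalance weights: inverse class frequency, so that
    # minority-class neighbours count for more than majority-class ones.
    class_weight = len(y_train) / (len(classes) * counts)

    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nn = np.argsort(dists)[:k]
    # Keller-style fuzzy distance weight 1 / d^(2/(m-1)); eps guards d = 0.
    dist_weight = 1.0 / (dists[nn] ** (2.0 / (m - 1.0)) + eps)

    scores = np.zeros(len(classes))
    for idx, w in zip(nn, dist_weight):
        c = np.searchsorted(classes, y_train[idx])  # index of this neighbour's class
        scores[c] += w * class_weight[c]
    memberships = scores / scores.sum()  # fuzzy memberships sum to 1
    return classes[np.argmax(memberships)], memberships
```

The memberships keep the data's natural distribution intact: nothing is resampled, and the imbalance correction lives entirely in the voting weights.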
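To address the low positioning accuracy of location-fingerprint indoor positioning algorithms based on received signal strength, an indoor positioning algorithm based on mean hierarchical clustering and adaptive weighted K nearest neighbor (WKNN) is proposed. The algorithm first collects Bluetooth signal strengths at preset reference points to build an offline fingerprint database. It then uses mean hierarchical clustering to divide all reference points into n classes according to their mutual similarity and filters out reference points with low similarity. Finally, based on the similarity of the signal distances between the point to be located and the reference points, it computes the standard deviation of the distance differences to determine K adaptively, and then estimates the position. Experimental results show that, compared with WKNN and the dynamic weighted K nearest neighbor method (enhanced weighted K nearest neighbor, EWKNN), the proposed algorithm improves positioning accuracy by 30.0% and 18.0%, respectively, and real-time performance by 19.2% and 28.4%, respectively. Applied to indoor object positioning, the algorithm improves both positioning accuracy and real-time performance.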
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 2017YFC0403605 and 11601419).
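The final online step (adaptive K followed by a weighted position estimate) can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact rule: the signal distance is Euclidean over RSS vectors in dBm, the "distance differences" are taken to be the gaps between consecutive sorted distances, K is cut at the first gap larger than their standard deviation, and neighbors are weighted by inverse distance. The clustering and filtering step is omitted, and `adaptive_wknn_locate`, `k_min`, and `k_max` are hypothetical names.

```python
import numpy as np


def adaptive_wknn_locate(fingerprints, positions, rss, k_min=2, k_max=8, eps=1e-6):
    """Estimate a 2-D position from an RSS vector; return (position, chosen_k)."""
    fingerprints = np.asarray(fingerprints, dtype=float)  # (n_ref, n_beacons), dBm
    positions = np.asarray(positions, dtype=float)        # (n_ref, 2), coordinates
    d = np.linalg.norm(fingerprints - np.asarray(rss, dtype=float), axis=1)
    order = np.argsort(d)
    d_sorted = d[order]

    k_max = min(k_max, len(d_sorted))
    gaps = np.diff(d_sorted[:k_max])          # gaps between consecutive neighbours
    sigma = gaps.std() if gaps.size else 0.0  # spread of the distance differences
    k = k_max
    for i, g in enumerate(gaps):
        # Assumed rule: stop before the first unusually large jump in distance.
        if g > sigma and i + 1 >= k_min:
            k = i + 1
            break

    w = 1.0 / (d_sorted[:k] + eps)            # inverse-distance weights
    est = (w[:, None] * positions[order[:k]]).sum(axis=0) / w.sum()
    return est, k
```

Cutting K at a large gap keeps distant, dissimilar reference points from pulling the weighted average away from the true location.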
The whale optimization algorithm (WOA) is a recent population-based meta-heuristic. WOA updates the whales' positions using a shrinking encircling mechanism, a spiral rise, and random learning strategies. WOA has the merits of simple computation and high accuracy, but it converges slowly and easily falls into local optima. To overcome these shortcomings, this paper integrates adaptive neighborhood and hybrid mutation strategies into WOA: the average distance from each whale to all other whales serves as its adaptive neighborhood radius, and each whale learns from the best solution within its neighborhood instead of following the random learning strategy. The hybrid mutation strategy enhances the algorithm's ability to escape local optima. The resulting algorithm, HMNWOA, inherits the global search capability of the original algorithm, strengthens its exploitation ability, and improves population quality, thereby increasing the convergence speed. A feature selection algorithm based on binary HMNWOA is also proposed. Twelve standard datasets from the UCI repository are used to test the validity of the proposed algorithm for feature selection. The experimental results show that HMNWOA is highly competitive with six other popular feature selection methods in improving classification accuracy and reducing the number of selected features, which demonstrates its strong search ability in the feature space.
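The adaptive neighborhood described above can be sketched directly: each whale's radius is its mean distance to the other whales, and its "leader" is the fittest whale inside that radius (minimization assumed). The function name `neighborhood_best` is hypothetical, and the hybrid mutation, the spiral update, and the binary transfer used for feature selection are not shown.

```python
import numpy as np


def neighborhood_best(pop, fitness):
    """Return (leaders, radius): the best whale inside each whale's adaptive neighborhood."""
    pop = np.asarray(pop, dtype=float)          # (n_whales, dim) positions
    fitness = np.asarray(fitness, dtype=float)  # lower is better (assumed)
    n = len(pop)
    dist = np.linalg.norm(pop[:, None, :] - pop[None, :, :], axis=2)
    # Diagonal entries are zero, so dividing by n - 1 averages over the other whales.
    radius = dist.sum(axis=1) / (n - 1)
    leaders = np.empty_like(pop)
    for i in range(n):
        members = np.flatnonzero(dist[i] <= radius[i])  # includes whale i itself
        best = members[np.argmin(fitness[members])]
        leaders[i] = pop[best]
    return leaders, radius
```

In a WOA-style update, `leaders[i]` would take the place of the randomly chosen whale in the exploration step, so each whale moves toward a good local solution rather than an arbitrary one.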
Consider the regression model $Y = X\beta + g(T) + e$, where $g$ is an unknown smooth function on $[0, 1]$, $\beta$ is an $l$-dimensional parameter to be estimated, and $e$ is an unobserved error. When the data are randomly censored, the estimators $\beta_n^*$ and $g_n^*$ of $\beta$ and $g$ are obtained using class K and least squares methods. It is shown that $\beta_n^*$ is asymptotically normal and that $g_n^*$ achieves the convergence rate $O(n^{-1/3})$.
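For intuition about the model $Y = X\beta + g(T) + e$, here is a sketch of a standard least squares estimator for the uncensored case only: smooth $X$ and $Y$ on $T$, regress the residuals to get $\beta$, then smooth $Y - X\beta$ to recover $g$. It assumes a Nadaraya-Watson smoother with a Gaussian kernel and an arbitrary bandwidth `h`; it does not implement the censored-data estimators $\beta_n^*$ and $g_n^*$ from the abstract, and `partial_linear_fit` is a hypothetical name.

```python
import numpy as np


def partial_linear_fit(X, T, Y, h=0.05):
    """Estimate beta and g(T) in Y = X beta + g(T) + e (no censoring)."""
    X = np.asarray(X, dtype=float)  # (n, l) linear covariates
    T = np.asarray(T, dtype=float)  # (n,) values in [0, 1]
    Y = np.asarray(Y, dtype=float)  # (n,) responses
    # Nadaraya-Watson smoother matrix on T (Gaussian kernel); each row sums to 1.
    K = np.exp(-0.5 * ((T[:, None] - T[None, :]) / h) ** 2)
    S = K / K.sum(axis=1, keepdims=True)
    # Remove the part of X and Y explained by T, then do ordinary least squares.
    X_res = X - S @ X
    Y_res = Y - S @ Y
    beta, *_ = np.linalg.lstsq(X_res, Y_res, rcond=None)
    # Smooth the partial residuals to estimate g at the observed T values.
    g_hat = S @ (Y - X @ beta)
    return beta, g_hat
```

Usage: `beta, g_hat = partial_linear_fit(X, T, Y)` on simulated data with a known $\beta$ and $g$ is a quick way to check the estimator before adding a censoring adjustment.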
Various methods have been used to estimate above-ground forest biomass across landscapes and to create biomass maps for specific stands or pixels across ownership or project areas. Without an accurate estimation method, land managers may end up with incorrect biomass maps, which could lead to poorer decisions in their future management plans. The goal of this study was to compare various imputation methods for predicting forest biomass and basal area at a project planning scale. A combination of ground inventory plots, light detection and ranging (LiDAR) data, satellite imagery, and climate data was analyzed, and the root mean square error (RMSE) and bias of each method were calculated. Results indicate that for biomass prediction, k-nn (k = 5) had the lowest RMSE and the least bias, followed by k-nn (k = 3), the GWR model, and random forest imputation. For basal area prediction, the GWR model had the lowest RMSE and the least bias, followed by k-nn (k = 5), k-nn (k = 3), and the random forest method. For both metrics, the GNN method was the least accurate based on the ranking of RMSE and bias.
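A minimal sketch of k-nn imputation and the two accuracy measures, assuming leave-one-out evaluation over plots, standardized predictor variables (for example LiDAR or climate metrics), an unweighted mean of the k nearest plots, and bias defined as mean(predicted minus observed). The study's exact distance metric, weighting, and sign convention for bias are not given, and both function names are hypothetical.

```python
import numpy as np


def knn_impute_loo(features, target, k=5):
    """Leave-one-out k-nn imputation: each plot gets the mean of its k nearest other plots."""
    F = np.asarray(features, dtype=float)  # (n_plots, n_predictors)
    y = np.asarray(target, dtype=float)    # (n_plots,), e.g. biomass or basal area
    Z = (F - F.mean(axis=0)) / F.std(axis=0)  # standardize predictors (assumption)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)               # a plot cannot be its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]
    return y[nn].mean(axis=1)


def rmse_and_bias(pred, obs):
    """Return (RMSE, bias) with bias = mean(pred - obs)."""
    err = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(err))
```

Ranking methods as in the abstract then amounts to computing `rmse_and_bias` for each method's predictions (for example k = 3 and k = 5) against the same observed plot values.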