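Abstract: Early classification of time series (ETSC) has two conflicting objectives: earliness and accuracy. Classifying earlier always comes at the cost of accuracy. Existing optimization-based early classification methods for multivariate time series (MTS) include both misclassification cost and delayed-decision cost in their cost functions. However, they ignore how the local structure among the samples of an MTS dataset affects classification performance. To address this, an MTS early classification model based on Orthogonal Locality Preserving Projection (OLPP) and cost optimization (OLPPMOAE) is proposed. First, OLPP maps MTS sample prefixes into a low-dimensional space while preserving the local structure of the original dataset. Second, a set of Gaussian Process (GP) classifiers is trained in the low-dimensional space to produce class probabilities for the training set at every time step. Finally, Particle Swarm Optimization (PSO) learns the optimal parameters of the stopping rule from these class probabilities. Experiments on six MTS datasets show that, at essentially the same earliness, OLPPMOAE is significantly more accurate than the cost-based R1_C_lr model (stopping Rule and Cost function with l1 and l2 regularization terms). Average accuracy improves by 11.33% to 15.35%, and the harmonic mean (HM) improves by 4.71% to 9.01%. The proposed model can therefore classify MTS early with relatively high accuracy.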
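The last step above (learning stopping-rule parameters from per-time-step class probabilities with PSO) can be sketched as follows. This is a minimal illustration under stated assumptions, not the OLPPMOAE implementation: the OLPP projection and GP classifiers are replaced by hand-made probability sequences, the stopping rule is assumed to be a linear rule of the form g1*p1 + g2*(p1 - p2) + g3*t/L > 0 (p1 >= p2 being the two largest class probabilities), and the cost is assumed to be alpha*(1 - accuracy) + (1 - alpha)*earliness. All function names and PSO constants are illustrative.

```python
import numpy as np

def stop_time(probs, gammas):
    """probs: (L, n_classes) class probabilities of one series at each time step.
    Assumed linear stopping rule: stop at the first t where
    g1*p1 + g2*(p1 - p2) + g3*(t+1)/L > 0."""
    L = probs.shape[0]
    for t in range(L):
        p = np.sort(probs[t])[::-1]          # p[0] >= p[1] >= ...
        score = gammas[0] * p[0] + gammas[1] * (p[0] - p[1]) + gammas[2] * (t + 1) / L
        if score > 0:
            return t
    return L - 1                             # forced decision at the last time step

def evaluate(all_probs, labels, gammas):
    """Return (accuracy, mean earliness), earliness = fraction of the series observed."""
    correct, earliness = 0, 0.0
    for probs, y in zip(all_probs, labels):
        t = stop_time(probs, gammas)
        correct += int(np.argmax(probs[t]) == y)
        earliness += (t + 1) / probs.shape[0]
    n = len(labels)
    return correct / n, earliness / n

def harmonic_mean(acc, earliness):
    """HM of accuracy and (1 - earliness)."""
    denom = acc + (1 - earliness)
    return 2 * acc * (1 - earliness) / denom if denom > 0 else 0.0

def cost(all_probs, labels, gammas, alpha=0.8):
    """Assumed cost: weighted misclassification rate plus weighted earliness."""
    acc, e = evaluate(all_probs, labels, gammas)
    return alpha * (1 - acc) + (1 - alpha) * e

def pso(objective, dim=3, n_particles=20, iters=50, seed=0):
    """Plain global-best PSO minimising objective over R^dim."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1, 1, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5                # inertia and acceleration weights (assumed)
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Hypothetical probability sequences: 2 classes, L = 5 steps; the true class grows more likely.
rising = np.array([[0.5, 0.5], [0.6, 0.4], [0.9, 0.1], [0.95, 0.05], [0.99, 0.01]])
P = [rising, rising[:, ::-1], rising, rising[:, ::-1]]
y = [0, 1, 0, 1]
best_gammas, best_cost = pso(lambda g: cost(P, y, g))
```

With gammas = [-1, 2, 0] the rule reduces to p1 > 2*p2, so every toy series stops at its third step with all labels correct (accuracy 1.0, earliness about 0.6). Stopping immediately (gammas = [0, 0, 1]) halves the accuracy, which is the earliness/accuracy trade-off the abstract describes.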
Funding: Co-supported by the National Key Research and Development Program of China (No. 2016YFC0803103), the Beijing Advanced Innovation Center for Future Urban Design (No. UDC2016050100), and the Beijing Postdoctoral Research Foundation.
Abstract: An Extended Kalman Filter (EKF) is commonly used to fuse raw Global Navigation Satellite System (GNSS) measurements with measurements derived from an Inertial Navigation System (INS). However, the performance of the Conventional EKF (CEKF) degrades when the statistical properties of the dynamic and measurement models are uncertain. In this research, an Adaptive Interacting Multiple Model (AIMM) filter is developed to improve performance. The soft-switching property of the Interacting Multiple Model (IMM) algorithm allows adaptation between two levels of process noise, namely a lower and an upper bound. In particular, Sage adaptive filtering is applied to adapt the measurement covariance online. In addition, a classified measurement update strategy is used, which updates the pseudorange and Doppler observations sequentially. A field experiment was conducted to validate the proposed algorithm: pseudorange and Doppler observations from the Global Positioning System (GPS) and the BeiDou Navigation Satellite System (BDS) were post-processed in differential mode. The results indicate that decimeter-level positioning accuracy is achievable with AIMM for the GPS/INS and GPS/BDS/INS configurations. For GPS/BDS/INS, position accuracy improves by 35.8%, 34.3% and 33.9% in the north, east and height components, respectively, compared with the CEKF. Performance for BDS/INS is degraded because BDS pseudorange observations are less precise.
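The soft switching between a lower and an upper process-noise level can be illustrated with one IMM cycle over two otherwise identical Kalman filters. This is a minimal sketch under loud assumptions: it uses a linear 1-D constant-velocity toy model instead of the GNSS/INS EKF, all noise values and the mode-transition matrix are made up, and it omits both the Sage adaptive estimation of R and the sequential pseudorange/Doppler update described above. Names such as `imm_step` are illustrative only.

```python
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """Linear Kalman predict + update; also returns the Gaussian likelihood of z."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    v = z - H @ x_pred                          # innovation
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ v
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    lik = float(np.exp(-0.5 * v @ np.linalg.solve(S, v)) / np.sqrt(np.linalg.det(2 * np.pi * S)))
    return x_new, P_new, lik

def imm_step(xs, Ps, mu, PI, z, F, Qs, H, R):
    """One IMM cycle: mixing, model-matched filtering, mode-probability update, fusion."""
    M = len(xs)
    c = PI.T @ mu                               # predicted mode probabilities
    mix = (PI * mu[:, None]) / c[None, :]       # mix[i, j] = P(model i before | model j now)
    new_xs, new_Ps, liks = [], [], np.zeros(M)
    for j in range(M):
        x0 = sum(mix[i, j] * xs[i] for i in range(M))
        P0 = sum(mix[i, j] * (Ps[i] + np.outer(xs[i] - x0, xs[i] - x0)) for i in range(M))
        xj, Pj, liks[j] = kf_step(x0, P0, z, F, Qs[j], H, R)
        new_xs.append(xj)
        new_Ps.append(Pj)
    mu_new = liks * c
    mu_new /= mu_new.sum()                      # soft switch: posterior mode probabilities
    x = sum(mu_new[j] * new_xs[j] for j in range(M))
    P = sum(mu_new[j] * (new_Ps[j] + np.outer(new_xs[j] - x, new_xs[j] - x)) for j in range(M))
    return new_xs, new_Ps, mu_new, x, P

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
G = np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
Qs = [0.01 * G, 1.0 * G]                        # assumed lower / upper process-noise bounds
PI = np.array([[0.95, 0.05], [0.05, 0.95]])     # assumed mode-transition matrix

# Noise-free positions: velocity 1 for 10 steps, then a sudden change to velocity 5.
truth = [float(k) for k in range(1, 11)] + [10.0 + 5.0 * k for k in range(1, 6)]
xs, Ps = [np.array([0.0, 1.0])] * 2, [np.eye(2)] * 2
mu = np.array([0.5, 0.5])
mu_high = []                                    # probability of the upper-bound model per step
for pos in truth:
    xs, Ps, mu, x, P = imm_step(xs, Ps, mu, PI, np.array([pos]), F, Qs, H, R)
    mu_high.append(float(mu[1]))
```

When the velocity jumps, the low-noise filter produces a large, improbable innovation, so probability mass shifts toward the high-noise model without a hard switch.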
Funding: Partially supported by the Transregional Collaborative Research Centre SFB/TRR 62 "Companion-Technology for Cognitive Technical Systems", funded by the German Research Foundation (DFG), and by a scholarship of the German Academic Exchange Service (DAAD).
Abstract: Many data mining applications have large amounts of data, but labeling it is usually difficult, expensive, or time-consuming because it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data during training. Co-Training is a popular semi-supervised learning algorithm that assumes each example is represented by multiple sets of features (views), and that these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training called Co-Training by Committee (CoBC) is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples, based on estimating the local accuracy of the committee members in the example's neighborhood. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied to both C4.5 decision trees and 1-nearest-neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other, non-committee-based ones.
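The idea of a confidence measure built from the committee's local accuracy can be sketched as below. The abstract does not give the exact formula, so this is a hypothetical version. Committee members are 1-NN classifiers built with the random subspace method. Each member's vote for an unlabeled example is weighted by its leave-one-out accuracy on the example's k nearest labeled neighbors, and the confidence is the weighted share of votes for the winning label. Function names, k, and the toy data are all illustrative.

```python
import numpy as np
from collections import defaultdict

def random_subspaces(n_features, n_members, subspace_size, rng):
    """Random subspace method: each committee member sees a random feature subset."""
    return [rng.choice(n_features, size=subspace_size, replace=False) for _ in range(n_members)]

def nn_predict(X_train, y_train, X, feats):
    """1-nearest-neighbour labels for the rows of X, using only the columns in feats."""
    A, B = X_train[:, feats], X[:, feats]
    d = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

def loo_nn_predict(X_lab, y_lab, feats):
    """Leave-one-out 1-NN labels on the labeled set (a point cannot be its own neighbour)."""
    A = X_lab[:, feats]
    d = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    return y_lab[d.argmin(axis=1)]

def label_with_confidence(X_lab, y_lab, x, committee, k=4):
    """Hypothetical local-accuracy-weighted vote for one unlabeled example x."""
    nbrs = np.argsort(((X_lab - x) ** 2).sum(axis=1))[:k]   # k labeled neighbours of x
    weights = defaultdict(float)
    for feats in committee:
        local_acc = np.mean(loo_nn_predict(X_lab, y_lab, feats)[nbrs] == y_lab[nbrs])
        vote = nn_predict(X_lab, y_lab, x[None, :], feats)[0]
        weights[vote] += local_acc
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, (weights[label] / total if total > 0 else 0.0)

# Toy labeled set: two well-separated clusters in 3-D.
X_lab = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],
                  [10, 10, 10], [10, 11, 10], [11, 10, 10], [10, 10, 11]], dtype=float)
y_lab = np.array([0, 0, 0, 0, 1, 1, 1, 1])
committee = random_subspaces(n_features=3, n_members=5, subspace_size=2,
                             rng=np.random.default_rng(0))
```

In a CoBC-style loop, only the unlabeled examples with the highest such confidence would be added to the training set of the committee in each round.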
Funding: Supported by the National Key Technology Research and Development Program of China (No. 2015BAH13F01) and the Beijing Natural Science Foundation (No. 4152007).
Abstract: The rapid development of the Internet has produced a wide variety of raw information, including text and audio. However, because of its sheer volume, it is difficult to find the most useful knowledge quickly and accurately. Automatic text classification based on machine learning can assign large numbers of natural-language documents to the appropriate subject categories according to their semantics, which helps users grasp textual information directly. Learning from a set of hand-labeled documents yields a traditional supervised classifier for text categorization (TC). However, labeling all the data by hand is labor-intensive and time-consuming. To address this, semi-supervised learning methods have been proposed for training classifiers. However, they remain impractical for the variety and volume of Web data because they still require some hand-labeled data. In 2012, Li et al. proposed a fully automatic categorization approach for text (FACT), based on supervised learning, that requires no manual labeling. However, automatically labeling all the data introduces noise, so the results may not meet accuracy requirements. We propose instead to automatically label only a subset of the data with high accuracy, using the semantics of the category names. A classifier is then trained in a semi-supervised manner on both the labeled and unlabeled data, ultimately achieving precise classification of massive text data. Empirical experiments show that in most cases the method outperforms a supervised support vector machine (SVM) in both F1 score and classification accuracy. This demonstrates the effectiveness of semi-supervised learning for automatic TC.
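The two-stage idea above can be sketched as follows: seed labels come from category-name semantics, then a classifier is trained semi-supervisedly. This is only a hypothetical illustration. The abstract names neither the labeling rule nor the semi-supervised algorithm, so here a document is seeded only when it contains keywords of exactly one category. A simple nearest-centroid self-training loop over bag-of-words vectors stands in for the semi-supervised step. All names, keywords, and the threshold are assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def seed_labels(docs, category_keywords):
    """Assumed rule: label a document only if it matches keywords of exactly one category."""
    labels = {}
    for i, doc in enumerate(docs):
        words = set(tokenize(doc))
        hits = [c for c, kws in category_keywords.items() if words & kws]
        if len(hits) == 1:                      # ambiguous documents stay unlabeled
            labels[i] = hits[0]
    return labels

def unit(vec):
    """L2-normalise a sparse word-weight dict."""
    norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
    return {w: x / norm for w, x in vec.items()}

def self_train(docs, category_keywords, rounds=3, threshold=0.2):
    """Nearest-centroid self-training (a stand-in for the paper's semi-supervised step)."""
    vecs = [unit(Counter(tokenize(d))) for d in docs]
    labels = seed_labels(docs, category_keywords)
    for _ in range(rounds):
        centroids = {}
        for c in category_keywords:
            members = [vecs[i] for i, l in labels.items() if l == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)             # sum the unit vectors of the class
                centroids[c] = unit(total)
        if not centroids:
            break
        newly = {}
        for i, v in enumerate(vecs):
            if i in labels:
                continue
            scores = {c: sum(cv.get(w, 0.0) * x for w, x in v.items())   # cosine similarity
                      for c, cv in centroids.items()}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:       # accept only confident pseudo-labels
                newly[i] = best
        if not newly:
            break
        labels.update(newly)
    return labels

docs = ["football match goal team", "election vote parliament",
        "the team scored a late goal", "parliament passed the vote today"]
keywords = {"sports": {"football"}, "politics": {"election"}}
```

Here only the first two documents are seeded from the category keywords. The other two are then labeled through their word overlap with the seed centroids.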