Funding: the High Technology Research and Development Programme of China (2003AA134030)
Abstract: A search strategy based on the maximal information gain principle is presented for the cued search of phased array radars. First, the determination of the cued search region, the arrangement of beam positions, and the calculation of the prior probability distribution over the beam positions are discussed. Two search algorithms based on information gain are then proposed, using Shannon entropy and Kullback-Leibler entropy, respectively. Under the proposed strategy, the information gain of each beam position is predicted before radar detection, and the observation is made in the beam position with the maximal information gain. Simulation results show that, compared with the conventional sequential search and confirm search methods, the proposed strategy distinctly improves search performance and saves radar time resources at the same given detection probability.
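To make the selection rule concrete, here is a minimal sketch (not the paper's implementation) of greedy beam selection by predicted Shannon information gain with a Bayesian update of the target-location distribution; the prior, the detection probability pd = 0.9, and the false-alarm probability pfa = 1e-3 are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution; 0*log 0 := 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def posterior(prior, beam, detected, pd=0.9, pfa=1e-3):
    """Bayes update of the target-location distribution after one look."""
    like = np.full_like(prior, pfa if detected else 1.0 - pfa)
    like[beam] = pd if detected else 1.0 - pd
    post = like * prior
    return post / post.sum()

def expected_gain(prior, beam, pd=0.9, pfa=1e-3):
    """Predicted gain: prior entropy minus expected posterior entropy."""
    p_det = pd * prior[beam] + pfa * (1.0 - prior[beam])
    h_det = entropy(posterior(prior, beam, True, pd, pfa))
    h_miss = entropy(posterior(prior, beam, False, pd, pfa))
    return entropy(prior) - (p_det * h_det + (1.0 - p_det) * h_miss)

prior = np.array([0.1, 0.4, 0.3, 0.2])        # assumed prior over 4 beams
gains = [expected_gain(prior, b) for b in range(prior.size)]
print(int(np.argmax(gains)), gains)           # observe this beam next
```

Replacing the entropy-reduction objective with an expected Kullback-Leibler divergence between posterior and prior yields the paper's second variant under the same greedy loop.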
Funding: supported by the National Defense Pre-research Foundation (9140A21041110KG0148)
Abstract: This paper addresses the problem of sensor search scheduling in the complicated space environment faced by a low-earth-orbit constellation. Several search scheduling methods based on commonly used information gain measures are first compared via simulations. A novel search scheduling method for scenarios with uncertain observations is then proposed, based on the global Shannon information gain and a beta-density-based uncertainty model. Simulation results indicate that the beta density model is a good option for solving the problem of target acquisition in complicated space environments.
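As a rough illustration of the beta-density uncertainty idea, the sketch below maintains a Beta(a, b) distribution over each search cell's unknown detection probability and refines it conjugately from observed hit/miss outcomes; the uniform Beta(1, 1) prior and the cell layout are assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class BetaCell:
    a: float = 1.0   # pseudo-count of detections
    b: float = 1.0   # pseudo-count of misses

    def mean_pd(self) -> float:
        """Posterior-mean detection probability of this cell."""
        return self.a / (self.a + self.b)

    def update(self, detected: bool) -> None:
        """Conjugate Bayesian update from one observation."""
        if detected:
            self.a += 1.0
        else:
            self.b += 1.0

cells = [BetaCell() for _ in range(5)]
cells[2].update(True)      # one hit in cell 2
cells[2].update(False)     # followed by a miss
print([round(c.mean_pd(), 3) for c in cells])
```

The scheduler would then feed these posterior-mean detection probabilities into a Shannon-gain computation of the kind sketched for the radar cued-search abstract above.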
Abstract: Multi-sensor systems are becoming increasingly important in a variety of military and civilian applications. In general, a single-sensor system can provide only partial information about the environment, while a multi-sensor system provides a synergistic effect that improves the quality and availability of information. Data fusion techniques can effectively combine such environmental information from similar and/or dissimilar sensors. Sensor management, which aims at improving data fusion performance by controlling sensor behavior, plays an important role in the data fusion process. This paper presents a sensor assignment method for multi-sensor, multi-target tracking based on a Fisher information gain sensor effectiveness metric. The Fisher information gain is computed for every sensor-target pairing on each scan. The advantage of this metric over others is that the Fisher information gain for a target obtained by multiple sensors equals the sum of the gains obtained by the individual sensors, so the standard transportation problem formulation can be used without introducing the concept of a pseudo sensor. Simulation results show the effectiveness of the method.
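The additivity property is what makes the assignment tractable. The following sketch, with a made-up gain matrix, solves a one-sensor-per-target assignment (the assignment special case of the transportation formulation) by maximizing the summed Fisher information gains.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# fisher_gain[i, j]: Fisher information gain of sensor i on target j.
# Because Fisher information is additive across independent sensors,
# the total gain of any assignment is just a sum of matrix entries.
fisher_gain = np.array([[3.0, 1.0, 0.5],
                        [0.8, 2.5, 1.2],
                        [0.4, 0.9, 2.8]])

# Maximize total gain = minimize its negation.
rows, cols = linear_sum_assignment(-fisher_gain)
for i, j in zip(rows, cols):
    print(f"sensor {i} -> target {j}, gain {fisher_gain[i, j]}")
print("total gain:", fisher_gain[rows, cols].sum())
```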
Abstract: Treating two seismic parameters, the energy and the frequency of earthquakes, as a whole through the definition of information gain in entropy, we study the information gain of M≥6.0 earthquakes from the world earthquake catalogue for 1900-1992. The results show that the information gain decreases before strong earthquakes. Our study of the recent seismic tendency of large earthquakes shows that the probability of earthquakes with M≥8.5 is low for the near future around the world. The information gain technique provides a new approach to tracking and predicting earthquakes from data on moderate and small earthquakes.
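The paper's exact definition is not reproduced in the abstract. One hedged way to read "information gain" over a catalogue is as the relative entropy between a time window's magnitude-frequency distribution and the long-term background, as in this sketch; the magnitude bins and counts are synthetic.

```python
import numpy as np

def kl_gain(window_counts, background_counts):
    """KL divergence D(window || background) over magnitude bins."""
    p = window_counts / window_counts.sum()
    q = background_counts / background_counts.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

background = np.array([900.0, 300.0, 90.0, 9.0])   # long-term counts per bin
window = np.array([80.0, 40.0, 6.0, 1.0])          # counts in one time window
print(kl_gain(window, background))
```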
Abstract: Sentiment analysis is the process of determining the intention or emotion behind an article. Sentiment analysis of people's opinions extracts the subjective information from the context; the analyzed data quantifies reactions or sentiments and reveals the contextual polarity of the information. In social behavior, sentiment can be thought of as a latent variable, and measuring and comprehending it can help us better understand social issues. Because sentiments are domain-specific, sentiment analysis in a specific context is critical in any real-world scenario. Textual sentiment analysis is performed at the sentence, document, and feature levels. This work introduces a new Information Gain based Feature Selection (IGbFS) algorithm that selects highly correlated features while eliminating irrelevant and redundant ones. Extensive textual sentiment analysis at the sentence, document, and feature levels is performed using the proposed algorithm on datasets from the Cornell and Kaggle repositories. Compared to existing baseline classifiers, the proposed Information Gain based classifier achieves accuracies of 96% at the document level, 97.4% at the sentence level, and 98.5% at the feature level. The method is also tested on the IMDB, Yelp 2013, and Yelp 2014 datasets; for these high-dimensional datasets it achieves accuracies of 95%, 96%, and 98% at the document, sentence, and feature levels respectively, again exceeding existing baseline classifiers.
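A from-scratch sketch of the information-gain ranking step on a binary term-presence matrix follows. It is not the IGbFS code (the redundancy-elimination stage is omitted), and the toy matrix and labels are assumptions.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y):
    """IG(class; term) = H(class) - H(class | term present/absent)."""
    h = entropy(y)
    for v in (0, 1):
        mask = x == v
        if mask.any():
            h -= mask.mean() * entropy(y[mask])
    return h

X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]])  # docs x terms
y = np.array([1, 1, 0, 0])                                  # sentiment labels
gains = [information_gain(X[:, j], y) for j in range(X.shape[1])]
top_k = np.argsort(gains)[::-1][:2]   # keep the 2 most informative terms
print(gains, top_k)
```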
Abstract: We advance here a novel methodology for robust intelligent biometric information management, with inferences and predictions made using randomness and complexity concepts. Intelligence refers to learning, adaptation, and functionality, while robustness refers to the ability to handle incomplete and/or corrupt adversarial information on one side, and image and/or device variability on the other. The proposed methodology is model-free and non-parametric. It draws support from discriminative methods using likelihood ratios to link biometrics and forensics at the conceptual level. At the modeling and implementation level, it further links the Bayesian framework, statistical learning theory (SLT) using transduction and semi-supervised learning, and information theory (IT) using mutual information. The key concepts supporting the proposed methodology are (a) local estimation to facilitate learning and prediction using both labeled and unlabeled data; (b) similarity metrics using regularity of patterns, randomness deficiency, and Kolmogorov complexity (similar to MDL), using strangeness/typicality and ranking p-values; and (c) the Cover-Hart theorem on the asymptotic performance of k-nearest neighbors approaching the optimal Bayes error. Several topics in biometric inference and prediction are described using an integrated approach: (1) multi-level and multi-layer data fusion, including quality and multi-modal biometrics; (2) score normalization and revision theory; (3) face selection and tracking; and (4) identity management. The approach includes transduction and boosting for ranking and sequential fusion/aggregation, respectively, on one side, and active learning and change/outlier/intrusion detection realized using information gain and martingales, respectively, on the other. The proposed methodology can be mapped to additional types of information beyond biometrics.
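Of these concepts, the strangeness/p-value machinery is the most mechanical. The sketch below uses an assumed 1-NN strangeness measure and the standard ranking p-value; it illustrates the idea in the transductive (conformal) sense but is not the authors' exact construction.

```python
import numpy as np

def strangeness(x, same, other):
    """1-NN strangeness: nearest same-class distance over nearest
    other-class distance; large values mean atypical examples."""
    d_same = min(np.linalg.norm(x - s) for s in same)
    d_other = min(np.linalg.norm(x - o) for o in other)
    return d_same / d_other

def p_value(alpha_new, alphas):
    """Fraction of examples at least as strange as the new one."""
    return (sum(a >= alpha_new for a in alphas) + 1) / (len(alphas) + 1)

same = [np.array([0.0, 0.0]), np.array([0.1, 0.2])]   # toy class members
other = [np.array([1.0, 1.0]), np.array([0.9, 1.1])]
x = np.array([0.05, 0.1])                             # new example
alphas = [strangeness(s, [t for t in same if t is not s], other)
          for s in same]
print(p_value(strangeness(x, same, other), alphas))   # high = typical
```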
Abstract: Traditional phishing-website detection techniques mostly select the sensitive features used for detection randomly or by experience, so detection accuracy cannot be guaranteed. To address this, an improved information gain algorithm for selecting the sensitive features of phishing websites, IIGAIN (Improved Information Gain Algorithm), is proposed. The algorithm takes the intra-class dispersion of feature terms into account: the difference in a feature term's intra-class dispersion is processed and the result is used as a penalty term that improves the information gain algorithm. Experimental results show that phishing-website detection with features selected by IIGAIN is distinctly more accurate than detection with randomly selected features.
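The published form of the penalty term is not reproduced in the abstract. The sketch below merely illustrates the shape of the idea, information gain reduced by a penalty built from the difference of intra-class dispersions, with an assumed penalty form and weight lam.

```python
import numpy as np

def intra_class_dispersion(x, y, label):
    """Variance of the feature's values within one class."""
    return float(np.var(x[y == label]))

def iigain_score(ig, x, y, lam=0.5):
    """Assumed IIGAIN-style score: information gain minus a penalty
    based on the difference of the two classes' dispersions."""
    penalty = abs(intra_class_dispersion(x, y, 1) -
                  intra_class_dispersion(x, y, 0))
    return ig - lam * penalty

x = np.array([0.9, 1.1, 0.2, 3.0])   # one feature's value per sample
y = np.array([1, 1, 0, 0])           # phishing / legitimate labels
print(iigain_score(ig=0.8, x=x, y=y))
```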
Abstract: Information theory is used to obtain the information gain of each identification feature, and this gain is used as the weight factor for the feature to stress the role of effective features. An ART model based on artificial neural network theory is then used for identification, thereby forming a detection system for poor insulators. Experiments and calculations show that this approach is correct and feasible.
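The weighting step itself is simple. A minimal sketch with illustrative gain values follows, normalizing the per-feature information gains into weight factors applied to a feature vector before it reaches the ART network.

```python
import numpy as np

gains = np.array([0.42, 0.05, 0.31, 0.12])   # IG of each feature (assumed)
weights = gains / gains.sum()                 # normalized weight factors
features = np.array([0.7, 0.2, 0.9, 0.4])    # one measured feature vector
weighted = weights * features                 # weighted input to the ART model
print(weights, weighted)
```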
Abstract: The naïve Bayes classifier is one of the most commonly used data mining methods for classification. Despite its simplicity, naïve Bayes is effective and computationally efficient. Although the strong attribute independence assumption makes naïve Bayes a tractable method for learning, this assumption may not hold in real-world applications. Many enhancements to the basic algorithm have been proposed to alleviate violations of the attribute independence assumption. While these methods improve classification performance, they do not necessarily retain the mathematical structure of the naïve Bayes model, and some do so at the expense of computational time. One approach to reducing the naïveté of the classifier is to incorporate attribute weights into the conditional probability. In this paper, we propose such a method. To evaluate its performance, we used public benchmark datasets and compared our method with standard naïve Bayes and baseline attribute weighting methods. Experimental results show that incorporating attribute weights improves classification performance over both standard naïve Bayes and the baseline attribute weighting methods in terms of classification accuracy and F1, especially when the independence assumption is strongly violated, which was validated using the chi-square test of independence.
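A common way to incorporate attribute weights, and plausibly the form meant here, is to raise each conditional probability to the attribute's weight, so that w_i = 1 recovers the standard model and w_i = 0 drops the attribute. The sketch below uses toy probabilities and weights and does not reproduce the paper's weight-estimation scheme.

```python
import math

def weighted_nb_log_score(prior, cond_probs, weights):
    """log P(c) + sum_i w_i * log P(x_i | c)."""
    return math.log(prior) + sum(
        w * math.log(p) for w, p in zip(weights, cond_probs))

# Two classes, three attributes with fitted conditional probabilities;
# the same attribute weights are applied to both classes.
weights = [1.0, 0.4, 0.9]
score_pos = weighted_nb_log_score(0.6, [0.8, 0.3, 0.7], weights)
score_neg = weighted_nb_log_score(0.4, [0.2, 0.6, 0.5], weights)
print("positive" if score_pos > score_neg else "negative")
```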
Abstract: Feature selection is a key step in using machine learning to improve the accuracy and efficiency of retweet prediction, with feature extraction as its prerequisite. Commonly used feature selection methods include information gain (IG), mutual information, and the chi-square test (CHI). In traditional feature selection, low-frequency words cause negative correlation and computational interference in both information gain and the chi-square test, lowering classification accuracy. This paper first studies these low-frequency-word problems and introduces a balance factor and a word-frequency factor, respectively, to improve the accuracy of the two algorithms. Then, based on the characteristics of microblog information diffusion, the improved IG and CHI algorithms are combined into a feature selection method named BIG-WFCHI (Balance Information Gain - Word Frequency CHI-square test). In the experimental analysis, five classifiers (a maximum entropy model, a support vector machine, a naïve Bayes classifier, KNN, and a multilayer perceptron) are evaluated on two heterogeneous datasets. The results show that the proposed method effectively eliminates irrelevant and redundant features, improves classification accuracy, and reduces running time.
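The exact BIG-WFCHI formulas are not given in the abstract. The sketch below shows only the combination idea, with assumed balance and word-frequency factors that both damp low-frequency terms, the failure case the method targets; the factor definitions and the mixing weight mix are illustrative guesses.

```python
def big_wfchi_score(ig, chi, doc_freq, avg_doc_freq,
                    term_freq, mix=0.5):
    """Assumed BIG-WFCHI-style score: balance-corrected IG combined
    with frequency-corrected CHI."""
    # Balance factor: damps the information gain of rare terms,
    # whose IG estimates are unreliable.
    balance = min(1.0, doc_freq / avg_doc_freq)
    # Word-frequency factor: saturating damping that suppresses the
    # inflated chi-square scores of very rare terms.
    wf = term_freq / (term_freq + 10.0)
    return mix * ig * balance + (1.0 - mix) * chi * wf

print(big_wfchi_score(ig=0.42, chi=6.3, doc_freq=8,
                      avg_doc_freq=40, term_freq=12))
```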