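Abstract: Existing DDoS (distributed denial of service) attack detection models show low detection efficiency when faced with large volumes of data. To adapt to the current network environment, this work studies DDoS attack detection models, extracts traffic features, and computes attack density, and on that basis proposes GVBNet (global variable block net), a DDoS attack detection model that incorporates a sparse attention mechanism and uses attack density to compute the sparse attention adaptively. Information entropy and information gain analysis are used to extract consecutive bytes of attack traffic as feature vectors, and a GVBNet-based network model is built and trained on two datasets. Experimental results show that the method achieves good recognition performance, detection speed, and robustness to interference, and has application value in different environments.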
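The abstract does not say exactly how the entropy-based byte features are computed. As a minimal sketch, assuming entropy is taken over the byte-value distribution of a single payload, the Shannon entropy in bits per byte can be computed as follows; `byte_entropy` is a hypothetical helper, not part of GVBNet.

```python
import math
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy, in bits per byte, of the byte-value distribution of `payload`."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # iterating over bytes yields ints in 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A constant payload scores 0 bits and a payload containing each of the 256 byte values once scores 8 bits; how such scores are combined with information gain to choose byte features is left to the paper.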
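Abstract: Traditional phishing website detection techniques mainly select the sensitive feature items used for detection either randomly or empirically, which cannot guarantee detection accuracy. To address this, an improved information gain algorithm, IIGAIN (Improved Information Gain Algorithm), is proposed for selecting sensitive features of phishing websites. The algorithm takes into account the within-class dispersion of feature items: the differences in within-class dispersion are processed accordingly, and the processed result is used as a penalty term to improve the information gain algorithm. Experimental results show that the phishing website detection method using IIGAIN for feature selection achieves markedly higher detection accuracy than the method that selects feature items randomly.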
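For orientation, the standard information gain that IIGAIN starts from can be sketched for discrete features as $IG(Y;X) = H(Y) - \sum_x P(X=x)\,H(Y \mid X=x)$. The within-class-dispersion penalty that IIGAIN adds is not specified in the abstract, so it is deliberately not reproduced; the feature names below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_x P(X=x) * H(Y | X=x) for a discrete feature X."""
    n = len(labels)
    groups = defaultdict(list)
    for x, y in zip(feature_values, labels):
        groups[x].append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy features: one perfectly separates the classes, one is noise.
features = {
    "has_ip_in_url": [1, 1, 0, 0],
    "random_bit": [1, 0, 1, 0],
}
labels = ["phish", "phish", "legit", "legit"]
ranking = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
```

Ranking by plain information gain puts `has_ip_in_url` (1 bit) ahead of `random_bit` (0 bits); IIGAIN would adjust these scores with its penalty term before ranking.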
Funding: the High Technology Research and Development Programme of China (2003AA134030)
Abstract: A search strategy based on the maximal information gain principle is presented for the cued search of phased array radars. First, methods for determining the cued search region, arranging the beam positions, and calculating the prior probability of each beam position are discussed. Then, two information-gain-based search algorithms are proposed, using Shannon entropy and Kullback-Leibler entropy, respectively. With the proposed strategy, the information gain of each beam position is predicted before radar detection, and the observation is made at the beam position with the maximal information gain. Simulation results show that, for the same given detection probability, the proposed strategy markedly improves search performance and saves radar time resources compared with the conventional methods of sequential search and confirm search.
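The "predict the gain of each beam position, then look at the best one" loop can be sketched with the Shannon-entropy variant. This is a minimal illustration under stated assumptions, not the paper's algorithm: a single target occupies exactly one beam cell, a look at the target's cell detects it with probability `pd`, and there are no false alarms. The Kullback-Leibler variant is not shown, and all function names are hypothetical.

```python
import math


def shannon_entropy(p):
    """Entropy in bits of a discrete distribution (zero-probability cells are skipped)."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def posterior_after_miss(prior, i, pd):
    """Bayes update of the cell probabilities after looking at cell i without a detection."""
    miss = 1.0 - pd * prior[i]
    return [(q * (1.0 - pd) if j == i else q) / miss for j, q in enumerate(prior)]


def expected_info_gain(prior, i, pd):
    """Expected entropy reduction from one look at cell i.

    Assumes one target in exactly one cell, detection probability pd in the
    target's cell, and no false alarms, so a detection localizes the target
    (posterior entropy 0) and only the miss branch contributes.
    """
    miss = 1.0 - pd * prior[i]
    if miss <= 0.0:  # a detection is certain, so the posterior entropy is 0
        return shannon_entropy(prior)
    return shannon_entropy(prior) - miss * shannon_entropy(posterior_after_miss(prior, i, pd))


def next_beam(prior, pd):
    """Index of the beam position with the largest predicted information gain."""
    return max(range(len(prior)), key=lambda i: expected_info_gain(prior, i, pd))


prior = [0.5, 0.3, 0.2]
beam = next_beam(prior, pd=0.9)  # look here first
updated = posterior_after_miss(prior, beam, pd=0.9)  # new prior if nothing was detected
```

After a miss, the updated prior becomes the input to the next `next_beam` call, so the search keeps moving towards the cells that remain most informative.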
Funding: supported by the National Defense Pre-research Foundation (9140A21041110KG0148)
Abstract: This paper addresses the problem of sensor search scheduling in the complicated space environment faced by low-earth-orbit constellations. Several search scheduling methods based on the commonly used information gain are first compared via simulation. A novel search scheduling method for scenarios with uncertain observations is then proposed, based on the global Shannon information gain and a beta-density-based uncertainty model. Simulation results indicate that the beta density model is a good option for solving the problem of target acquisition in complicated space environments.
Abstract: Multi-sensor systems are becoming increasingly important in a variety of military and civilian applications. In general, a single-sensor system can provide only partial information about the environment, whereas a multi-sensor system provides a synergistic effect that improves the quality and availability of information. Data fusion techniques can effectively combine this environmental information from similar and/or dissimilar sensors. Sensor management, which aims to improve data fusion performance by controlling sensor behavior, plays an important role in the data fusion process. This paper presents a sensor effectiveness metric based on Fisher information gain for sensor assignment in multi-sensor, multi-target tracking applications. The Fisher information gain is computed for every sensor-target pairing on each scan. The advantage of this metric over others is that the Fisher information gain about a target obtained by multiple sensors equals the sum of the gains obtained by the individual sensors, so the standard transportation problem formulation can be used to solve the assignment without introducing the concept of a pseudo sensor. Simulation results show the effectiveness of the method.
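The additivity argument is what turns sensor assignment into a linear (transportation-type) problem. A minimal sketch, assuming a hypothetical scalar Gaussian measurement model where sensor s measures target t with noise standard deviation `sigmas[s][t]` (Fisher information $1/\sigma^2$), each sensor makes one observation per scan, and target t must receive exactly `demand[t]` observations. The exhaustive search stands in for a proper transportation-problem solver and is only practical for tiny instances.

```python
from collections import Counter
from itertools import product


def fisher_gain(sigma):
    """Fisher information about a scalar position from one Gaussian measurement with std sigma."""
    return 1.0 / sigma ** 2


def best_assignment(sigmas, demand):
    """Exhaustively solve a tiny balanced sensor-to-target transportation problem.

    sigmas[s][t] is the (hypothetical) measurement noise std of sensor s on target t,
    each sensor makes one observation per scan, and target t must receive exactly
    demand[t] observations. Because Fisher information from independent sensors adds,
    the objective is simply the sum of per-pair gains.
    """
    n_sensors, n_targets = len(sigmas), len(sigmas[0])
    best, best_total = None, float("-inf")
    for choice in product(range(n_targets), repeat=n_sensors):  # choice[s] = target of sensor s
        counts = Counter(choice)
        if any(counts[t] != demand[t] for t in range(n_targets)):
            continue  # violates the per-target demand constraint
        total = sum(fisher_gain(sigmas[s][t]) for s, t in enumerate(choice))
        if total > best_total:
            best, best_total = choice, total
    return best, best_total


sigmas = [[1.0, 2.0], [1.0, 4.0], [0.5, 1.0]]
assignment, total = best_assignment(sigmas, demand=[2, 1])
```

Here the best plan sends sensor 0 to target 1 and sensors 1 and 2 to target 0, for a total Fisher information of 5.25. For realistic sizes, the loop would be replaced by a transportation-problem or linear-programming solver, as the abstract indicates.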
Abstract: Treating two seismic parameters, earthquake energy and frequency, as a whole and starting from the entropy-based definition of information gain, we study the information gain of M≥6.0 earthquakes in the world earthquake catalogue from 1900 to 1992. The results show that the information gain decreases before strong earthquakes. Our study of the recent seismic tendency of large earthquakes indicates that the probability of M≥8.5 earthquakes worldwide is low in the near future. The information gain technique provides a new approach to tracing and predicting earthquakes from data on moderate and small earthquakes.
Abstract: Sentiment analysis is the process of determining the intention or emotion behind an article. It analyzes the subjective information in a text to capture people's opinions; the analyzed data quantifies the reactions or sentiments and reveals the contextual polarity of the information. In social behavior, sentiment can be thought of as a latent variable, and measuring and comprehending it could help us better understand social issues. Because sentiments are domain specific, sentiment analysis in a specific context is critical in any real-world scenario. Textual sentiment analysis is performed at the sentence, document, and feature levels. This work introduces a new Information Gain based Feature Selection (IGbFS) algorithm that selects highly correlated features and eliminates irrelevant and redundant ones. Extensive textual sentiment analysis at the sentence, document, and feature levels is performed with the proposed IGbFS algorithm on datasets from the Cornell and Kaggle repositories. Compared with existing baseline classifiers, the proposed Information Gain based classifier achieved higher accuracies of 96% at the document level, 97.4% at the sentence level, and 98.5% at the feature level. The proposed method was also tested on the IMDB, Yelp 2013, and Yelp 2014 datasets; on these high-dimensional datasets, the proposed Information Gain based classifier achieved accuracies of 95%, 96%, and 98% at the document, sentence, and feature levels, respectively, again higher than existing baseline classifiers.
Abstract: We advance here a novel methodology for robust intelligent biometric information management, with inferences and predictions made using randomness and complexity concepts. Intelligence refers to learning, adaptation, and functionality; robustness refers to the ability to handle incomplete and/or corrupt adversarial information on one side, and image and/or device variability on the other. The proposed methodology is model-free and non-parametric. It draws support from discriminative methods that use likelihood ratios to link biometrics and forensics at the conceptual level. At the modeling and implementation level, it further links the Bayesian framework, statistical learning theory (SLT) using transduction and semi-supervised learning, and Information Theory (IT) using mutual information. The key concepts supporting the proposed methodology are a) local estimation to facilitate learning and prediction using both labeled and unlabeled data; b) similarity metrics using regularity of patterns, randomness deficiency, and Kolmogorov complexity (similar to MDL) using strangeness/typicality and ranking p-values; and c) the Cover-Hart theorem on the asymptotic performance of k-nearest neighbors approaching the optimal Bayes error. Several topics on biometric inference and prediction, related to 1) multi-level and multi-layer data fusion, including quality and multi-modal biometrics; 2) score normalization and revision theory; 3) face selection and tracking; and 4) identity management, are described here using an integrated approach. This approach includes transduction and boosting for ranking and sequential fusion/aggregation, respectively, on one side, and active learning and change/outlier/intrusion detection realized using information gain and martingales, respectively, on the other. The proposed methodology can be mapped to additional types of information beyond biometrics.
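For reference, item c) rests on the classical Cover-Hart bound. In its standard form for the 1-nearest-neighbor rule with $M$ classes, the asymptotic error rate $R$ (as the number of training samples grows without bound) relates to the Bayes error $R^*$ as follows; this is the textbook statement, not a formula taken from the paper.

```latex
R^{*} \;\le\; R \;\le\; R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \;\le\; 2R^{*}
```

The last inequality holds because $R^* \ge 0$, so the nearest-neighbor rule asymptotically has at most twice the Bayes error.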
Abstract: Traditional dormitory allocation systems neglect students' personality traits, which causes dormitory disputes and affects students' quality of life and academic performance. In view of this, designing an intelligent allocation system based on students' individual differences and individual needs has taken precedence. This paper collects freshmen's data according to college students' personal preferences and performs a classification comparison. It uses a decision tree classification algorithm based on the information gain principle as the core algorithm of dormitory allocation, determines the description rules for students' personal preferences and the decision tree classification preferences, and completes the conceptual design of the database, including entity relationships and data dictionaries. The design meets students' personality-based classification requirements for dormitories and lays the foundation for an intelligent dormitory allocation system.
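A decision tree "based on the information gain principle" is, in its simplest form, an ID3-style tree that splits on the attribute with the highest information gain at each node. The sketch below illustrates that idea only. The paper's actual preference attributes and rules are not given in the abstract, so the attribute names (`sleep`, `noise`, `room`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attrs, target):
    """Build an ID3 tree as nested dicts: {attr: {value: subtree_or_label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


def classify(tree, row):
    """Follow the tree to a leaf label (raises KeyError for unseen attribute values)."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


rows = [
    {"sleep": "early", "noise": "quiet", "room": "A"},
    {"sleep": "early", "noise": "any", "room": "A"},
    {"sleep": "late", "noise": "quiet", "room": "B"},
    {"sleep": "late", "noise": "any", "room": "B"},
]
tree = id3(rows, ["sleep", "noise"], "room")
```

On this toy data, `sleep` carries 1 bit of information gain and `noise` carries none, so the tree splits only on `sleep`.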
Funding: National Natural Science Foundation of China (U2133208, U20A20161); National Natural Science Foundation of China (No. 62273244); Sichuan Science and Technology Program (No. 2022YFG0180).
Abstract: To enhance the accuracy of Air Traffic Control (ATC) cybersecurity attack detection, this paper designs a new clustering detection method for ATC network security attacks. The feature set for ATC cybersecurity attacks is constructed by setting the feature states, adding recursive features, and determining feature criticality. The expected information gain and entropy of the feature data are computed to determine the information gain of the feature data and to reduce the interference of similar feature data. An autoencoder is introduced into the AI (artificial intelligence) algorithm to encode and decode the characteristics of ATC network security attack behavior, reducing the dimensionality of the attack behavior data. Based on this processing, an unsupervised learning algorithm for clustering detection of ATC network security attacks is designed. First, the distances between clusters of ATC attack behavior features are determined, the clustering threshold is calculated, and the initial cluster centers are constructed. Then, the mean of all feature objects in each cluster is recalculated as the new cluster center. Next, all objects in each cluster of ATC attack behavior feature data are traversed. Finally, cluster detection of ATC network security attack behavior is completed by computing the objective function. Three groups of experimental attack behavior datasets were used as test objects, with detection rate, false detection rate, and recall rate as evaluation indicators, and three similar methods were selected for comparison. The experimental results show that the detection rate of this method is about 98%, the false positive rate is below 1%, and the recall rate is above 97%. The research shows that this method can improve the detection performance of security attacks in ATC networks.
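The clustering steps listed above (initial centers, reassignment, recomputing each center as the mean of its members, and an objective function) follow the general pattern of k-means. A minimal sketch of plain Lloyd's k-means follows, with the sum of squared errors as the objective. As an assumption, it simply takes the first k points as initial centers, because the abstract's distance-based initialization and clustering threshold are not specified. It is not the paper's exact algorithm.

```python
import math


def assign(points, centers):
    """Group each point with its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (centers, clusters, sum of squared errors)."""
    # Assumption: the first k points serve as initial centers. The paper instead derives
    # them from inter-cluster distances and a clustering threshold not given in the abstract.
    centers = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Recompute each center as the mean of its members; keep an empty cluster's center.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable: converged
            break
        centers = new_centers
    clusters = assign(points, centers)
    sse = sum(math.dist(p, centers[i]) ** 2 for i, cl in enumerate(clusters) for p in cl)
    return centers, clusters, sse


centers, clusters, sse = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

On the toy points, the sketch converges to centers (0.0, 0.5) and (10.0, 10.5) with an SSE of 1.0. The SSE stands in for the objective function here, and the paper's own objective may differ.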