19 articles found
Novel cued search strategy based on information gain for phased array radar (Cited by: 4)
1
Authors: Lu Jianbin, Hu Weidong, Xiao Hui, Yu Wenxian. Source: Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2008, No. 2, pp. 292-297 (6 pages)
A search strategy based on the maximal information gain principle is presented for the cued search of phased array radars. First, the determination of the cued search region, the arrangement of beam positions, and the calculation of the prior probability distribution of each beam position are discussed. Then, two search algorithms based on information gain are proposed, using Shannon entropy and Kullback-Leibler entropy, respectively. With the proposed strategy, the information gain of each beam position is predicted before the radar detection, and the observation is made in the beam position with the maximal information gain. Compared with the conventional methods of sequential search and confirm search, simulation results show that the proposed search strategy can distinctly improve the search performance and save radar time resources for the same given detection probability.
Keywords: phased array radar; search strategy; cued search; beam position; information gain
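The selection rule described in this abstract (predict the information gain of each beam position, then observe the one with the largest gain) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a binary detect/no-detect report, a fixed detection probability `pd` and false-alarm probability `pfa`, and a hypothetical prior over four beam positions. The gain is computed as the Shannon mutual information between the report and the event "target is in this beam".

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def expected_information_gain(prior_k, pd, pfa):
    """Expected entropy reduction from one look at a beam position.

    prior_k: prior probability that the target is in this beam (assumed known).
    pd, pfa: assumed detection and false-alarm probabilities.
    """
    # Probability that the radar reports a detection in this beam.
    p_detect = prior_k * pd + (1 - prior_k) * pfa
    # H(report) minus H(report | target in beam or not).
    conditional = prior_k * binary_entropy(pd) + (1 - prior_k) * binary_entropy(pfa)
    return binary_entropy(p_detect) - conditional


def best_beam(priors, pd=0.8, pfa=0.01):
    """Return the index of the beam with maximal expected gain, plus all gains."""
    gains = [expected_information_gain(p, pd, pfa) for p in priors]
    best = max(range(len(priors)), key=gains.__getitem__)
    return best, gains


# Hypothetical prior over four beam positions.
priors = [0.1, 0.5, 0.3, 0.1]
best, gains = best_beam(priors)
```

With these assumed numbers the beam holding half of the prior mass is selected; after each look the priors would be updated by Bayes' rule and the selection repeated.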
Information gain based sensor search scheduling for low-earth orbit constellation estimation (Cited by: 3)
2
Authors: Bo Wang, Jun Li, Wei An, Yiyu Zhou. Source: Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2011, No. 6, pp. 926-932 (7 pages)
This paper addresses the problem of sensor search scheduling in the complicated space environment faced by the low-earth orbit constellation. Several search scheduling methods based on the commonly used information gain are first compared via simulations. Then a novel search scheduling method for scenarios with uncertain observations is proposed, based on the global Shannon information gain and a beta density based uncertainty model. Simulation results indicate that the beta density model is a good option for solving the problem of target acquisition in complicated space environments.
Keywords: low-earth orbit constellation; sensor network; scheduling algorithm; information gain; acquisition
Sensor management based on fisher information gain (Cited by: 2)
3
Authors: Tian Kangsheng, Zhu Guangxi. Source: Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2006, No. 3, pp. 531-534 (4 pages)
Multi-sensor systems are becoming increasingly important in a variety of military and civilian applications. In general, a single-sensor system can provide only partial information about the environment, while a multi-sensor system provides a synergistic effect that improves the quality and availability of information. Data fusion techniques can effectively combine this environmental information from similar and/or dissimilar sensors. Sensor management, which aims to improve data fusion performance by controlling sensor behavior, plays an important role in the data fusion process. This paper presents a method that uses a Fisher information gain based sensor effectiveness metric for sensor assignment in multi-sensor, multi-target tracking applications. The Fisher information gain is computed for every sensor-target pairing on each scan. The advantage of this metric over others is that the Fisher information gain for a target obtained by multiple sensors equals the sum of the gains obtained by the individual sensors, so the standard transportation problem formulation can be used without introducing the concept of a pseudo sensor. The simulation results show the effectiveness of the method.
Keywords: data fusion; sensor management; Fisher information gain; linear programming
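The additivity property mentioned in this abstract can be written out explicitly. The following is a sketch under assumptions that the listing does not state: each sensor s in the assigned set S takes a linear measurement z_s = H_s x + v_s with independent Gaussian noise of covariance R_s, and J^- denotes the predicted information matrix of one target. All symbols are illustrative.

```latex
J^{+} = J^{-} + \sum_{s \in S} H_s^{\top} R_s^{-1} H_s ,
\qquad
\Delta J_s = H_s^{\top} R_s^{-1} H_s
```

Under these assumptions each sensor contributes its own term \Delta J_s, so the gain of a sensor set is the sum of the per-sensor gains, which is the property that allows a linear, transportation-type assignment formulation.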
Application of Information Gain to Estimating the Seismic Tendency (Cited by: 2)
4
Authors: Shen Ping, Shen Jing, and Feng Guozheng (Institute of Geophysics, SSB, Beijing 100081, China). Source: Earthquake Research in China, 1997, No. 2, pp. 44-50 (7 pages)
Treating two seismic parameters, the energy and the frequency of earthquakes, as a whole, and using the entropy-based definition of information gain, we study the information gain of M≥6.0 earthquakes in the world earthquake catalogue for 1900-1992. The results show that the information gain decreases before strong earthquakes. Our study of the recent seismic tendency of large earthquakes shows that the probability of earthquakes with M≥8.5 is low for the near future around the world. The information gain technique provides a new approach to tracing and predicting earthquakes from the data of moderate and small earthquakes.
Keywords: Application of information gain to Estimating the Seismic Tendency
Assessment of Sentiment Analysis Using Information Gain Based Feature Selection Approach
5
Authors: R. Madhumathi, A. Meena Kowshalya, R. Shruthi. Source: Computer Systems Science & Engineering (SCIE, EI), 2022, No. 11, pp. 849-860 (12 pages)
Sentiment analysis is the process of determining the intention or emotion behind an article. The subjective information from the context is analyzed by the sentiment analysis of people's opinions. The data that is analyzed quantifies the reactions or sentiments and reveals the information's contextual polarity. In social behavior, sentiment can be thought of as a latent variable. Measuring and comprehending this behavior could help us better understand social issues. Because sentiments are domain specific, sentiment analysis in a specific context is critical in any real-world scenario. Textual sentiment analysis is done at the sentence, document, and feature levels. This work introduces a new Information Gain based Feature Selection (IGbFS) algorithm for selecting highly correlated features while eliminating irrelevant and redundant ones. Extensive textual sentiment analysis at the sentence, document, and feature levels is performed by exploiting the proposed Information Gain based Feature Selection algorithm. The analysis is based on datasets from the Cornell and Kaggle repositories. When compared to existing baseline classifiers, the suggested Information Gain based classifier resulted in an increased accuracy of 96% for document, 97.4% for sentence, and 98.5% for feature levels, respectively. The proposed method is also tested with the IMDB, Yelp 2013, and Yelp 2014 datasets. Experimental results for these high-dimensional datasets give increased accuracy of 95%, 96%, and 98% for the proposed Information Gain based classifier for document, sentence, and feature levels, respectively, compared to existing baseline classifiers.
Keywords: Sentiment analysis; sentence level; document level; feature level; information gain
Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification (Cited by: 4)
6
Authors: Lingyun Gao, Mingquan Ye, Xiaojie Lu, Daobin Huang. Source: Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2017, No. 6, pp. 389-395 (7 pages)
It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensions, small sample size, and big noise of gene expression data. We thus proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM), in this study. IG was initially employed to filter irrelevant and redundant genes. Then, further removal of redundant genes was performed using SVM to eliminate the noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Compared to other related algorithms, IG-SVM showed the highest classification accuracy and superior performance as evaluated using five cancer gene expression datasets based on a few selected genes. As an example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to classify accurately, based on only three genes: CSRP1, MYL9, and GUCA2B.
Keywords: Gene selection; Cancer classification; information gain; Support vector machine; Small sample size with high dimension
A novel business analytics approach and case study - fuzzy associative classifier based on information gain and rule-covering (Cited by: 2)
7
Authors: Yue Ma, Guoqing Chen, Qiang Wei. Source: Journal of Management Analytics (EI), 2014, No. 1, pp. 1-19 (19 pages)
Associative classification has attracted remarkable research attention for business analytics in recent years due to its merits in accuracy and understandability. It is deemed meaningful to construct an associative classifier with a compact set of rules (i.e., compactness), which is easy to understand and use in decision making. This paper presents a novel approach to fuzzy associative classification (namely Gain-based Fuzzy Rule-Covering classification, GFRC), which is a fuzzy extension of an effective classifier, GARC. In GFRC, two desirable strategies are introduced to enhance the compactness with accuracy. One strategy is fuzzy partitioning for data discretization to cope with the 'sharp boundary problem', in which simulated annealing is incorporated based on the information entropy measure; the other strategy is a data-redundancy resolution coupled with the rule-covering treatment. Data experiments show that GFRC had good accuracy and was significantly advantageous over other classifiers in compactness. Moreover, GFRC is applied to a real-world case for predicting the growth of sellers in an electronic marketplace, illustrating the classification effectiveness with linguistic rules in business decision support.
Keywords: associative classification; information gain; fuzzy partitioning; simulated annealing; rule-covering
Cluster Detection Method of Endogenous Security Abnormal Attack Behavior in Air Traffic Control Network
8
Authors: Ruchun Jia, Jianwei Zhang, Yi Lin, Yunxiang Han, Feike Yang. Source: Computers, Materials & Continua (SCIE, EI), 2024, No. 5, pp. 2523-2546 (24 pages)
To enhance the accuracy of Air Traffic Control (ATC) cybersecurity attack detection, this paper designs a new clustering detection method for ATC network security attacks. The feature set for ATC cybersecurity attacks is constructed by setting the feature states, adding recursive features, and determining the feature criticality. The expected information gain and entropy of the feature data are computed to determine the information gain of the feature data and reduce the interference of similar feature data. An autoencoder is introduced into the AI (artificial intelligence) algorithm to encode and decode the characteristics of ATC network security attack behavior, reducing the dimensionality of the attack behavior data. Based on this processing, an unsupervised learning algorithm for clustering detection of ATC network security attacks is designed. First, the distance between the clusters of ATC network security attack behavior characteristics is determined, the clustering threshold is calculated, and the initial clustering centers are constructed. Then, the average value of all feature objects in each cluster is recalculated as the new cluster center. Next, all objects in each cluster of ATC network security attack behavior feature data are traversed. Finally, the cluster detection of ATC network security attack behavior is completed by computing the objective functions. The experiment used three groups of experimental attack behavior datasets as test objects, took the detection rate, false detection rate, and recall rate as test indicators, and selected three similar methods for comparison. The experimental results show that the detection rate of this method is about 98%, the false positive rate is below 1%, and the recall rate is above 97%. The research shows that this method can improve the detection performance for security attacks in air traffic control networks.
Keywords: Air traffic control; network security; attack behavior; cluster detection; behavioral characteristics; information gain; cluster threshold; automatic encoder
Research on the Intelligent Distribution System of College Dormitory Based on the Decision Tree Classification Algorithm (Cited by: 1)
9
Authors: Huiping Han, Beida Wang. Source: Journal of Contemporary Educational Research, 2023, No. 2, pp. 7-14 (8 pages)
Intelligent distribution systems designed around students' individual differences and needs are gaining precedence over the traditional dormitory distribution system, which neglects students' personality traits, causes dormitory disputes, and affects students' quality of life and academic quality. This paper collects freshmen's data according to college students' personal preferences, conducts a classification comparison, uses the decision tree classification algorithm based on the information gain principle as the core algorithm of dormitory allocation, determines the description rules of students' personal preferences and decision tree classification preferences, completes the conceptual design of the database of entity relations and data dictionaries, meets students' personality classification requirements for the dormitory, and lays the foundation for the intelligent dormitory allocation system.
Keywords: Intelligent allocation; Personal preference; information gain; Decision tree classification; INDIVIDUALIZATION
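The information gain principle that this abstract names as the split criterion of the decision tree can be sketched as below. The attribute names (`sleep`, `music`), the target label (`dorm`), and the four sample rows are hypothetical and do not come from the paper; the code only shows how an ID3-style tree would score candidate attributes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical preference records; "dorm" is the class to predict.
rows = [
    {"sleep": "early", "music": "yes", "dorm": "quiet"},
    {"sleep": "early", "music": "no", "dorm": "quiet"},
    {"sleep": "late", "music": "yes", "dorm": "lively"},
    {"sleep": "late", "music": "no", "dorm": "lively"},
]
gain_sleep = information_gain(rows, "sleep", "dorm")
gain_music = information_gain(rows, "music", "dorm")
```

In this toy data `sleep` separates the classes completely while `music` carries no information, so a tree built on this criterion would split on `sleep` first.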
Applying deep learning and benchmark machine learning algorithms for landslide susceptibility modelling in Rorachu river basin of Sikkim Himalaya, India (Cited by: 4)
10
Authors: Kanu Mandal, Sunil Saha, Sujit Mandal. Source: Geoscience Frontiers (SCIE, CAS, CSCD), 2021, No. 5, pp. 264-280 (17 pages)
Landslides are considered one of the most severe threats to human life and property in the hilly areas of the world. The number of landslides and the level of damage across the globe have been increasing over time. Therefore, landslide management is essential to maintain the natural and socio-economic dynamics of hilly regions. The Rorachu river basin, one of the most landslide-prone areas of Sikkim, was selected for the present study. The prime goal of the study is to prepare landslide susceptibility maps (LSMs) using computer-based advanced machine learning techniques and to compare the performance of the models. To properly understand the existing spatial relation with landslides, twenty factors, including triggering and causative factors, were selected. A deep learning algorithm, viz. a convolutional neural network (CNN) model, and three popular machine learning techniques, i.e., a random forest (RF) model, an artificial neural network (ANN) model, and a bagging model, were employed to prepare the LSMs. Two separate datasets, training and validation, were designed from randomly selected landslide and non-landslide points. A ratio of 70:30 was used for the selection of training and validation points. Multicollinearity was assessed by tolerance and variance inflation factor, and the role of individual conditioning factors was estimated using the information gain ratio. The results reveal that there is no severe multicollinearity among the landslide conditioning factors, and the triggering factor rainfall appeared as the leading cause of landslides. Based on the final prediction values of each model, an LSM was constructed and partitioned into five distinct classes: very low, low, moderate, high, and very high susceptibility. The susceptibility class-wise distribution of landslides shows that more than 90% of the landslide area falls under the higher landslide susceptibility grades. The precision of the models was examined using the area under the curve (AUC) of the receiver operating characteristics (ROC) curve and statistical methods such as root mean square error (RMSE) and mean absolute error (MAE). In the training and validation datasets, the CNN model achieved the maximum AUC values of 0.903 and 0.939, respectively. The lowest values of RMSE and MAE also reveal the better performance of the CNN model. It can therefore be concluded that all the models performed well, but the CNN model outperformed the other models in terms of precision.
Keywords: Machine learning techniques; information gain ratio (IGR); Landslide susceptibility map (LSM); Convolutional neural network (CNN); Receiver operating characteristics (ROC)
Cued search algorithm with uncertain detection performance for phased array radars (Cited by: 2)
11
Authors: Jianbin Lu, Hui Xiao, Zemin Xi, Mingmin Zhang. Source: Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2013, No. 6, pp. 938-945 (8 pages)
A cued search algorithm with uncertain detection performance is proposed for phased array radars. First, a target search model based on the information gain criterion is presented for known detection performance, and the statistical characteristics of the detection probability are calculated using the fluctuation model of the target radar cross section (RCS). Second, when the detection probability is completely unknown, its probability density function is modeled with a beta distribution, and its posterior probability distribution given the radar observations is derived from Bayesian theory. Finally, simulation results show that the cued search algorithm with a known RCS fluctuation model achieves the best performance, and that the algorithm with the detection probability modeled as a beta distribution outperforms the one with a randomly selected detection probability, because the model parameters can be updated by the radar observations to approach the real value of the detection probability.
Keywords: phased array radar; detection performance; cued search; information gain; beta distribution
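The idea of modeling an unknown detection probability with a beta distribution and refining it from observations can be illustrated with the standard conjugate update. This is a simplified sketch, not the derivation in the paper: it assumes every look is an independent Bernoulli trial on a beam that is known to contain the target, whereas the paper's posterior also has to account for uncertainty about the target's presence. The function names and the sample counts are hypothetical.

```python
def update_beta(alpha, beta, detections, misses):
    """Conjugate update: Beta(alpha, beta) prior plus Bernoulli outcomes."""
    return alpha + detections, beta + misses


def beta_mean(alpha, beta):
    """Posterior mean, used here as the point estimate of the detection probability."""
    return alpha / (alpha + beta)


# Uniform prior Beta(1, 1); hypothetical outcome of four looks: 3 detections, 1 miss.
alpha, beta = update_beta(1.0, 1.0, detections=3, misses=1)
estimate = beta_mean(alpha, beta)
```

Each additional observation shifts the estimate toward the empirical detection rate, which is the mechanism the abstract credits for the advantage over a randomly selected detection probability.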
Attribute Weighted Naïve Bayes Classifier (Cited by: 1)
12
Authors: Lee-Kien Foo, Sook-Ling Chua, Neveen Ibrahim. Source: Computers, Materials & Continua (SCIE, EI), 2022, No. 4, pp. 1945-1957 (13 pages)
The naïve Bayes classifier is one of the commonly used data mining methods for classification. Despite its simplicity, naïve Bayes is effective and computationally efficient. Although the strong attribute independence assumption in the naïve Bayes classifier makes it a tractable method for learning, this assumption may not hold in real-world applications. Many enhancements to the basic algorithm have been proposed to alleviate the violation of the attribute independence assumption. While these methods improve the classification performance, they do not necessarily retain the mathematical structure of the naïve Bayes model, and some do so at the expense of computational time. One approach to reducing the naïveté of the classifier is to incorporate attribute weights in the conditional probability. In this paper, we propose a method to incorporate attribute weights into naïve Bayes. To evaluate the performance of our method, we used public benchmark datasets. We compared our method with the standard naïve Bayes and baseline attribute weighting methods. Experimental results show that our method improves the classification performance compared to both standard naïve Bayes and baseline attribute weighting methods in terms of classification accuracy and F1, especially when the independence assumption is strongly violated, which was validated using the Chi-square test of independence.
Keywords: Attribute weighting; naïve Bayes; Kullback-Leibler; information gain; CLASSIFICATION
Few-Shot Learning for Discovering Anomalous Behaviors in Edge Networks (Cited by: 1)
13
Authors: Merna Gamal, Hala M. Abbas, Nour Moustafa, Elena Sitnikova, Rowayda A. Sadek. Source: Computers, Materials & Continua (SCIE, EI), 2021, No. 11, pp. 1823-1837 (15 pages)
Intrusion Detection Systems (IDSs) are of great interest these days for discovering complex attack events and protecting the critical infrastructures of Internet of Things (IoT) networks. Existing IDSs based on shallow and deep network architectures demand high computational resources and high volumes of data to establish an adaptive detection engine that discovers new families of attacks from the edge of IoT networks. However, attackers exploit network gateways at the edge using new attacking scenarios (i.e., zero-day attacks), such as ransomware and Distributed Denial of Service (DDoS) attacks. This paper proposes a new IDS based on Few-Shot Deep Learning, named CNN-IDS, which can automatically identify zero-day attacks from the edge of a network and protect its IoT systems. The proposed system comprises two methodological stages: 1) a filtered Information Gain method selects the most useful features from network data, and 2) a one-dimensional Convolutional Neural Network (CNN) algorithm recognizes new attack types from a network's edge. The proposed model is trained and validated using two datasets, UNSW-NB15 and BoT-IoT. The experimental results showed that it improves the detection rate by about 3% and the false-positive rate by around 3%–4% on the UNSW-NB15 dataset, and the detection rate by about 8% on the BoT-IoT dataset.
Keywords: Convolution neural network; information gain; few-shot learning; IoT; edge computing
Improved Dragonfly Optimizer for Intrusion Detection Using Deep Clustering CNN-PSO Classifier
14
Authors: K.S. Bhuvaneshwari, K. Venkatachalam, S. Hubalovsky, P. Trojovsky, P. Prabu. Source: Computers, Materials & Continua (SCIE, EI), 2022, No. 3, pp. 5949-5965 (17 pages)
With the rapid growth of internet-based services, the data generated on these services attract attackers who intrude on networking services and information. Based on the characteristics of these intruders, many researchers have attempted to detect intrusions with the help of automated processes. Because a large volume of data is generated and transferred through the network, security and performance remain an issue. The IDS (Intrusion Detection System) was developed to detect and prevent intruders and secure network systems. Performance and loss are still an issue because the feature space grows while detecting intruders. In this paper, a deep clustering based CNN is used to detect intruders with the help of metaheuristic algorithms for feature selection and preprocessing. The proposed system includes three phases: preprocessing, feature selection, and classification. In the first phase, the KDD dataset is preprocessed using binning normalization and an Eigen-PCA based discretization method. In the second phase, feature selection is performed using the Information Gain based Dragonfly Optimizer (IGDFO). Finally, a deep clustering based Convolutional Neural Network (CCNN) classifier optimized with Particle Swarm Optimization (PSO) identifies intrusion attacks efficiently. The clustering loss and network loss can be reduced with the optimization algorithm. We evaluate the proposed IDS model with the NSL-KDD dataset in terms of evaluation metrics. The experimental results show that the proposed system achieves better performance than the existing system in terms of accuracy, precision, recall, F-measure, and false detection rate.
Keywords: Intrusion detection system; binning normalization; deep clustering; convolutional neural network; information gain; dragonfly optimizer
Reversion of weak-measured quantum entanglement state
15
Authors: 杜少将 (Du Shaojiang), 彭勇刚 (Peng Yonggang), 冯海冉 (Feng Hairan), 韩峰 (Han Feng), 杨连武 (Yang Lianwu), 郑雨军 (Zheng Yujun). Source: Chinese Physics B (SCIE, EI, CAS, CSCD), 2020, No. 7, pp. 308-312 (5 pages)
We theoretically study the reversible process of the quantum entanglement state by means of weak measurement and the corresponding reversible operation. We present a protocol for the reversion operation in two bodies based on the theory of reversion of a single photon and then extend it to quantum communication channels. The theoretical results demonstrate that the protocol does not break the information transmission after a weak measurement and a reversible measurement with the subsequent process in the transmission path. It can reverse the perturbed entanglement intensity evolution to its original state. Under different weak measurement intensities, the protocol can reverse the perturbed quantum entanglement system perfectly. In the process, we can obtain the classical information, described by the information gain, from the quantum system through the weak measurement operation. On the other hand, to realize complete reversibility, the classical information of the quantum entanglement system must stay within a limited range, presented in this paper, during the reverse process.
Keywords: quantum entanglement; weak measurement; reversion operation; information gain and reversibility
Task-Specific Feature Selection and Detection Algorithms for IoT-Based Networks
16
Authors: Yang Gyun Kim, Benito Mendoza, Ohbong Kwon, John Yoon. Source: Journal of Computer and Communications, 2022, No. 10, pp. 59-73 (15 pages)
As IoT devices become more ubiquitous, the security of IoT-based networks becomes paramount. Machine Learning-based cybersecurity enables autonomous threat detection and prevention. However, one of the challenges of applying Machine Learning-based cybersecurity in IoT devices is feature selection, as most IoT devices are resource-constrained. This paper studies two feature selection algorithms, Information Gain and PSO-based, to select a minimum number of attack features, and uses Decision Tree and SVM for performance comparison. The consistent use of the same metrics in feature selection and detection algorithms substantially enhances the classification accuracy compared to the non-consistent use in feature selection by Information Gain (entropy) and Tree detection algorithm by classification. Furthermore, the Tree with consistent feature selection is comparable to the ensemble that provides excellent performance at the cost of computation complexity.
Keywords: CYBERSECURITY; Features Selection; information gain; Particle Swarm Optimization; Intrusion Detection System; Machine Learning; Decision Tree; Network Attacks; IoT Network
Lymph Diseases Prediction Using Random Forest and Particle Swarm Optimization
17
Authors: Waheeda Almayyan. Source: Journal of Intelligent Learning Systems and Applications, 2016, No. 3, pp. 51-62 (12 pages)
This research aims to develop a model to enhance lymphatic disease diagnosis using the random forest ensemble machine-learning method trained with a simple sampling scheme. The study was carried out in two major phases: feature selection and classification. In the first stage, a number of discriminative features out of 18 were selected using PSO and several feature selection techniques to reduce the feature dimension. In the second stage, we applied the random forest ensemble classification scheme to diagnose lymphatic diseases. In experiments with the selected features, we used the original and resampled distributions of the dataset to train the random forest classifier. Experimental results demonstrate that the proposed method achieves a remarkable improvement in classification accuracy.
Keywords: Classification; Random Forest Ensemble; PSO; Simple Random Sampling; information gain Ratio; Symmetrical Uncertainty
Key Symptoms Selection for Two Major Syndromes Diagnosis of Chinese Medicine in Chronic Hepatitis B (Cited by: 5)
18
Authors: ZHAO Yu, KANG Hong, PENG Jing-hua, XU Lin, CAO Zhi-wei, HU Yi-yang. Source: Chinese Journal of Integrative Medicine (SCIE, CAS, CSCD), 2017, No. 4, pp. 253-260 (8 pages)
To identify key symptoms of two major syndromes in chronic hepatitis B (CHB), which can serve as clinical evidence for Chinese medicine (CM) doctors to make decisions. Standardization scales for the diagnosis of CHB in CM were designed, including physical symptoms and tongue and pulse appearance. A total of 695 CHB cases with dampness-heat (DH) syndrome or Pi (Spleen) deficiency (SD) syndrome were collected for feature selection and modeling; another 275 CHB patients were collected in different locations for validation. Key symptoms were selected based on modified information gain (IG), and 5 classifiers were applied to assist with model training and validation. Classification accuracy and area under the receiver operating characteristic curve (AUC) were evaluated. (1) Thirteen DH syndrome key symptoms and 13 SD syndrome key symptoms were selected from the original 125 symptoms; (2) the key symptoms could achieve similar or better diagnostic accuracy than the original full symptom set; (3) in the validation phase, the key symptoms could identify syndromes effectively, especially DH syndrome, for which the average prediction accuracy over the 5 classifiers reached 0.864 with an average AUC of 0.772. The selected key symptoms could serve as simple diagnostic elements for DH and SD syndromes, directly applicable in clinical practice. (Registration No.: ChiCTR-DCC-10000759).
Keywords: Chinese medicine; SYNDROME; chronic hepatitis B; information gain
A Novel Feature Selection Framework for Automatic Web Page Classification (Cited by: 3)
19
Authors: J. Alamelu Mangai, V. Santhosh Kumar, S. Appavu alias Balamurugan. Source: International Journal of Automation and Computing (EI), 2012, No. 4, pp. 442-448 (7 pages)
The number of Internet users and the number of web pages being added to the WWW increase dramatically every day. It is therefore necessary to classify web pages into web directories automatically and efficiently. This helps search engines provide users with relevant and quick retrieval results. As web pages are represented by thousands of features, feature selection helps web page classifiers resolve this large-scale dimensionality problem. This paper proposes a new feature selection method using Ward's minimum variance measure. This measure is first used to identify clusters of redundant features in a web page. In each cluster, the best representative features are retained and the others are eliminated. Removing such redundant features helps minimize resource utilization during classification. The proposed feature selection method is compared with other common feature selection methods. Experiments on a benchmark data set, WebKB, show that the proposed method performs better than most of the other feature selection methods in terms of reducing the number of features and the classifier modeling time.
Keywords: Feature selection; web page classification; Ward's minimum variance; information gain; WebKB