Journal Articles
1,925 articles found
Identifying Metabolite and Protein Biomarkers in Unstable Angina In-patients by Feature Selection Based Data Mining Method (cited by 8)
1
Authors: SHI Cheng-he, ZHAO Hui-hui, HOU Na, CHEN Jian-xin, SHI Qi, XU Xue-gong, WANG Juan, ZHENG Cheng-long, ZHAO Ling-yan, YANG Yi, WANG Wei. Chemical Research in Chinese Universities (SCIE, CAS, CSCD), 2011, no. 1, pp. 87-93 (7 pages).
Unstable angina (UA) is the most dangerous type of coronary heart disease (CHD), causing rising mortality and morbidity worldwide. Identifying biomarkers for UA at the proteomic and metabolomic levels offers a better avenue for understanding its underlying mechanism, and feature-selection-based data mining is well suited to identifying such biomarkers. In this study, we carried out a clinical epidemiological investigation to collect plasma samples from UA in-patients and controls. Proteomics and metabolomics data were obtained via two-dimensional difference gel electrophoresis and gas chromatography, respectively. We present a novel computational strategy that selects as few biomarkers as possible for UA from the two data sets. First, a decision tree was used to select biomarkers for UA, and 3-fold cross-validation was used to evaluate the computational performance of the three methods. Alternatively, we combined the independent t-test, classification-based data mining and backward elimination to select as few protein and metabolite biomarkers as possible while retaining the best classification performance. Using this method, we selected 6 proteins and 5 metabolites as biomarkers for UA. The method presented here provides better insight into the pathology of the disease.
Keywords: biomarker; metabolomics; proteome; feature selection; data mining; unstable angina
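The filter-then-eliminate strategy described above (an independent t-test to rank candidate features, then backward elimination guided by classification performance) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Welch's t statistic as the ranking score and a nearest-centroid classifier scored by leave-one-out accuracy as the evaluation step, and the data are invented toy values.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `feats`."""
    correct = 0
    for i, row in enumerate(X):
        centroids = {}
        for label in set(y):
            others = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[f] for r in others) for f in feats]

        def dist(c):
            return sum((row[f] - c[k]) ** 2 for k, f in enumerate(feats))

        pred = min(centroids, key=lambda lab: dist(centroids[lab]))
        correct += pred == y[i]
    return correct / len(X)


def t_test_backward_selection(X, y, top_k):
    """Keep the top_k features by |t|, then greedily drop features while
    leave-one-out accuracy does not decrease."""
    pos = [r for r, lab in zip(X, y) if lab == 1]
    neg = [r for r, lab in zip(X, y) if lab == 0]
    score = {f: abs(welch_t([r[f] for r in pos], [r[f] for r in neg]))
             for f in range(len(X[0]))}
    feats = sorted(score, key=score.get, reverse=True)[:top_k]
    best = loo_accuracy(X, y, feats)
    changed = True
    while changed and len(feats) > 1:
        changed = False
        for f in list(feats):
            trial = [g for g in feats if g != f]
            acc = loo_accuracy(X, y, trial)
            if acc >= best:  # removing f does not hurt, so drop it
                feats, best, changed = trial, acc, True
                break
    return feats, best


# Toy data: feature 0 separates the classes, features 1 and 2 are mostly noise
X = [[5.0, 1.0, 2.0], [5.2, 3.0, 2.2], [4.8, 2.0, 1.9], [5.1, 1.5, 2.1],
     [1.0, 2.5, 1.8], [1.2, 1.2, 2.0], [0.9, 2.8, 2.3], [1.1, 1.8, 1.7]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
selected, acc = t_test_backward_selection(X, y, top_k=2)
print(selected, acc)
```

Using `>=` in the elimination test prefers the smaller subset whenever accuracy ties, which matches the stated goal of keeping as few biomarkers as possible.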
Feature Selection with Optimal Stacked Sparse Autoencoder for Data Mining (cited by 4)
2
Authors: Manar Ahmed Hamza, Siwar Ben Haj Hassine, Ibrahim Abunadi, Fahd N. Al-Wesabi, Hadeel Alsolai, Anwer Mustafa Hilal, Ishfaq Yaseen, Abdelwahed Motwakel. Computers, Materials & Continua (SCIE, EI), 2022, no. 8, pp. 2581-2596 (16 pages).
Data mining in the educational field can be used to optimize teaching and learning performance among students. Recently developed machine learning (ML) and deep learning (DL) approaches can be used to mine such data effectively. This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model aims to mine educational data and derive decisions through feature selection and classification. The model uses the ISOFS technique to choose an optimal subset of features. Furthermore, swallow swarm optimization (SSO) is combined with the SSAE model to perform classification. To demonstrate the improved outcomes of the ISOFS-OSSAE model, a wide range of experiments was conducted on a benchmark dataset from the University of California Irvine (UCI) Machine Learning Repository. The simulation results show that the ISOFS-OSSAE model achieves better classification performance than recent state-of-the-art approaches across different performance measures.
Keywords: data mining; pattern recognition; feature selection; data classification; SSAE model
Chimp Optimization Algorithm Based Feature Selection with Machine Learning for Medical Data Classification
3
Authors: Firas Abedi, Hayder M.A. Ghanimi, Abeer D. Algarni, Naglaa F. Soliman, Walid El-Shafai, Ali Hashim Abbas, Zahraa H. Kareem, Hussein Muhi Hariz, Ahmed Alkhayyat. Computer Systems Science & Engineering (SCIE, EI), 2023, no. 12, pp. 2791-2814 (24 pages).
Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories such as data warehouses and databases. Association rule mining, a fundamental data mining process, involves discovering correlations, patterns and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses and extracting useful information rapidly. This paper presents a novel approach called Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS), which integrates classification and association rule mining (ARM). First, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA mimics their search strategy for food. Subsequently, classification is performed by a multilayer perceptron trained with stochastic gradient descent (SGD-MLP), while the Apriori algorithm determines relationships among attributes. We propose this COA-based feature selection approach for medical data classification using machine learning: pertinent features are selected from medical datasets through COA, and machine learning models are trained on the reduced feature set. We evaluate the approach on various medical datasets using diverse machine learning classifiers. Experimental results show that the proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision in medical data classification tasks. The study demonstrates the effectiveness and efficiency of COA-based feature selection in identifying relevant features, thereby supporting the diagnosis and treatment of various diseases. For further validation, detailed experiments on a benchmark medical dataset show that the MLARMC-HDMS model outperforms other methods, with a maximum accuracy of 99.75%. This research therefore contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The MLARMC-HDMS framework and COA-based feature selection approach offer useful insights for researchers and practitioners working in healthcare data mining and machine learning.
Keywords: association rule mining; data classification; healthcare data; machine learning; parameter tuning; data mining; feature selection; MLARMC-HDMS; COA; stochastic gradient descent; Apriori algorithm
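The Apriori step mentioned above, which determines relationships among attributes, can be illustrated with a minimal, self-contained sketch of the classic algorithm: level-wise candidate generation, support counting, subset pruning, then rule generation by confidence. The chimp-optimization feature selection and the SGD-MLP classifier are not shown, and the patient records and thresholds are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Frequent itemsets as {frozenset: support}, support = fraction of transactions."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent, k = {}, 1
    while candidates:
        # Count step: support of every candidate k-itemset
        level = {}
        for c in candidates:
            support = sum(1 for t in sets if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), support, confidence))
    return rules


# Hypothetical patient records encoded as sets of attribute values
records = [
    {"high_bp", "smoker", "chest_pain"},
    {"high_bp", "chest_pain"},
    {"high_bp", "smoker"},
    {"smoker", "chest_pain"},
    {"high_bp", "smoker", "chest_pain"},
]
freq = apriori(records, min_support=0.6)
for lhs, rhs, s, c in association_rules(freq, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

The lookup `frequent[lhs]` is always safe because every subset of a frequent itemset is itself frequent (the downward-closure property that the prune step relies on).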
Evolutionary Algorithm Based Feature Subset Selection for Students Academic Performance Analysis
4
Authors: Ierin Babu, R. MathuSoothana S. Kumar. Intelligent Automation & Soft Computing (SCIE), 2023, no. 6, pp. 3621-3636 (16 pages).
Educational Data Mining (EDM) is an emerging discipline that concentrates on the design of self-learning and adaptive approaches. Higher education institutions have started to use analytical tools to improve students' grades and retention. Predicting students' performance is difficult owing to the massive quantity of educational data, so Artificial Intelligence (AI) techniques can be applied to educational data mining in a big data environment. In EDM, feature selection is necessary for creating feature subsets. Because feature selection affects the predictive performance of any model, the effect of feature selection techniques on student performance models needs careful investigation. With this motivation, this paper presents a new Metaheuristic Optimization-based Feature Subset Selection with an Optimal Deep Learning model (MOFSS-ODL) for predicting students' performance. The proposed model uses an isolation forest-based outlier detection approach to eliminate outliers. The Chaotic Monarch Butterfly Optimization Algorithm (CBOA) is then used to select highly relevant features with low complexity and high performance. Finally, a sailfish optimizer with stacked sparse autoencoder (SFO-SSAE) approach is used to classify the educational data. The MOFSS-ODL model is tested on a benchmark student performance dataset from the UCI repository. A wide-ranging simulation analysis shows that the MOFSS-ODL technique achieves better predictive performance than recent approaches across different measures. Compared with other methods, the experimental results show that the proposed MOFSS-ODL classification model predicts students' academic progress well, with an accuracy of 96.49%.
Keywords: students' performance analysis; educational data mining; feature selection; deep learning; metaheuristics; outlier detection
Heterogeneous Ensemble Feature Selection Model (HEFSM) for Big Data Analytics
5
Authors: M. Priyadharsini, K. Karuppasamy. Computer Systems Science & Engineering (SCIE, EI), 2023, no. 5, pp. 2187-2205 (19 pages).
Big Data applications face various complexities in classification. Cleaning and purifying big data by eliminating irrelevant or redundant data becomes a complex operation when discriminative features must be retained in the processed data. Existing schemes have several disadvantages, including continuity in training, the need for more samples and longer training time during feature selection, and increased classification execution time. Ensemble methods have recently made their mark in classification tasks because they combine multiple results into a single representation, which improves prediction compared with a single model. Ensemble-based feature selection parallels multiple experts' judgments on a single topic. The major goal of this research is to propose HEFSM (Heterogeneous Ensemble Feature Selection Model), a hybrid approach that combines multiple algorithms. The individual outputs of methods that produce feature subsets, rankings or votes are also combined in this work. A KNN (K-Nearest Neighbor) classifier is used to classify the big dataset obtained from the ensemble learning approach. The results are good, demonstrating the proposed model's classification efficiency in terms of precision, recall, F-measure and accuracy.
Keywords: PSO (Particle Swarm Optimization); GWO (Grey Wolf Optimization); EHO (Elephant Herding Optimization); data mining; big data analytics; feature selection; HEFSM; classifier
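The abstract says HEFSM combines outputs that are feature rankings or votes from several selectors, but it does not give the combination rule. As a stand-in, the sketch below uses two common aggregation schemes: a Borda count for rankings and a strict-majority vote for subsets. The selector outputs are invented, and the keywords only suggest (an assumption) that PSO, GWO and EHO might be the base selectors.

```python
from collections import Counter


def borda_aggregate(rankings):
    """Merge feature rankings (best first) into one consensus ranking.

    A feature at position p in a ranking of length n earns n - p points;
    features absent from a ranking earn nothing from it.
    """
    points = Counter()
    for ranking in rankings:
        for pos, feat in enumerate(ranking):
            points[feat] += len(ranking) - pos
    # Highest total first; ties broken alphabetically so the output is deterministic
    return sorted(points, key=lambda f: (-points[f], f))


def majority_vote(subsets):
    """Features chosen by more than half of the subset-producing selectors."""
    votes = Counter(f for s in subsets for f in set(s))
    return sorted(f for f, v in votes.items() if v > len(subsets) / 2)


# Hypothetical outputs of three different selectors on the same dataset
rankings = [["age", "bmi", "glucose", "bp"],
            ["glucose", "age", "bp", "bmi"],
            ["age", "glucose", "bmi", "bp"]]
subsets = [{"age", "glucose"}, {"age", "bmi"}, {"age", "glucose", "bp"}]
print(borda_aggregate(rankings))
print(majority_vote(subsets))
```

The consensus features would then be passed to the downstream classifier (KNN in the abstract), which is not shown here.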
Development of Data Mining Models Based on Features Ranks Voting (FRV)
6
Author: Mofreh A. Hogo. Computers, Materials & Continua (SCIE, EI), 2022, no. 11, pp. 2947-2966 (20 pages).
Data size plays a significant role in the design and performance of data mining models. A good feature selection algorithm reduces the problems of large data size and of noise due to data redundancy. Feature selection algorithms aim to select the best features and eliminate unnecessary ones, which simplifies the structure of the data mining model and improves its performance. This paper introduces a robust feature selection algorithm named the Features Ranking Voting Algorithm (FRV). It merges the benefits of different feature selection algorithms to determine the feature ranks in a dataset correctly and robustly, based on the feature ranks and a voting algorithm. FRV comprises three proposed techniques for selecting the minimum best feature set: a forward voting technique that selects the best high-rank features, a backward voting technique that drops low-rank (low-importance) features, and a third technique that merges the outputs of the forward and backward techniques to maximize the robustness of the selected feature set. To evaluate FRV, different data mining models were built using the feature sets obtained by applying it to different datasets; the high performance of these models reflects the success of the algorithm. FRV's performance is also compared with that of other feature selection algorithms. It succeeds in developing data mining models for the Hungarian CAD dataset with an accuracy of 96.8% and for the Z-Alizadeh Sani CAD dataset with an accuracy of 96%, compared with 83.94% and 92.56%, respectively, in [48].
Keywords: evaluator; features selection; data mining; forward; backward; voting; feature rank
Research on the big data feature mining technology based on the cloud computing
7
Author: WANG Yun. International English Education Research, 2019, no. 3, pp. 52-54 (3 pages).
A cloud computing platform efficiently allocates dynamic resources, provides computing and storage on demand according to user requests, and offers a good platform for big data feature analysis and mining. Big data feature mining in the cloud computing environment is an effective way to make efficient use of massive data in the information age. In big data mining, the feature mining method based on gradient sampling lacks logical rigor: it mines big data features only from a single-level perspective, which reduces the precision of big data feature mining.
Keywords: cloud computing; big data; features mining; technology; model; method
Smart Approaches to Efficient Text Mining for Categorizing Sexual Reproductive Health Short Messages into Key Themes
8
Authors: Tobias Makai, Mayumbo Nyirenda. Open Journal of Applied Sciences, 2024, no. 2, pp. 511-532 (22 pages).
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform gives young people improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas to better track sexual reproductive health knowledge gaps among young people. The current manual categorization of these text messages is inefficient and time-consuming, and this study aims to automate the process with text mining techniques to improve analysis. First, the study investigates the current text message categorization process and identifies the list of categories adopted by counselors over time, which are then used to build and train a categorization model. Second, the study presents a proof-of-concept tool that automates the categorization of U-Report messages into key thematic areas using the developed model. Finally, it compares the performance and effectiveness of the proof-of-concept tool with the manual system. The study used a dataset of 206,625 text messages. The current process would take roughly 2.82 years to categorize this dataset, whereas the trained SVM model requires only 6.4 minutes while achieving an accuracy of 70.4%, demonstrating that the automated method is significantly faster, more scalable and more consistent than manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool demonstrate the potential for improving the efficiency and accuracy of message categorization on the Zambia U-Report platform and other similar text-message-based platforms.
Keywords: Knowledge Discovery in Text (KDT); Sexual Reproductive Health (SRH); Text Categorization; Text Classification; Text Extraction; Text Mining; Feature Extraction; Automated Classification; Process Performance; Stemming and Lemmatization; Natural Language Processing (NLP)
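Taking the two durations reported in this abstract at face value and assuming a 365-day calendar year (the abstract does not say how the 2.82-year figure was estimated), the speed-up of the SVM model over manual categorization works out roughly as:

```latex
\frac{2.82\ \text{years}}{6.4\ \text{min}}
  = \frac{2.82 \times 365 \times 24 \times 60\ \text{min}}{6.4\ \text{min}}
  = \frac{1\,482\,192}{6.4}
  \approx 2.3 \times 10^{5},
\qquad
\frac{6.4 \times 60\ \text{s}}{206\,625\ \text{messages}} \approx 1.9\ \text{ms per message}.
```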
Research on evolution of mining pressure field and fracture field and gas emission features
9
Authors: Li Huamin, Xiong Zuqiang, Li Dongyin, Yuan Ruifu, Wang Wen. Engineering Sciences, 2012, no. 2, pp. 49-55 (7 pages).
The relation between the mining pressure field, the fracture field and gas emission at the working face is analyzed, and the concept that coal permeability has a stress point (or strain point) is presented. It is argued that the sudden change in coal permeability caused by sudden loading or unloading of the working face roof during periodic weighting is the main reason that large amounts of gas pour into the working face. Based on this concept, the relations among abutment pressure during periodic weighting, coal seam permeability and gas emission are established and plotted. The loading and unloading behavior of coal at the moments of main roof fracture and non-fracture is then revealed. Finally, it is shown that sudden loading or unloading during periodic weighting plays an important role in coal rupture propagation, gas desorption and movement, and gas emission.
Keywords: mining pressure field; fracture field; gas emission; features
Underground Coal Mine Target Tracking via Multi-Feature Joint Sparse Representation
10
Authors: Yan Lu, Qingxiang Huang. Journal of Computer and Communications, 2021, no. 3, pp. 118-132 (15 pages).
Single-feature methods cannot effectively track a target in underground coal mine video because of high background noise, low and uneven illumination, and drastic light fluctuation. In this study, we propose a method for tracking personnel in underground coal mines using multi-feature joint sparse representation. First, within a particle filter framework, global and local features of the target template and of the candidate particles are extracted. Second, each candidate particle is sparsely represented by a dictionary template and reconstructed after solving for the sparse coefficients. Finally, the particle with the lowest reconstruction error is taken as the tracking result. To validate the algorithm, we compare it with three commonly used tracking algorithms. The results show that the proposed method reliably tracks the target in various scenarios, such as occlusion and illumination change, producing better tracking results and confirming its feasibility and effectiveness.
Keywords: Underground Coal Mine; Sparse Representation; Particle Filter; Multi-feature; Target Tracking
An Improved Particle Swarm Optimization for Feature Selection (cited by 14)
11
Author: Yuanning Liu. Journal of Bionic Engineering (SCIE, EI, CSCD), 2011, no. 2, pp. 191-200 (10 pages).
Particle Swarm Optimization (PSO) is a popular bionic algorithm for optimization problems, based on the social behavior of bird flocking. To maintain swarm diversity, a few studies of multi-swarm strategies have been reported; however, competition among swarms, that is, whether a swarm is retained or destroyed, has not been considered further. In this paper, we formulate four rules that introduce a survival-of-the-fittest mechanism simulating competition among swarms. Based on this mechanism, we design a modified Multi-Swarm PSO (MSPSO) for discrete problems, consisting of a number of sub-swarms and a multi-swarm scheduler that monitors and controls each sub-swarm using the rules. To address feature selection problems, we propose an Improved Feature Selection (IFS) method that integrates MSPSO and Support Vector Machines (SVM) with the F-score method. IFS aims to achieve higher generalization capability by performing kernel parameter optimization and feature selection simultaneously. Its performance is compared with that of standard PSO-based, Genetic Algorithm (GA)-based and grid search-based methods on 10 benchmark datasets taken from the UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in prediction accuracy, with smaller feature subsets.
Keywords: particle swarm optimization; feature selection; data mining; support vector machines
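The abstract does not define the F-score used inside IFS. Assuming it is the widely used per-feature F-score of Chen and Lin for binary problems (between-class separation of the class means divided by the within-class sample variances), a minimal sketch looks like this. The MSPSO search and the SVM kernel tuning are not shown, and the data are invented.

```python
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one feature for a binary problem (labels 1 = positive).

    numerator   = (mean_pos - mean_all)^2 + (mean_neg - mean_all)^2
    denominator = sample variance of positives + sample variance of negatives
    Larger values indicate a more discriminative feature.
    """
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab != 1]
    overall = mean(values)
    num = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    den = variance(pos) + variance(neg)
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den


labels = [1, 1, 1, 0, 0, 0]
features = {
    "separating": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "uninformative": [1.0, 5.0, 9.0, 2.0, 5.0, 8.0],
}
scores = {name: f_score(vals, labels) for name, vals in features.items()}
print(scores)
```

A score like this can be used to rank features cheaply before a wrapper search such as PSO refines the subset; the score is computed per feature, so it ignores interactions between features.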
Effective and Efficient Feature Selection for Large-scale Data Using Bayes' Theorem (cited by 7)
12
Authors: Subramanian Appavu Alias Balamurugan, Ramasamy Rajaram. International Journal of Automation and Computing (EI), 2009, no. 1, pp. 62-71 (10 pages).
This paper proposes a feature selection method based on Bayes' theorem. Its purpose is to reduce computational complexity and increase the classification accuracy of the selected feature subsets. The dependence between two (binary) attributes is determined from the probabilities of their joint values that contribute to positive and negative classification decisions. If opposing sets of attribute values do not lead to opposing classification decisions (zero probability), the two attributes are considered independent of each other; otherwise they are dependent, one of them can be removed, and the number of attributes is thus reduced. The process is repeated for all combinations of attributes. The paper evaluates the approach by comparing it with existing feature selection algorithms on 8 datasets from the University of California, Irvine (UCI) machine learning databases. The proposed method outperforms most existing algorithms in the number of selected features, classification accuracy and running time.
Keywords: data mining; classification; feature selection; dimensionality reduction; Bayes' theorem
Gender Prediction on Twitter Using Stream Algorithms with N-Gram Character Features (cited by 10)
13
Authors: Zachary Miller, Brian Dickinson, Wei Hu. International Journal of Intelligence Science, 2012, no. 4, pp. 143-148 (6 pages).
The rapid growth of social networks has produced an unprecedented amount of user-generated data, providing an excellent opportunity for text mining. Authorship analysis, an important part of text mining, attempts to learn about the author of a text through subtle variations in writing style between gender, age and social groups. Such information has a variety of applications, including advertising and law enforcement. One of the most accessible sources of user-generated data is Twitter, which makes most of its user data freely available through its data access API. In this study, we seek to identify the gender of Twitter users using Perceptron and Naive Bayes classifiers with selected 1- through 5-gram features from tweet text. Stream versions of these algorithms were used for gender prediction to handle the speed and volume of tweet traffic. Because informal text such as tweets cannot easily be evaluated with traditional dictionary methods, n-gram features were used to represent streaming tweets. The large number of 1- through 5-grams means that only a subset of them can be used in gender classification; for this reason, informative n-gram features were chosen using multiple selection algorithms. In the best case, the Naive Bayes and Perceptron algorithms achieved accuracy, balanced accuracy and F-measure above 99%.
Keywords: Twitter; gender identification; stream mining; n-gram; feature selection; text mining
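The combination of character 1- to 5-gram features with a stream learner can be illustrated with a minimal sketch: each tweet becomes a bag of character n-gram counts, and a perceptron updates its weights once per incoming example, never revisiting earlier ones. Only the perceptron is shown (not Naive Bayes), no n-gram selection is applied, and the tiny labelled stream is hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=5):
    """Counts of all character n-grams of length n_min..n_max (lower-cased)."""
    text = text.lower()
    return Counter(text[i:i + n]
                   for n in range(n_min, n_max + 1)
                   for i in range(len(text) - n + 1))


class StreamingPerceptron:
    """Binary perceptron (labels +1 / -1) that learns from one example at a time."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, feats):
        return self.bias + sum(self.weights.get(f, 0.0) * c for f, c in feats.items())

    def predict(self, feats):
        return 1 if self.score(feats) >= 0 else -1

    def learn(self, feats, label):
        # Mistake-driven update: weights change only when the prediction is wrong
        if self.predict(feats) != label:
            for f, c in feats.items():
                self.weights[f] = self.weights.get(f, 0.0) + label * c
            self.bias += label


# Hypothetical labelled stream; each tweet is seen exactly once
stream = [("bro gg", -1), ("xoxo", 1)]
model = StreamingPerceptron()
for text, label in stream:
    model.learn(char_ngrams(text), label)
print(model.predict(char_ngrams("xo")), model.predict(char_ngrams("bro")))
```

Storing weights in a dictionary keeps memory proportional to the n-grams actually seen, which matters when the space of 1- to 5-grams is large; this is also why a feature-selection step that keeps only informative n-grams is useful in practice.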
Spatio-temporal dynamics of vegetation in Jungar Banner of China during 2000–2017 (cited by 5)
14
Authors: LI Xinhui, LEI Shaogang, CHENG Wei, LIU Feng, WANG Weizhong. Journal of Arid Land (SCIE, CSCD), 2019, no. 6, pp. 837-854 (18 pages).
The exploitation of opencast coal mines is known to have seriously damaged the environment in semi-arid areas. Vegetation status reliably reflects ecological degradation and restoration in opencast mining areas in semi-arid regions, and long time series of MODIS NDVI data are widely used to simulate vegetation cover as an indicator of the disturbance and restoration of local ecosystems. In this study, both qualitative (linear regression and coefficient of variation (CoV)) and quantitative (spatial buffer analysis, and the change amplitude and rate of change of the average NDVI) analyses were conducted to examine the spatio-temporal dynamics of vegetation during 2000–2017 in Jungar Banner, Inner Mongolia Autonomous Region, China, at the large scale (Jungar Banner and three mine groups) and the small scale (three types of functional areas: opencast coal mining excavation areas, reclamation areas and natural areas). The results show that the rates of change in the average NDVI in the reclamation areas (20%–60%) and opencast coal mining excavation areas (10%–20%) were considerably higher than that in the natural areas (<7%). Vegetation in the reclamation areas followed a trend of increase (3–5 a after reclamation), then decrease (the sixth year of reclamation), then stability. Vegetation in Jungar Banner shows spatial heterogeneity under the influence of mining and reclamation activities. The ratio of vegetation improvement area to vegetation degradation area in the west, southwest and east mine groups during 2000–2017 was 8:1, 20:1 and 33:1, respectively. Regions with a CoV of NDVI above 0.45 were mainly distributed around the opencast coal mining excavation areas, and regions with a CoV of NDVI above 0.25 were mostly located in areas with low (28.8%) and medium-low (10.2%) vegetation cover. The average disturbance distances of mining activities on vegetation in the west, southwest and east mine groups were 800, 800 and 1000 m, respectively; the larger the scale of mining, the farther the disturbance of mining activities on vegetation extends. We conclude that vegetation reclamation will compensate for the negative impacts of opencast coal mining activities on vegetation. Sufficient attention should be paid to the proportional allocation of plant species (herbs and shrubs) in the reclamation areas, and the restored vegetation in these areas needs to be protected for more than 6 a. As restoration time increases, the vegetation condition of the reclamation areas would then exceed that of the natural areas.
Keywords: NDVI; spatio-temporal dynamics; linear regression method; mining activities; opencast coal mining areas; reclamation areas; Jungar Banner
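The per-pixel statistics named in this abstract (a linear regression trend and the coefficient of variation of NDVI, plus a rate of change of the average NDVI) can be sketched for a single time series as follows. This is a minimal illustration under stated assumptions: the population standard deviation is used for the CoV, the rate of change is taken as relative change from the first to the last year (the abstract does not give its exact formula), and the NDVI values are invented.

```python
from statistics import mean, pstdev


def trend_slope(years, ndvi):
    """Ordinary least-squares slope of NDVI against time (NDVI units per year)."""
    t_bar, v_bar = mean(years), mean(ndvi)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(years, ndvi))
    den = sum((t - t_bar) ** 2 for t in years)
    return num / den


def coefficient_of_variation(ndvi):
    """CoV = standard deviation / mean (population standard deviation assumed)."""
    return pstdev(ndvi) / mean(ndvi)


def rate_of_change(ndvi):
    """Relative change from the first to the last value (assumed definition)."""
    return (ndvi[-1] - ndvi[0]) / ndvi[0]


# Hypothetical NDVI series for one pixel over five consecutive years
years = [2000, 2001, 2002, 2003, 2004]
ndvi = [0.20, 0.22, 0.24, 0.26, 0.28]
print(trend_slope(years, ndvi), coefficient_of_variation(ndvi), rate_of_change(ndvi))
```

Applied to every pixel of an NDVI stack, a positive slope marks vegetation improvement, a negative slope marks degradation, and a high CoV flags pixels with unstable cover.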
Feature selection based on mutual information and redundancy-synergy coefficient (cited by 7)
15
Authors: 杨胜 (Yang Sheng), 顾钧 (Gu Jun). Journal of Zhejiang University Science (EI, CSCD), 2004, no. 11, pp. 1382-1391 (10 pages).
Mutual information is an important information measure for feature subsets. In this paper, a hashing mechanism is proposed to calculate the mutual information of a feature subset. The redundancy-synergy coefficient, a novel measure of the redundancy and synergy of features with respect to the class, is defined using mutual information. The information maximization rule is applied to derive a heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient. Experimental results show the good performance of the new feature selection method.
Keywords: mutual information; feature selection; machine learning; data mining
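One way to compute mutual information on a whole feature subset with a hashing mechanism is to hash each sample's joint subset value as a tuple key and count co-occurrences with the class in a single pass. The sketch below does this under that assumption; it is not the paper's exact mechanism, and the redundancy-synergy coefficient itself is not reproduced because the abstract does not define it. The XOR toy data show why subset-level mutual information matters: each feature alone carries no information about the class, but the pair determines it.

```python
from collections import Counter
from math import log2


def subset_mutual_information(rows, labels, subset):
    """I(X_S; C) in bits for the joint value of the features indexed by `subset`.

    The joint value of the subset is hashed as a tuple key, so one counting
    pass handles a subset of any size.
    """
    n = len(rows)
    keys = [tuple(row[i] for i in subset) for row in rows]
    count_x, count_c = Counter(keys), Counter(labels)
    count_xc = Counter(zip(keys, labels))
    # p(x,c) * log2( p(x,c) / (p(x) p(c)) ), written with raw counts
    return sum(cnt / n * log2(cnt * n / (count_x[x] * count_c[c]))
               for (x, c), cnt in count_xc.items())


# XOR toy data: neither feature alone says anything about the class,
# but together they determine it exactly (a pure synergy case)
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
for subset in ([0], [1], [0, 1]):
    print(subset, subset_mutual_information(rows, labels, subset))
```

Here the subset information (1 bit) exceeds the sum of the single-feature values (0 bits), which is the kind of synergy a redundancy-synergy measure is meant to capture.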
Importance of Features Selection, Attributes Selection, Challenges and Future Directions for Medical Imaging Data: A Review (cited by 6)
16
Authors: Nazish Naheed, Muhammad Shaheen, Sajid Ali Khan, Mohammed Alawairdhi, Muhammad Attique Khan. Computer Modeling in Engineering & Sciences (SCIE, EI), 2020, no. 10, pp. 315-344 (30 pages).
In pattern recognition and machine learning, features play a key role in prediction. Well-known applications of features include medical imaging and image classification, to name a few. With the exponential growth of investment in medical data repositories and health service provision, medical institutions are collecting large volumes of data. These repositories contain detailed information essential for supporting medical diagnostic decisions and improving the quality of patient care. On the other hand, this growth has also made the data harder to comprehend and use for various purposes. Results on imaging data can become biased because of extraneous features present in larger datasets. Feature selection offers a way to reduce the number of components in such large datasets: selection techniques discard unimportant features and select a subset of components that yields better classification precision. Choosing good attributes produces a precise classification model, which speeds up learning and improves predictive power. This paper reviews feature selection techniques and attribute selection measures for medical imaging. The review describes feature selection techniques in the medical domain with their pros and cons and highlights their application to imaging data and data mining algorithms. It reveals the shortcomings of existing feature and attribute selection techniques for multi-sourced data. Moreover, it shows the importance of feature selection for the correct classification of medical infections. Finally, a critical analysis and future directions are provided.
Keywords: Medical imaging; imaging data; feature selection; data mining; attribute selection; medical challenges; future directions
Improving Knowledge Based Spam Detection Methods: The Effect of Malicious Related Features in Imbalance Data Distribution (cited by 5)
17
Authors: Jafar Alqatawna, Hossam Faris, Khalid Jaradat, Malek Al-Zewairi, Omar Adwan. International Journal of Communications, Network and System Sciences, 2015, No. 5, pp. 118-129 (12 pages)
Spam is no longer just unsolicited commercial email that wastes our time; it also consumes network traffic and mail server storage. Furthermore, spam has become a major component of several attack vectors, including phishing, cross-site scripting, cross-site request forgery, and malware infection. Statistics show that the amount of spam containing malicious content has increased compared with spam advertising legitimate products and services. In this paper, spam detection is investigated with the aim of developing an efficient method to identify spam email based on analysis of the content of email messages. We identify a feature set that contains a considerable number of malicious-related features, and our goal is to study how these features help classical classifiers identify spam emails. To make the problem more challenging, we developed spam classification models on imbalanced data in which spam emails form the rare class, at only 16.5% of all emails. Different metrics were used to evaluate the developed models. Results show a noticeable improvement in spam classification models when they are trained on a dataset that includes malicious-related features.
Keywords: Spam e-mail; malicious spam; spam detection; spam features; security mechanism; data mining
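Because spam is only 16.5% of the emails in this study, plain accuracy can look good even for a useless model. The abstract does not name the metrics used, so the following is only a minimal sketch of common imbalance-aware metrics (precision, recall, F-measure on the spam class); the 1000-email corpus is hypothetical, sized to match the 16.5% ratio.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` (spam) as the rare class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F-measure for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision/recall are defined as 0 when their denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical corpus: 1000 emails, 165 spam (label 1), 835 ham (label 0).
    y_true = [1] * 165 + [0] * 835
    always_ham = [0] * 1000  # a "classifier" that never flags spam
    print(imbalance_metrics(y_true, always_ham))
    # -> {'accuracy': 0.835, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

The trivial always-ham model scores 83.5% accuracy yet catches no spam at all, which is why recall and F-measure on the rare class are the more telling numbers here.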
Unsupervised feature selection based on Markov blanket and particle swarm optimization (cited by 1)
18
Authors: Yintong Wang, Jiandong Wang, Hao Liao, Haiyan Chen. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2017, No. 1, pp. 151-161 (11 pages)
Feature selection plays an important role in data mining and recognition, especially for large-scale text, image, and biological data. In unsupervised feature selection, no class label information is available to guide the selection of a minimal feature subset, which makes the problem both challenging and interesting. An unsupervised feature selection method based on the Markov blanket and particle swarm optimization, named UFSMB-PSO, is proposed. The method seeks a high-quality feature subset through the cooperation of multiple particles in particle swarm optimization, without using any learning algorithm. Moreover, feature relevance is computed with an information metric called relevance gain, which provides an information-theoretic foundation for minimizing the redundancy between features. Results on several benchmark datasets demonstrate that UFSMB-PSO achieves significant improvement over state-of-the-art unsupervised methods. © 1990-2011 Beijing Institute of Aerospace Information.
Keywords: Character recognition; data mining; feature extraction; information theory
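The abstract describes searching for a feature subset through the cooperation of PSO particles, but does not give the relevance-gain formula or the Markov blanket step. The following is therefore only a generic sketch of the standard sigmoid-transfer binary PSO over 0/1 feature masks, not UFSMB-PSO itself. The inertia and acceleration values (`w`, `c1`, `c2`, `v_max`) are assumed defaults, and `toy_fitness` is a hypothetical stand-in for relevance gain.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO maximizing `fitness(mask)` over 0/1 feature masks."""
    rng = random.Random(seed)
    # Random initial masks and zero velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # each particle's best mask
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # swarm's best mask

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp velocity
                vel[i][d] = v
                # Sigmoid of the velocity is the probability that bit d is 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Hypothetical per-feature relevance scores (not from the paper).
    relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2]

    def toy_fitness(mask):
        # Reward relevant features, charge a fixed cost per selected feature.
        return sum(r - 0.4 for r, m in zip(relevance, mask) if m)

    mask, score = binary_pso(toy_fitness, len(relevance))
    print(mask, round(score, 2))
```

PSO is stochastic, so the returned mask is the best one found, not a guaranteed optimum. In the paper's setting, `toy_fitness` would be replaced by the unsupervised relevance/redundancy criterion; no class labels appear anywhere in the search.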
A Hybrid Feature Selection Framework for Predicting Students Performance (cited by 1)
19
Authors: Maryam Zaffar, Manzoor Ahmed Hashmani, Raja Habib, KS Quraishi, Muhammad Irfan, Samar Alqhtani, Mohammed Hamdi. Computers, Materials & Continua (SCIE, EI), 2022, No. 1, pp. 1893-1920 (28 pages)
Student performance prediction helps educational stakeholders make proactive decisions and interventions that improve the quality of education and meet the dynamic needs of society. Selecting features for student performance prediction not only plays a significant role in increasing prediction accuracy but also helps in building strategic plans to improve students' academic performance. Various feature selection algorithms exist for predicting student performance; however, studies reported in the literature show that each existing algorithm has its own pros and cons in selecting optimal features. In this paper, a hybrid feature selection framework (using feature fusion) is designed to identify the significant features, and the features associated with the target class, for predicting student performance. The main goal of the proposed hybrid feature selection is not only to improve prediction accuracy but also to identify optimal features for building productive strategies to improve students' academic performance. The key difference between the proposed framework and the existing hybrid feature selection framework is a two-level feature fusion technique that uses cosine-based fusion; according to results reported in the existing literature, cosine similarity is considered the best among existing similarity measures. The proposed hybrid feature selection is validated on four benchmark datasets with varying numbers of features and instances. The results confirm that the proposed framework performs better than the existing hybrid feature selection framework and existing feature selection algorithms in terms of accuracy, F-measure, recall, and precision. The proposed approach achieves more than 90% accuracy on a benchmark dataset, which is better than the results of the existing approach.
Keywords: Educational data mining; feature selection; hybrid feature selection
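The abstract names cosine-based fusion but not its exact two-level procedure. The following is therefore a hypothetical, single-level sketch of one way cosine similarity can drive fusion. Score vectors from several feature selectors are min-max normalized. Each selector is weighted by its mean cosine similarity to the others, and the weighted average ranks the features. The selector names and scores in the example are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fuse_scores(score_lists):
    """Cosine-weighted fusion of per-feature scores (hypothetical scheme)."""
    # Min-max normalize each selector's scores to [0, 1] so scales match.
    norm = []
    for s in score_lists:
        lo, hi = min(s), max(s)
        norm.append([(x - lo) / (hi - lo) if hi > lo else 0.0 for x in s])
    # Weight each selector by its mean agreement with the other selectors.
    k = len(norm)
    weights = []
    for i in range(k):
        sims = [cosine(norm[i], norm[j]) for j in range(k) if j != i]
        weights.append(sum(sims) / len(sims) if sims else 1.0)
    total = sum(weights) or 1.0
    n = len(norm[0])
    return [sum(w * s[f] for w, s in zip(weights, norm)) / total
            for f in range(n)]


def top_k(fused, k):
    """Indices of the k highest fused scores, best first."""
    return sorted(range(len(fused)), key=lambda f: fused[f], reverse=True)[:k]


if __name__ == "__main__":
    # Invented scores from three hypothetical selectors over 5 features.
    info_gain = [0.30, 0.05, 0.22, 0.01, 0.15]
    chi2 = [12.0, 1.0, 9.5, 0.5, 4.0]
    relieff = [0.08, 0.02, 0.06, 0.00, 0.07]
    fused = fuse_scores([info_gain, chi2, relieff])
    print(top_k(fused, 3))  # -> [0, 2, 4]
```

Normalizing first matters. Without it, the chi-square scores would dominate purely because of their scale, and the cosine weights would reflect magnitude rather than agreement in ranking.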
Feature Selection for Classificatory Analysis Based on Information-theoretic Criteria (cited by 3)
20
Authors: HUANG Jin-Jie, LV Ning, LI Shuang-Quan, CAI Yun-Ze. Acta Automatica Sinica (自动化学报) (EI, CSCD, Peking University Core Journal), 2008, No. 3, pp. 383-392 (10 pages)
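Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features rather than irrelevant or redundant ones. In this study, two novel information-theoretic measures for feature evaluation are introduced. The first is an improved formula for estimating the conditional mutual information between a candidate feature f_i and the target class C given the subset of already selected features S, i.e., I(C; f_i | S), under the assumption that the information of the features is uniformly distributed. The second is a mutual information (MI) based constructive criterion that can capture both irrelevant and redundant input features under arbitrary distributions of feature information. Based on these two measures, two new feature selection algorithms are proposed, called the quadratic MI-based feature selection (QMIFS) approach and the MI-based constructive criterion (MICC) approach, respectively. Unlike Battiti's MIFS and Kwak and Choi's MIFS-U methods, neither requires a parameter such as β to be preset. The intractable problem of choosing an appropriate value of β to trade off relevance to the target class against redundancy with the already selected features is therefore avoided completely. Experimental results demonstrate the good performance of QMIFS and MICC on both synthetic and benchmark datasets.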
Keywords: feature selection; information-theoretic criteria; pattern classification; data mining
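The abstract's quantities can be made concrete with plug-in (frequency-count) estimates on discrete data. The sketch below is not QMIFS or MICC, since their formulas are not given here. It computes I(C; f), the conditional term I(C; f_i | S) through the entropy identity I(C;F|S) = H(C,S) + H(F,S) - H(C,F,S) - H(S), and Battiti's MIFS greedy rule with its preset β. The tiny dataset is invented: f1 duplicates f0, and C = f0 OR f2.

```python
import math
from collections import Counter


def entropy(*cols):
    """Plug-in joint entropy (bits) of one or more discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(c, f, s_cols):
    """I(C;F|S) = H(C,S) + H(F,S) - H(C,F,S) - H(S); S is a list of columns."""
    if not s_cols:
        return mutual_info(c, f)
    return (entropy(c, *s_cols) + entropy(f, *s_cols)
            - entropy(c, f, *s_cols) - entropy(*s_cols))


def mifs(features, c, k, beta=0.5):
    """Battiti's MIFS: greedily pick argmax I(C;f) - beta * sum_{s in S} I(f;s)."""
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            redundancy = sum(mutual_info(features[j], features[s])
                             for s in selected)
            return mutual_info(c, features[j]) - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    f0 = [0, 0, 1, 1]
    f1 = [0, 0, 1, 1]          # exact copy of f0 (redundant)
    f2 = [0, 1, 0, 1]          # independent of f0
    c = [0, 1, 1, 1]           # class = f0 OR f2
    print(mifs([f0, f1, f2], c, 2, beta=1.0))  # -> [0, 2]
    print(cond_mutual_info(c, f1, [f0]), cond_mutual_info(c, f2, [f0]))  # -> 0.0 0.5
```

Once f0 is chosen, the duplicate f1 adds nothing (I(C; f1 | f0) = 0), while f2 still contributes 0.5 bits. MIFS reaches the same choice only because β = 1.0 penalizes the redundancy strongly enough. Choosing β is exactly the tuning step the paper's criteria are designed to remove.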