4 articles found
Profiling Astronomical Objects Using Unsupervised Learning Approach
1
Authors: Theerapat Sangpetch, Tossapon Boongoen, Natthakan Iam-On. Computers, Materials & Continua (SCIE, EI), 2023, No. 1, pp. 1641-1655 (15 pages)
Determining the characteristics of astronomical objects has been a major and vibrant activity in both astronomy and data science. Instead of manual inspection, various automated systems have been invented to meet this need, including the classification of light curve profiles. A specific Kaggle competition, the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC), was launched to gather new ideas for tackling this task using the data set collected from the Large Synoptic Survey Telescope (LSST) project. Almost all proposed methods fall into the supervised family, with a common aim of categorizing each object into one of the pre-defined types. As the challenge focuses on developing a predictive model that is robust when classifying unseen data, those previous attempts similarly suffer from a lack of discriminative features, since the distributions of the training and actual test datasets are largely different. As a result, well-known classification algorithms prove to be sub-optimal, while more complicated feature extraction techniques may only slightly boost predictive performance. Given such a burden, this research explores an unsupervised alternative for this difficult task, in which common classifiers fail to reach the 50% accuracy mark. A clustering technique is exploited to transform the space of the training data, from which a more accurate classifier can be built. In addition to a single-clustering framework that provides accuracy comparable to the front runners of supervised learning, a multiple-clustering alternative is introduced with improved performance. In fact, it raises the accuracy rate to 58.32% from the 51.36% obtained using a simple clustering. For this difficult problem, this is rather good compared with the results achieved by well-known models such as support vector machine (SVM) with 51.80% and Naive Bayes (NB) with only 2.92%.
Keywords: astronomy; sky survey; light curve data; classification; data clustering
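The abstract does not say how the clustering transforms the training space. As a rough illustration only, the sketch below assumes one common choice: run k-means and re-describe each object by its distances to the cluster centroids, so that a classifier can be trained on those distances. The function names, the choice of k-means and the toy points are all assumptions made for this example, not the authors' method.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style k-means; returns the k centroids (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def to_cluster_space(point, centroids):
    """Assumed transformation: one feature per cluster (distance to its centroid)."""
    return [math.sqrt(squared_distance(point, c)) for c in centroids]


# Toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids = kmeans(points, k=2)
transformed = [to_cluster_space(p, centroids) for p in points]
```

Each row of `transformed` has one value per cluster, and any standard classifier could then be fitted on these rows in place of the original features.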
Classification of Adversarial Attacks Using Ensemble Clustering Approach
2
Authors: Pongsakorn Tatongjai, Tossapon Boongoen, Natthakan Iam-On, Nitin Naik, Longzhi Yang. Computers, Materials & Continua (SCIE, EI), 2023, No. 2, pp. 2479-2498 (20 pages)
As more business transactions and information services are implemented via communication networks, both personal and organizational assets face a higher risk of attack. To safeguard these, a perimeter defence such as a NIDS (network-based intrusion detection system) can be effective against known intrusions. There has been a great deal of attention within the joint community of security and data science to improving machine-learning-based NIDS so that it becomes more accurate for adversarial attacks, where obfuscation techniques are applied to disguise patterns of intrusive traffic. The current research focuses on non-payload connections at the TCP (transmission control protocol) stack level, which is applicable to different network applications. In contrast to the wrapper method introduced with the benchmark dataset, three new filter models are proposed to transform the feature space without knowledge of class labels. These ECT (ensemble clustering based transformation) techniques, i.e., ECT-Subspace, ECT-Noise and ECT-Combined, are developed using the concept of ensemble clustering and three different ensemble generation strategies, i.e., random feature subspace, feature noise injection and their combination. Based on an empirical study with a published dataset and four classification algorithms, the new models usually outperform the original wrapper and other filter alternatives found in the literature. This holds both in the first experiment, on basic classification of legitimate connections and direct attacks, and in the second, which focuses on recognizing obfuscated intrusions. In addition, an analysis of algorithmic parameters, i.e., ensemble size and level of noise, is provided as a guideline for practical use.
Keywords: intrusion detection; adversarial attack; machine learning; feature transformation; ensemble clustering
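The two ensemble generation strategies named in the abstract, random feature subspace and feature noise injection, can be pictured with the hedged sketch below. It is not the ECT implementation. The Gaussian noise model, the one-hot encoding of cluster memberships as the transformed features, and all function names are assumptions made for illustration, and the base clustering algorithm is deliberately left out.

```python
import random


def random_subspace(rows, n_features, rng):
    """Keep a random subset of columns (the same subset for every row)."""
    columns = sorted(rng.sample(range(len(rows[0])), n_features))
    return [[row[c] for c in columns] for row in rows], columns


def inject_noise(rows, level, rng):
    """Add zero-mean Gaussian noise with standard deviation `level` (assumed noise model)."""
    return [[value + rng.gauss(0.0, level) for value in row] for row in rows]


def association_features(labelings):
    """Concatenate one-hot cluster memberships from several base clusterings.

    `labelings` holds one label sequence per ensemble member. The result has
    one row per sample and one binary column per cluster of every member.
    """
    n_samples = len(labelings[0])
    features = [[] for _ in range(n_samples)]
    for labels in labelings:
        clusters = sorted(set(labels))
        for i, label in enumerate(labels):
            features[i].extend(1 if label == c else 0 for c in clusters)
    return features
```

In use, each ensemble member would cluster its own view of the data (a random subspace, a noisy copy, or both), and the resulting label vectors would be passed to `association_features` to build a label-free representation for a downstream classifier.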
Improved KNN Imputation for Missing Values in Gene Expression Data (cited by: 3)
3
Authors: Phimmarin Keerin, Tossapon Boongoen. Computers, Materials & Continua (SCIE, EI), 2022, No. 2, pp. 4009-4025 (17 pages)
The problem of missing values has long been studied by researchers in data science and bioinformatics, especially in the analysis of gene expression data that facilitates early detection of cancer. Many attempts show improvements made by excluding samples with missing information from the analysis process, while others have tried to fill the gaps with possible values. While the former is simple, the latter safeguards against information loss. For that, a neighbour-based (KNN) approach has proven more effective than other global estimators. The paper extends this further by introducing a new summarization method to the KNN model. It is the first study that applies the concept of the ordered weighted averaging (OWA) operator to such a problem context. In particular, two variations of OWA aggregation are proposed and evaluated against their baseline and other neighbour-based models. Using different ratios of missing values from 1% to 20% and a set of six published gene expression datasets, the experimental results suggest that the new methods usually provide more accurate estimates than the compared methods. Specific to the missing rates of 5% and 20%, the best NRMSE scores, averaged across datasets, are 0.65 and 0.69, while the highest measures obtained by existing techniques included in this study are 0.80 and 0.84, respectively.
Keywords: gene expression; missing value; imputation; KNN; OWA operator
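To make the idea of combining KNN imputation with an OWA operator concrete, here is a minimal sketch. It uses the standard OWA definition, in which the weights are applied to the values after sorting them in descending order, to aggregate the values of the k nearest rows. The two OWA variations proposed in the paper are not described in the abstract, so the weight vectors, the distance measure and the function names below are assumptions.

```python
import math


def owa(values, weights):
    """Ordered weighted averaging: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def knn_owa_impute(rows, target_row, target_col, k, weights):
    """Estimate a missing entry (None) from the k nearest rows, aggregated by OWA.

    Distance is the root mean squared difference over the columns observed
    in both rows. Raises ValueError if fewer than k usable neighbours exist.
    """
    target = rows[target_row]
    candidates = []
    for i, row in enumerate(rows):
        if i == target_row or row[target_col] is None:
            continue
        shared = [j for j in range(len(row))
                  if j != target_col and row[j] is not None and target[j] is not None]
        if not shared:
            continue
        dist = math.sqrt(sum((row[j] - target[j]) ** 2 for j in shared) / len(shared))
        candidates.append((dist, row[target_col]))
    candidates.sort(key=lambda pair: pair[0])
    neighbour_values = [value for _, value in candidates[:k]]
    return owa(neighbour_values, weights)
```

With uniform weights such as `[0.5, 0.5]` the estimate reduces to the plain mean of the k neighbours, which is the usual KNN imputation baseline. Skewed weights shift the estimate toward the larger or smaller neighbour values.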
Using Link-Based Consensus Clustering for Mixed-Type Data Analysis
4
Authors: Tossapon Boongoen, Natthakan Iam-On. Computers, Materials & Continua (SCIE, EI), 2022, No. 1, pp. 1993-2011 (19 pages)
A mix of numerical and nominal data types is common in many modern-age data collections. Examples include banking data, sales history and healthcare records, where both continuous attributes like age and nominal ones like blood type are exploited to characterize account details, business transactions or individuals. However, only a few standard clustering techniques and consensus clustering methods have been provided to examine such data thus far. Given this insight, the paper introduces novel extensions of the link-based cluster ensemble, LCEWCT and LCEWTQ, that are accurate for analyzing mixed-type data. They promote diversity within an ensemble through different initializations of the k-prototypes algorithm as base clusterings and then refine the summarized data using a link-based approach. Based on the evaluation metric of NMI (Normalized Mutual Information), averaged across different combinations of benchmark datasets and experimental settings, these new models reach an improved level of 0.34, while the best model found in the literature obtains only around 0.24. Besides, the parameter analysis included herein helps to enhance their performance even further, given the relations between clustering quality and algorithmic variables specific to the underlying link-based models. Moreover, another significant factor, ensemble size, is examined to justify a tradeoff between complexity and accuracy.
Keywords: cluster analysis; mixed-type data; consensus clustering; link analysis
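For readers unfamiliar with the NMI scores quoted in the abstract, the sketch below computes normalized mutual information between a clustering and a set of reference labels. The abstract does not state which normalization the paper uses. This example assumes the arithmetic mean of the two entropies, which is one common convention, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (natural log) between two labelings of the same samples."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
               for (i, j), n_ij in joint.items())


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the two entropies (assumed convention)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings put every sample in a single cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2.0)
```

NMI ignores how clusters are named: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1 up to floating-point rounding, while two statistically independent labelings such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` score 0.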