期刊文献+
共找到2篇文章
< 1 >
每页显示 20 50 100
An Imbalanced Dataset and Class Overlapping Classification Model for Big Data 被引量:1
1
作者 mini prince P.M.Joe Prathap 《Computer Systems Science & Engineering》 SCIE EI 2023年第2期1009-1024,共16页
Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imba... Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imbalance arises.When dealing with large datasets,most traditional classifiers are stuck in the local optimum problem.As a result,it’s necessary to look into new methods for dealing with large data collections.Several solutions have been proposed for overcoming this issue.The rapid growth of the available data threatens to limit the usefulness of many traditional methods.Methods such as oversampling and undersampling have shown great promises in addressing the issues of class imbalance.Among all of these techniques,Synthetic Minority Oversampling TechniquE(SMOTE)has produced the best results by generating synthetic samples for the minority class in creating a balanced dataset.The issue is that their practical applicability is restricted to problems involving tens of thousands or lower instances of each.In this paper,we have proposed a parallel mode method using SMOTE and MapReduce strategy,this distributes the operation of the algorithm among a group of computational nodes for addressing the aforementioned problem.Our proposed solution has been divided into three stages.Thefirst stage involves the process of splitting the data into different blocks using a mapping function,followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algo-rithm for solving the class imbalanced problem.On each map block,a decision tree model would be constructed.Finally,the decision tree blocks would be com-bined for creating a classification model.We have used numerous datasets with up to 4 million instances in our experiments for testing the proposed scheme’s cap-abilities.As a result,the Hybrid SMOTE appears to have good scalability within the framework proposed,and it also cuts down the processing time. 展开更多
关键词 Imbalanced dataset class overlapping SMOTE MAPREDUCE parallel programming OVERSAMPLING
下载PDF
A Novel Approach to Design Distribution Preserving Framework for Big Data
2
作者 mini prince P.M.Joe Prathap 《Intelligent Automation & Soft Computing》 SCIE 2023年第3期2789-2803,共15页
In several fields like financial dealing,industry,business,medicine,et cetera,Big Data(BD)has been utilized extensively,which is nothing but a collection of a huge amount of data.However,it is highly complicated alon... In several fields like financial dealing,industry,business,medicine,et cetera,Big Data(BD)has been utilized extensively,which is nothing but a collection of a huge amount of data.However,it is highly complicated along with time-consuming to process a massive amount of data.Thus,to design the Distribution Preserving Framework for BD,a novel methodology has been proposed utilizing Manhattan Distance(MD)-centered Partition Around Medoid(MD–PAM)along with Conjugate Gradient Artificial Neural Network(CG-ANN),which undergoes various steps to reduce the complications of BD.Firstly,the data are processed in the pre-processing phase by mitigating the data repetition utilizing the map-reduce function;subsequently,the missing data are handled by substituting or by ignoring the missed values.After that,the data are transmuted into a normalized form.Next,to enhance the classification performance,the data’s dimensionalities are minimized by employing Gaussian Kernel(GK)-Fisher Discriminant Analysis(GK-FDA).Afterwards,the processed data is submitted to the partitioning phase after transmuting it into a structured format.In the partition phase,by utilizing the MD-PAM,the data are partitioned along with grouped into a cluster.Lastly,by employing CG-ANN,the data are classified in the classification phase so that the needed data can be effortlessly retrieved by the user.To analogize the outcomes of the CG-ANN with the prevailing methodologies,the NSL-KDD openly accessible datasets are utilized.The experiential outcomes displayed that an efficient result along with a reduced computation cost was shown by the proposed CG-ANN.The proposed work outperforms well in terms of accuracy,sensitivity and specificity than the existing systems. 展开更多
关键词 Big data artificial neural network fisher discriminant analysis distribution preserving framework manhattan distance
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部