期刊文献+
共找到324,270篇文章
< 1 2 250 >
每页显示 20 50 100
Subspace Clustering in High-Dimensional Data Streams:A Systematic Literature Review
1
作者 Nur Laila Ab Ghani Izzatdin Abdul Aziz Said Jadid AbdulKadir 《Computers, Materials & Continua》 SCIE EI 2023年第5期4649-4668,共20页
Clustering high dimensional data is challenging as data dimensionality increases the distance between data points,resulting in sparse regions that degrade clustering performance.Subspace clustering is a common approac... Clustering high dimensional data is challenging as data dimensionality increases the distance between data points,resulting in sparse regions that degrade clustering performance.Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space.Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams.Data streams are not only high-dimensional,but also unbounded and evolving.This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams.Although many articles have contributed to the literature review on data stream clustering,there is currently no specific review on subspace clustering algorithms in high-dimensional data streams.Therefore,this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments.The review follows a systematic methodological approach and includes 18 articles for the final analysis.The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams.The main findings relate to six elements:clustering process,cluster search,subspace search,synopsis structure,cluster maintenance,and evaluation measures.Most algorithms use a two-phase clustering approach consisting of an initialization stage,a refinement stage,a cluster maintenance stage,and a final clustering stage.The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters.Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers.Future work can focus on the clustering framework,parameter optimization,subspace search techniques,memory-efficient synopsis structures,explicit cluster change detection,and intrinsic performance metrics.This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams. 展开更多
关键词 CLUSTERING subspace clustering projected clustering data stream stream clustering high dimensionality evolving data stream concept drift
下载PDF
Similarity measurement method of high-dimensional data based on normalized net lattice subspace 被引量:4
2
作者 李文法 Wang Gongming +1 位作者 Li Ke Huang Su 《High Technology Letters》 EI CAS 2017年第2期179-184,共6页
The performance of conventional similarity measurement methods is affected seriously by the curse of dimensionality of high-dimensional data.The reason is that data difference between sparse and noisy dimensionalities... The performance of conventional similarity measurement methods is affected seriously by the curse of dimensionality of high-dimensional data.The reason is that data difference between sparse and noisy dimensionalities occupies a large proportion of the similarity,leading to the dissimilarities between any results.A similarity measurement method of high-dimensional data based on normalized net lattice subspace is proposed.The data range of each dimension is divided into several intervals,and the components in different dimensions are mapped onto the corresponding interval.Only the component in the same or adjacent interval is used to calculate the similarity.To validate this method,three data types are used,and seven common similarity measurement methods are compared.The experimental result indicates that the relative difference of the method is increasing with the dimensionality and is approximately two or three orders of magnitude higher than the conventional method.In addition,the similarity range of this method in different dimensions is [0,1],which is fit for similarity analysis after dimensionality reduction. 展开更多
关键词 high-dimensional data the curse of dimensionality SIMILARITY NORMALIZATION SUBSPACE NPsim
下载PDF
Making Short-term High-dimensional Data Predictable
3
作者 CHEN Luonan 《Bulletin of the Chinese Academy of Sciences》 2018年第4期243-244,共2页
Making accurate forecast or prediction is a challenging task in the big data era, in particular for those datasets involving high-dimensional variables but short-term time series points,which are generally available f... Making accurate forecast or prediction is a challenging task in the big data era, in particular for those datasets involving high-dimensional variables but short-term time series points,which are generally available from real-world systems.To address this issue, Prof. 展开更多
关键词 RDE MAKING SHORT-TERM high-dimensional data Predictable
下载PDF
Frequent item sets mining from high-dimensional dataset based on a novel binary particle swarm optimization 被引量:2
4
作者 张中杰 黄健 卫莹 《Journal of Central South University》 SCIE EI CAS CSCD 2016年第7期1700-1708,共9页
A novel binary particle swarm optimization for frequent item sets mining from high-dimensional dataset(BPSO-HD) was proposed, where two improvements were joined. Firstly, the dimensionality reduction of initial partic... A novel binary particle swarm optimization for frequent item sets mining from high-dimensional dataset(BPSO-HD) was proposed, where two improvements were joined. Firstly, the dimensionality reduction of initial particles was designed to ensure the reasonable initial fitness, and then, the dynamically dimensionality cutting of dataset was built to decrease the search space. Based on four high-dimensional datasets, BPSO-HD was compared with Apriori to test its reliability, and was compared with the ordinary BPSO and quantum swarm evolutionary(QSE) to prove its advantages. The experiments show that the results given by BPSO-HD is reliable and better than the results generated by BPSO and QSE. 展开更多
关键词 粒子群算法 频繁项集 数据集 二进制 挖掘 高维 APRIORI 初始粒子
下载PDF
A nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix
5
作者 李文法 Wang Gongming +1 位作者 Ma Nan Liu Hongzhe 《High Technology Letters》 EI CAS 2016年第3期241-247,共7页
Problems existin similarity measurement and index tree construction which affect the performance of nearest neighbor search of high-dimensional data. The equidistance problem is solved using NPsim function to calculat... Problems existin similarity measurement and index tree construction which affect the performance of nearest neighbor search of high-dimensional data. The equidistance problem is solved using NPsim function to calculate similarity. And a sequential NPsim matrix is built to improve indexing performance. To sum up the above innovations,a nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix is proposed in comparison with the nearest neighbor search algorithms based on KD-tree or SR-tree on Munsell spectral data set. Experimental results show that the proposed algorithm similarity is better than that of other algorithms and searching speed is more than thousands times of others. In addition,the slow construction speed of sequential NPsim matrix can be increased by using parallel computing. 展开更多
关键词 搜索算法 高维数据 M矩阵 最近邻 序列 相似性解 搜索性能 搜索速度
下载PDF
Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
6
作者 Meiyin Wang Wanzhou Ye 《Advances in Pure Mathematics》 2024年第4期214-227,共14页
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based o... The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently influenced by high dimensions and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy sample under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator , and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The result shows that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimate method. 展开更多
关键词 high-dimensional Covariance Matrix Missing data Sub-Gaussian Noise Optimal Estimation
下载PDF
Multi-Objective Equilibrium Optimizer for Feature Selection in High-Dimensional English Speech Emotion Recognition
7
作者 Liya Yue Pei Hu +1 位作者 Shu-Chuan Chu Jeng-Shyang Pan 《Computers, Materials & Continua》 SCIE EI 2024年第2期1957-1975,共19页
Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is ext... Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is extremely high,so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system.The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy.First,we use the information gain and Fisher Score to sort the features extracted from signals.Then,we employ a multi-objective ranking method to evaluate these features and assign different importance to them.Features with high rankings have a large probability of being selected.Finally,we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection,which can improve the diversity of solutions and avoid falling into local traps.Using random forest and K-nearest neighbor classifiers,four English speech emotion datasets are employed to test the proposed algorithm(MBEO)as well as other multi-objective emotion identification techniques.The results illustrate that it performs well in inverted generational distance,hypervolume,Pareto solutions,and execution time,and MBEO is appropriate for high-dimensional English SER. 展开更多
关键词 Speech emotion recognition filter-wrapper high-dimensional feature selection equilibrium optimizer MULTI-OBJECTIVE
下载PDF
An Efficient Reliability-Based Optimization Method Utilizing High-Dimensional Model Representation and Weight-Point Estimation Method
8
作者 Xiaoyi Wang Xinyue Chang +2 位作者 Wenxuan Wang Zijie Qiao Feng Zhang 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第5期1775-1796,共22页
The objective of reliability-based design optimization(RBDO)is to minimize the optimization objective while satisfying the corresponding reliability requirements.However,the nested loop characteristic reduces the effi... The objective of reliability-based design optimization(RBDO)is to minimize the optimization objective while satisfying the corresponding reliability requirements.However,the nested loop characteristic reduces the efficiency of RBDO algorithm,which hinders their application to high-dimensional engineering problems.To address these issues,this paper proposes an efficient decoupled RBDO method combining high dimensional model representation(HDMR)and the weight-point estimation method(WPEM).First,we decouple the RBDO model using HDMR and WPEM.Second,Lagrange interpolation is used to approximate a univariate function.Finally,based on the results of the first two steps,the original nested loop reliability optimization model is completely transformed into a deterministic design optimization model that can be solved by a series of mature constrained optimization methods without any additional calculations.Two numerical examples of a planar 10-bar structure and an aviation hydraulic piping system with 28 design variables are analyzed to illustrate the performance and practicability of the proposed method. 展开更多
关键词 Reliability-based design optimization high-dimensional model decomposition point estimation method Lagrange interpolation aviation hydraulic piping system
下载PDF
A Length-Adaptive Non-Dominated Sorting Genetic Algorithm for Bi-Objective High-Dimensional Feature Selection
9
作者 Yanlu Gong Junhai Zhou +2 位作者 Quanwang Wu MengChu Zhou Junhao Wen 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2023年第9期1834-1844,共11页
As a crucial data preprocessing method in data mining,feature selection(FS)can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected featu... As a crucial data preprocessing method in data mining,feature selection(FS)can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features.Evolutionary computing(EC)is promising for FS owing to its powerful search capability.However,in traditional EC-based methods,feature subsets are represented via a length-fixed individual encoding.It is ineffective for high-dimensional data,because it results in a huge search space and prohibitive training time.This work proposes a length-adaptive non-dominated sorting genetic algorithm(LA-NSGA)with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective highdimensional FS.In LA-NSGA,an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths,and a Pareto dominance-based length change operator is introduced to guide individuals to explore in promising search space adaptively.Moreover,a dominance-based local search method is employed for further improvement.The experimental results based on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms. 展开更多
关键词 Bi-objective optimization feature selection(FS) genetic algorithm high-dimensional data length-adaptive
下载PDF
An Ant Colony Optimization Based Dimension Reduction Method for High-Dimensional Datasets 被引量:3
10
作者 Ying Li Gang Wang +2 位作者 Huiling Chen Lian Shi Lei Qin 《Journal of Bionic Engineering》 SCIE EI CSCD 2013年第2期231-241,共11页
In this paper, a bionic optimization algorithm based dimension reduction method named Ant Colony Optimization -Selection (ACO-S) is proposed for high-dimensional datasets. Because microarray datasets comprise tens o... In this paper, a bionic optimization algorithm based dimension reduction method named Ant Colony Optimization -Selection (ACO-S) is proposed for high-dimensional datasets. Because microarray datasets comprise tens of thousands of features (genes), they are usually used to test the dimension reduction techniques. ACO-S consists of two stages in which two well-known ACO algorithms, namely ant system and ant colony system, are utilized to seek for genes, respectively. In the first stage, a modified ant system is used to filter the nonsignificant genes from high-dimensional space, and a number of promising genes are reserved in the next step. In the second stage, an improved ant colony system is applied to gene selection. In order to enhance the search ability of ACOs, we propose a method for calculating priori available heuristic information and design a fuzzy logic controller to dynamically adjust the number of ants in ant colony system. Furthermore, we devise another fuzzy logic controller to tune the parameter (q0) in ant colony system. We evaluate the performance of ACO-S on five microarray datasets, which have dimensions varying from 7129 to 12000. We also compare the performance of ACO-S with the results obtained from four existing well-known bionic optimization algorithms. The comparison results show that ACO-S has a notable ability to" generate a gene subset with the smallest size and salient features while yielding high classification accuracy. The comparative results generated by ACO-S adopting different classifiers are also given. The proposed method is shown to be a promising and effective tool for mining high-dimension data and mobile robot navigation. 展开更多
关键词 gene selection feature selection ant colony optimization high-dimensional data
原文传递
Observation points classifier ensemble for high-dimensional imbalanced classification
11
作者 Yulin He Xu Li +3 位作者 Philippe Fournier‐Viger Joshua Zhexue Huang Mianjie Li Salman Salloum 《CAAI Transactions on Intelligence Technology》 SCIE EI 2023年第2期500-517,共18页
In this paper,an Observation Points Classifier Ensemble(OPCE)algorithm is proposed to deal with High-Dimensional Imbalanced Classification(HDIC)problems based on data processed using the Multi-Dimensional Scaling(MDS)... In this paper,an Observation Points Classifier Ensemble(OPCE)algorithm is proposed to deal with High-Dimensional Imbalanced Classification(HDIC)problems based on data processed using the Multi-Dimensional Scaling(MDS)feature extraction technique.First,dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible.Second,a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space.Third,optimization of the observation point mappings is carried out to obtain a reliable assessment of the unknown samples.Exhaustive experiments have been conducted to evaluate the feasibility,rationality,and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets.Experimental results show that(1)the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data;(2)the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased;and(3)statistical analysis reveals that OPCE yields better HDIC performances on the selected data sets in comparison with eight other HDIC algorithms.This demonstrates that OPCE is a viable algorithm to deal with HDIC problems. 展开更多
关键词 classifier ensemble feature transformation high-dimensional data classification imbalanced learning observation point mechanism
下载PDF
Visualizing large-scale high-dimensional data via hierarchical embedding of KNN graphs 被引量:2
12
作者 Haiyang Zhu Minfeng Zhu +5 位作者 Yingchaojie Feng Deng Cai Yuanzhe Hu Shilong Wu Xiangyang Wu Wei Chen 《Visual Informatics》 EI 2021年第2期51-59,共9页
Visualizing intrinsic structures of high-dimensional data is an essential task in data analysis.Over the past decades,a large number of methods have been proposed.Among all solutions,one promising way for enabling eff... Visualizing intrinsic structures of high-dimensional data is an essential task in data analysis.Over the past decades,a large number of methods have been proposed.Among all solutions,one promising way for enabling effective visual exploration is to construct a k-nearest neighbor(KNN)graph and visualize the graph in a low-dimensional space.Yet,state-of-the-art methods such as the LargeVis still suffer from two main problems when applied to large-scale data:(1)they may produce unappealing visualizations due to the non-convexity of the cost function;(2)visualizing the KNN graph is still time-consuming.In this work,we propose a novel visualization algorithm that leverages a multilevel representation to achieve a high-quality graph layout and employs a cluster-based approximation scheme to accelerate the KNN graph layout.Experiments on various large-scale datasets indicate that our approach achieves a speedup by a factor of five for KNN graph visualization compared to LargeVis and yields aesthetically pleasing visualization results. 展开更多
关键词 high-dimensional data visualization KNN graph Graph visualization
原文传递
Differentially private high-dimensional data publication via grouping and truncating techniques 被引量:3
13
作者 Ning WANG Yu GU +2 位作者 Jia XU Fangfang LI Ge YU 《Frontiers of Computer Science》 SCIE EI CSCD 2019年第2期382-395,共14页
The count of one column for high-dimensional datasets, i.e., the number of records containing this column, has been widely used in nuinerous applications such as analyzing popular spots based on check-in location info... The count of one column for high-dimensional datasets, i.e., the number of records containing this column, has been widely used in nuinerous applications such as analyzing popular spots based on check-in location information and mining valuable items from shopping records. However, this poses a privacy threat when directly publishing this information. Differential privacy (DP), as a notable paradigm for strong privacy guarantees, is thereby adopted to publish all column counts. Prior studies have verified that truncating records or grouping columns can effectively improve the accuracy of published results. To leverage the advantages of the two techniques, we combine these studies to further boost the accuracy of published results. However, the traditional penalty function, which measures the error imported by a given pair of parameters including truncating length and group size, is so sensitive that the derived parameters deviate from the optimal parameters significantly. To output preferable parameters, we first design a smart penalty function that is less sensitive than the traditional function. Moreover, a two-phase selection method is proposed to compute these parameters efficiently, together with the improvement in accuracy. Extensive experiments on a broad spectrum of real-world datasets validate the effectiveness of our proposals. 展开更多
关键词 differential privacy high-dimensional data TRUNCATION optimization GROUPING PENALTY function
原文传递
On the k-sample Behrens-Fisher problem for high-dimensional data 被引量:3
14
作者 ZHANG JinTing XU JinFeng 《Science China Mathematics》 SCIE 2009年第6期1285-1304,共20页
For several decades,much attention has been paid to the two-sample Behrens-Fisher(BF) problem which tests the equality of the means or mean vectors of two normal populations with unequal variance/covariance structures... For several decades,much attention has been paid to the two-sample Behrens-Fisher(BF) problem which tests the equality of the means or mean vectors of two normal populations with unequal variance/covariance structures.Little work,however,has been done for the k-sample BF problem for high dimensional data which tests the equality of the mean vectors of several high-dimensional normal populations with unequal covariance structures.In this paper we study this challenging problem via extending the famous Scheffe's transformation method,which reduces the k-sample BF problem to a one-sample problem.The induced one-sample problem can be easily tested by the classical Hotelling's T 2 test when the size of the resulting sample is very large relative to its dimensionality.For high dimensional data,however,the dimensionality of the resulting sample is often very large,and even much larger than its sample size,which makes the classical Hotelling's T 2 test not powerful or not even well defined.To overcome this diffculty,we propose and study an L2-norm based test.The asymp-totic powers of the proposed L2-norm based test and Hotelling's T 2 test are derived and theoretically compared.Methods for implementing the L2-norm based test are described.Simulation studies are conducted to compare the L2-norm based test and Hotelling's T 2 test when the latter can be well defined,and to compare the proposed implementation methods for the L2-norm based test otherwise.The methodologies are motivated and illustrated by a real data example. 展开更多
关键词 χ~2-approximation χ~2-type MIXTURES high-dimensional data analysis Hotelling’s T^2 TEST k-sample TEST L^2-norm based TEST
原文传递
基于re3data的中英科学数据仓储平台对比研究
15
作者 袁烨 陈媛媛 《数字图书馆论坛》 2024年第2期13-23,共11页
以re3data为数据获取源,选取中英两国406个科学数据仓储为研究对象,从分布特征、责任类型、仓储许可、技术标准及质量标准等5个方面、11个指标对两国科学数据仓储的建设情况进行对比分析,试图为我国数据仓储的可持续发展提出建议:广泛... 以re3data为数据获取源,选取中英两国406个科学数据仓储为研究对象,从分布特征、责任类型、仓储许可、技术标准及质量标准等5个方面、11个指标对两国科学数据仓储的建设情况进行对比分析,试图为我国数据仓储的可持续发展提出建议:广泛联结国内外异质机构,推进多学科领域的交流与合作,有效扩充仓储许可权限与类型,优化技术标准的应用现况,提高元数据使用的灵活性。 展开更多
关键词 科学数据 数据仓储平台 re3data 中国 英国
下载PDF
Data Secure Storage Mechanism for IIoT Based on Blockchain 被引量:1
16
作者 Jin Wang Guoshu Huang +2 位作者 R.Simon Sherratt Ding Huang Jia Ni 《Computers, Materials & Continua》 SCIE EI 2024年第3期4029-4048,共20页
With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapi... With the development of Industry 4.0 and big data technology,the Industrial Internet of Things(IIoT)is hampered by inherent issues such as privacy,security,and fault tolerance,which pose certain challenges to the rapid development of IIoT.Blockchain technology has immutability,decentralization,and autonomy,which can greatly improve the inherent defects of the IIoT.In the traditional blockchain,data is stored in a Merkle tree.As data continues to grow,the scale of proofs used to validate it grows,threatening the efficiency,security,and reliability of blockchain-based IIoT.Accordingly,this paper first analyzes the inefficiency of the traditional blockchain structure in verifying the integrity and correctness of data.To solve this problem,a new Vector Commitment(VC)structure,Partition Vector Commitment(PVC),is proposed by improving the traditional VC structure.Secondly,this paper uses PVC instead of the Merkle tree to store big data generated by IIoT.PVC can improve the efficiency of traditional VC in the process of commitment and opening.Finally,this paper uses PVC to build a blockchain-based IIoT data security storage mechanism and carries out a comparative analysis of experiments.This mechanism can greatly reduce communication loss and maximize the rational use of storage space,which is of great significance for maintaining the security and stability of blockchain-based IIoT. 展开更多
关键词 Blockchain IIoT data storage cryptographic commitment
下载PDF
Item-level Forecasting for E-commerce Demand with High-dimensional Data Using a Two-stage Feature Selection Algorithm 被引量:1
17
作者 Hongyan Dai Qin Xiao +2 位作者 Nina Yan Xun Xu Tingting Tong 《Journal of Systems Science and Systems Engineering》 SCIE EI CSCD 2022年第2期247-264,共18页
With the rapid development of information technology and fast growth of Internet users,e-commerce nowadays is facing complex business environment and accumulating large-volume and highdimensional data.This brings two ... With the rapid development of information technology and fast growth of Internet users,e-commerce nowadays is facing complex business environment and accumulating large-volume and highdimensional data.This brings two challenges for demand forecasting.First,e-merchants need to find appropriate approaches to leverage the large amount of data and extract forecast features to capture various factors affecting the demand.Second,they need to efficiently identify the most important features to improve the forecast accuracy and better understand the key drivers for demand changes.To solve these challenges,this study conducts a multi-dimensional feature engineering by constructing five feature categories including historical demand,price,page view,reviews,and competition for e-commerce demand forecasting on item-level.We then propose a two-stage random forest-based feature selection algorithm to effectively identify the important features from the high-dimensional feature set and avoid overfittlng.We test our proposed algorithm with a large-scale dataset from the largest e-commerce platform in China.The numerical results from 21,111 items and 109 million sales observations show that our proposed random forest-based forecasting framework with a two-stage feature selection algorithm delivers 11.58%,5.81%and 3.68%forecast accuracy improvement,compared with the Autoregressive Integrated Moving Average(ARIMA),Random Forecast,and Random Forecast with one-stage feature selection approach,respectively,which are widely used in literature and industry.This study provides a useful tool for the practitioners to forecast demands and sheds lights on the B2C e-commerce operations management. 展开更多
关键词 Forecasting E-COMMERCE high-dimensional feature feature selection
原文传递
Robust Latent Factor Analysis for Precise Representation of High-Dimensional and Sparse Data 被引量:2
18
作者 Di Wu Xin Luo 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2021年第4期796-805,共10页
High-dimensional and sparse(HiDS)matrices commonly arise in various industrial applications,e.g.,recommender systems(RSs),social networks,and wireless sensor networks.Since they contain rich information,how to accurat... High-dimensional and sparse(HiDS)matrices commonly arise in various industrial applications,e.g.,recommender systems(RSs),social networks,and wireless sensor networks.Since they contain rich information,how to accurately represent them is of great significance.A latent factor(LF)model is one of the most popular and successful ways to address this issue.Current LF models mostly adopt L2-norm-oriented Loss to represent an HiDS matrix,i.e.,they sum the errors between observed data and predicted ones with L2-norm.Yet L2-norm is sensitive to outlier data.Unfortunately,outlier data usually exist in such matrices.For example,an HiDS matrix from RSs commonly contains many outlier ratings due to some heedless/malicious users.To address this issue,this work proposes a smooth L1-norm-oriented latent factor(SL-LF)model.Its main idea is to adopt smooth L1-norm rather than L2-norm to form its Loss,making it have both strong robustness and high accuracy in predicting the missing data of an HiDS matrix.Experimental results on eight HiDS matrices generated by industrial applications verify that the proposed SL-LF model not only is robust to the outlier data but also has significantly higher prediction accuracy than state-of-the-art models when they are used to predict the missing data of HiDS matrices. 展开更多
关键词 high-dimensional and sparse matrix L1-norm L2 norm latent factor model recommender system smooth L1-norm
下载PDF
K-Hyperparameter Tuning in High-Dimensional Space Clustering:Solving Smooth Elbow Challenges Using an Ensemble Based Technique of a Self-Adapting Autoencoder and Internal Validation Indexes
19
作者 Rufus Gikera Jonathan Mwaura +1 位作者 Elizaphan Muuro Shadrack Mambo 《Journal on Artificial Intelligence》 2023年第1期75-112,共38页
k-means is a popular clustering algorithm because of its simplicity and scalability to handle large datasets.However,one of its setbacks is the challenge of identifying the correct k-hyperparameter value.Tuning this v... k-means is a popular clustering algorithm because of its simplicity and scalability to handle large datasets.However,one of its setbacks is the challenge of identifying the correct k-hyperparameter value.Tuning this value correctly is critical for building effective k-means models.The use of the traditional elbow method to help identify this value has a long-standing literature.However,when using this method with certain datasets,smooth curves may appear,making it challenging to identify the k-value due to its unclear nature.On the other hand,various internal validation indexes,which are proposed as a solution to this issue,may be inconsistent.Although various techniques for solving smooth elbow challenges exist,k-hyperparameter tuning in high-dimensional spaces still remains intractable and an open research issue.In this paper,we have first reviewed the existing techniques for solving smooth elbow challenges.The identified research gaps are then utilized in the development of the new technique.The new technique,referred to as the ensemble-based technique of a self-adapting autoencoder and internal validation indexes,is then validated in high-dimensional space clustering.The optimal k-value,tuned by this technique using a voting scheme,is a trade-off between the number of clusters visualized in the autoencoder’s latent space,k-value from the ensemble internal validation index score and one that generates a value of 0 or close to 0 on the derivative f″′(k)(1+f′(k)^(2))−3 f″(k)^(2)f″((k)2f′(k),at the elbow.Experimental results based on the Cochran’s Q test,ANOVA,and McNemar’s score indicate a relatively good performance of the newly developed technique in k-hyperparameter tuning. 展开更多
关键词 k-hyperparameter tuning high-dimensional smooth elbow
下载PDF
An Optimal Big Data Analytics with Concept Drift Detection on High-Dimensional Streaming Data 被引量:1
20
作者 Romany F.Mansour Shaha Al-Otaibi +3 位作者 Amal Al-Rasheed Hanan Aljuaid Irina V.Pustokhina Denis A.Pustokhin 《Computers, Materials & Continua》 SCIE EI 2021年第9期2843-2858,共16页
Big data streams started becoming ubiquitous in recent years,thanks to rapid generation of massive volumes of data by different applications.It is challenging to apply existing data mining tools and techniques directl... Big data streams started becoming ubiquitous in recent years,thanks to rapid generation of massive volumes of data by different applications.It is challenging to apply existing data mining tools and techniques directly in these big data streams.At the same time,streaming data from several applications results in two major problems such as class imbalance and concept drift.The current research paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection(MOMBD-CDD)method on High-Dimensional Streaming Data.The presented MOMBD-CDD model has different operational stages such as pre-processing,CDD,and classification.MOMBD-CDD model overcomes class imbalance problem by Synthetic Minority Over-sampling Technique(SMOTE).In order to determine the oversampling rates and neighboring point values of SMOTE,Glowworm Swarm Optimization(GSO)algorithm is employed.Besides,Statistical Test of Equal Proportions(STEPD),a CDD technique is also utilized.Finally,Bidirectional Long Short-Term Memory(Bi-LSTM)model is applied for classification.In order to improve classification performance and to compute the optimum parameters for Bi-LSTM model,GSO-based hyperparameter tuning process is carried out.The performance of the presented model was evaluated using high dimensional benchmark streaming datasets namely intrusion detection(NSL KDDCup)dataset and ECUE spam dataset.An extensive experimental validation process confirmed the effective outcome of MOMBD-CDD model.The proposed model attained high accuracy of 97.45%and 94.23%on the applied KDDCup99 Dataset and ECUE Spam datasets respectively. 展开更多
关键词 Streaming data concept drift classification model deep learning class imbalance data
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部