Journal literature
11 articles found
Similarity measure design for high dimensional data (Cited by: 3)
1
Authors: LEE Sang-hyuk, YAN Sun, JEONG Yoon-su, SHIN Seung-soo. Journal of Central South University (SCIE, EI, CAS), 2014, Issue 9, pp. 3534-3540 (7 pages)
Information analysis of high dimensional data was carried out through similarity measure application. High dimensional data were considered as the atypical structure. Additionally, overlapped and non-overlapped data were introduced, and similarity measure analysis was also illustrated and compared with conventional similarity measure. As a result, overlapped data comparison was possible to present similarity with conventional similarity measure. Non-overlapped data similarity analysis provided the clue to solve the similarity of high dimensional data. High dimensional data analysis was designed with consideration of neighborhood information. Conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were calculated to evaluate how similar they are to the financial fraud. Calculation results show that the actual fraud has a rather high similarity measure compared to the average, from minimal 0.0609 to maximal 0.1667.
Keywords: high dimensional data; similarity measure; difference; neighborhood information; financial fraud
Global aerodynamic design optimization based on data dimensionality reduction (Cited by: 10)
2
Authors: Yasong QIU, Junqiang BAI, Nan LIU, Chen WANG. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2018, Issue 4, pp. 643-659 (17 pages)
In aerodynamic optimization, global optimization methods such as genetic algorithms are preferred in many cases because of their advantage in reaching the global optimum. However, for complex problems in which a large number of design variables are needed, the computational cost becomes prohibitive, and thus original global optimization strategies are required. To address this need, a data dimensionality reduction method is combined with global optimization methods, thus forming a new global optimization system, aiming to improve the efficiency of conventional global optimization. The new optimization system involves applying Proper Orthogonal Decomposition (POD) in dimensionality reduction of the design space while maintaining the generality of the original design space. Besides, an acceleration approach for sample calculation in surrogate modeling is applied to reduce the computational time while providing sufficient accuracy. The optimizations of a transonic airfoil RAE2822 and the transonic wing ONERA M6 are performed to demonstrate the effectiveness of the proposed new optimization system. In both cases, we manage to reduce the number of design variables from 20 to 10 and from 42 to 20 respectively. The new design optimization system converges faster and it takes 1/3 of the total time of traditional optimization to converge to a better design, thus significantly reducing the overall optimization time and improving the efficiency of the conventional global design optimization method.
Keywords: Aerodynamic shape design optimization; data dimensionality reduction; Genetic algorithm; Kriging surrogate model; Proper orthogonal decomposition
Data-driven surrogate model for aerodynamic design using separable shape tensor method
3
Authors: Bo PANG, Yang ZHANG, Junlin LI, Xudong WANG, Min CHANG, Junqiang BAI. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2024, Issue 9, pp. 41-58 (18 pages)
In the context of increasing dimensionality of design variables and the complexity of constraints, the efficacy of Surrogate-Based Optimization (SBO) is limited. The traditional linear and nonlinear dimensionality reduction algorithms mainly decompose the mathematical matrix composed of design variables or objective functions in various forms; the smoothness of the design space cannot be guaranteed in the process, and additional constraint functions need to be added in the optimization, which increases the calculation cost. This study presents a new parameterization method to improve both problems of SBO. The new parameterization is addressed by decoupling affine transformations (dilation, rotation, shearing, and translation) within the Grassmannian submanifold, which enables a separate representation of the physical information of the airfoil in a high-dimensional space. Building upon this, Principal Geodesic Analysis (PGA) is employed to achieve geometric control, compress the design space, reduce the number of design variables, reduce the dimensions of design variables and enhance predictive performance during the surrogate optimization process. For comparison, a dimensionality reduction space is defined using 95% of the energy, and RAE 2822 for transonic conditions are used as demonstrations. This method significantly enhances the optimization efficiency of the surrogate model while effectively enabling geometric constraints. In three-dimensional problems, it enables simultaneous design of planar shapes for various components of the aircraft and high-order perturbation deformations. Optimization was applied to the ONERA M6 wing, achieving a lift-drag ratio of 18.09, representing a 27.25% improvement compared to the baseline configuration. In comparison to conventional surrogate model optimization methods, which only achieved a 17.97% improvement, this approach demonstrates its superiority.
Keywords: Aerodynamic design; Grassmannian manifold; Shape parameterization; Surrogate-based optimization; data dimensionality reduction
A Comparative Study on Two Techniques of Reducing the Dimension of Text Feature Space
4
Authors: Yin Zhonghang, Wang Yongcheng, Cai Wei & Diao Qian, School of Electronic & Information Technology, Shanghai Jiaotong University, Shanghai 200030, P.R. China. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2002, Issue 1, pp. 87-92 (6 pages)
With the development of large scale text processing, the dimension of text feature space has become larger and larger, which has added a lot of difficulties to natural language processing. How to reduce the dimension has become a practical problem in the field. Here we present two clustering methods, i.e. concept association and concept abstract, to achieve the goal. The first refers to the keyword clustering based on the co-occurrence of keywords in the same text and the second refers to that in the same category. Then we compare the difference between them. Our experiment results show that they are efficient to reduce the dimension of text feature space.
Keywords: Text data mining
Sure feature screening for high-dimensional dichotomous classification (Cited by: 2)
5
Authors: SHAO Li, YU Yuan, ZHOU Yong. Science China Mathematics (SCIE, CSCD), 2016, Issue 12, pp. 2527-2542 (16 pages)
The curse of high-dimensionality has emerged in the statistical fields more and more frequently. Many techniques have been developed to address this challenge for classification problems. We propose a novel feature screening procedure for dichotomous response data. This new method can be implemented as easily as the t-test marginal screening approach, and the proposed procedure is free of any subexponential tail probability conditions and moment requirement and not restricted to a specific model structure. We prove that our method possesses the sure screening property and also illustrate the effect of screening by Monte Carlo simulation and apply it to a real data example.
Keywords: ultra-high dimensional data; dichotomous classification; sure screening property
Geo-Coordinated Parallel Coordinates (GCPC): Field trial studies of environmental data analysis
6
Authors: Maha El Meseery, Orland Hoeber. Visual Informatics (EI), 2018, Issue 2, pp. 111-124 (14 pages)
The large number of environmental problems faced by society in recent years has driven researchers to collect and study massive amounts of data in order to understand the complex relations that exist between people and the environment in which we live. Such datasets are often high dimensional and heterogeneous in nature, with complex geospatial relations. Analysing such data can be challenging, especially when there is a need to maintain spatial awareness as the non-spatial attributes are studied. Geo-Coordinated Parallel Coordinates (GCPC) is a geovisual analytics approach designed to support exploration and analysis within complex geospatial environmental data. Parallel coordinates are tightly coupled with a geospatial representation and an investigative scatterplot, all of which can be used to show, reorganize, filter, and highlight the high dimensional, heterogeneous, and geospatial aspects of the data. Two sets of field trials were conducted with expert data analysts to validate the real-world benefits of the approach for studying environmental data. The results of these evaluations were positive, providing real-world evidence and new insights regarding the value of using GCPC to explore among environmental datasets when there is a need to remain aware of the geospatial aspects of the data as the non-spatial elements are studied.
Keywords: Geovisual analytics; Heterogeneous data visualization; High dimensional data visualization; Field trial evaluations
A spatial multi-scale integer coding method and its application to three-dimensional model organization
7
Authors: Guangling Lai, Xiaochong Tong, Yongsheng Zhang, Lu Ding, Yinling Sui, Yi Lei, Yong Zhang. International Journal of Digital Earth (SCIE), 2020, Issue 10, pp. 1151-1171 (21 pages)
With the rapid development of digital earth, smart city, and digital twin technology, the demand for applications of three-dimensional model data is getting higher and higher. These data tend to be multi-object, multi-type and multi-scale, with complex spatial relationships and large volume, which brings great challenges to their efficient organization. This paper mainly studies the organization of three-dimensional model data, and the main contributions are as follows: 1) An integer coding method for a three-dimensional multi-scale grid is proposed, which can reduce the four-dimensional (spatial dimension and scale dimension) space into one dimension, and has better space and scale clustering characteristics in comparison with various types of grid coding. 2) A binary algebra calculation method is proposed to realize the basic spatial relationship calculation of the three-dimensional grid, which has higher spatial relationship computing ability than the 3D-Geohash method. 3) The multi-scale integer coding method is applied to the data organization of a three-dimensional city model, and the experiment results show that it is more efficient and stable than the three-dimensional R-tree index and the Geohash coding method in the establishment of the index and the query of three-dimensional space.
Keywords: Regular grid division; three-dimensional spatial index; multi-scale integer coding; encoding calculations; three-dimensional building model data organization
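The idea of folding three spatial indices and a scale level into one integer can be illustrated with a minimal sketch. The listing does not give the paper's actual coding scheme, so the block below swaps in a well-known stand-in: Morton (Z-order) bit interleaving with a leading sentinel bit whose position records the level. The function names `encode`, `decode`, `level_of` and `parent` are hypothetical.

```python
# Illustrative stand-in only: Morton (Z-order) interleaving with a sentinel bit.
# This is NOT the coding method of the paper listed above.

def encode(x, y, z, level):
    """Pack cell indices (x, y, z) at a given level into one integer."""
    if not all(0 <= v < (1 << level) for v in (x, y, z)):
        raise ValueError("cell index out of range for this level")
    code = 1  # sentinel bit: its position records the level
    for bit in range(level - 1, -1, -1):
        # Take one bit from each axis, most significant first.
        octant = (((x >> bit) & 1) << 2) | (((y >> bit) & 1) << 1) | ((z >> bit) & 1)
        code = (code << 3) | octant
    return code

def level_of(code):
    """Recover the level from the position of the sentinel bit."""
    return (code.bit_length() - 1) // 3

def parent(code):
    """Code of the enclosing cell one level coarser (drop the last octant)."""
    return code >> 3

def decode(code):
    """Unpack a code back into (x, y, z, level)."""
    level = level_of(code)
    x = y = z = 0
    for i in range(level):
        octant = (code >> (3 * (level - 1 - i))) & 7
        x = (x << 1) | (octant >> 2)
        y = (y << 1) | ((octant >> 1) & 1)
        z = (z << 1) | (octant & 1)
    return x, y, z, level
```

In this sketch a parent cell is reached with a single right shift, which hints at why integer codes suit binary-algebra relationship tests; the paper's own operators may differ.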
Comparison of dimension reduction methods for DEA under big data via Monte Carlo simulation
8
Authors: Zikang Chen, Song Han. Journal of Management Science and Engineering, 2021, Issue 4, pp. 363-376 (14 pages)
Data with large dimensions will bring various problems to the application of data envelopment analysis (DEA). In this study, we focus on a "big data" problem related to the considerably large dimensions of the input-output data. The four most widely used approaches to guide dimension reduction in DEA are compared via Monte Carlo simulation, including principal component analysis (PCA-DEA), which is based on the idea of aggregating input and output, efficiency contribution measurement (ECM), average efficiency measure (AEC), and regression-based detection (RB), which is based on the idea of variable selection. We compare the performance of these methods under different scenarios and a brand-new comparison benchmark for the simulation test. In addition, we discuss the effect of initial variable selection in RB for the first time. Based on the results, we offer guidelines that are more reliable on how to choose an appropriate method.
Keywords: data envelopment analysis; Big data; data dimension reduction method
Used car price prediction based on XGBoost and retention rate
9
Authors: Shen Yutian, Chen Jian, Dai Min, Zhang Sirui, Xu Jing, Wang Qing. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2024, Issue 3, pp. 72-79 (8 pages)
In order to improve the accuracy of used car price prediction, a machine learning prediction model based on the retention rate is proposed in this paper. Firstly, a random forest algorithm is used to filter the variables in the data. Seven main characteristic variables that affect used car prices, such as new car price, service time, mileage and so on, are filtered out. Then, the linear regression classification method is introduced to classify the test data into high and low retention rate data. After that, the extreme gradient boosting (XGBoost) regression model is built for the two datasets respectively. The prediction results show that the comprehensive evaluation index of the proposed model is 0.548, which is significantly improved compared to 0.488 of the original XGBoost model. Finally, compared with other representative machine learning algorithms, this model shows certain advantages in terms of mean absolute percentage error (MAPE), 5% accuracy rate and comprehensive evaluation index. As a result, the retention rate-based machine learning model established in this paper has significant advantages in terms of the accuracy of used car price prediction.
Keywords: random forest; data dimensionality reduction; extreme gradient boosting (XGBoost); retention rate; price prediction
Identifying the skeptics and the undecided through visual cluster analysis of local network geometry (Cited by: 1)
10
Authors: Shenghui Cheng, Joachim Giesen, Tianyi Huang, Philipp Lucas, Klaus Mueller. Visual Informatics (EI), 2022, Issue 3, pp. 11-22 (12 pages)
By skeptics and undecided we refer to nodes in clustered social networks that cannot be assigned easily to any of the clusters. Such nodes are typically found either at the interface between clusters (the undecided) or at their boundaries (the skeptics). Identifying these nodes is relevant in marketing applications like voter targeting, because the persons represented by such nodes are often more likely to be affected in marketing campaigns than nodes deeply within clusters. So far this identification task is not as well studied as other network analysis tasks like clustering, identifying central nodes, and detecting motifs. We approach this task by deriving novel geometric features from the network structure that naturally lend themselves to an interactive visual approach for identifying interface and boundary nodes.
Keywords: Graph/network data; High dimensional data visualization; Visualization in social and information sciences; data clustering; coordinated and multiple views
Subspace clustering through attribute clustering
11
Authors: Kun NIU, Shubo ZHANG, Junliang CHEN. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2008, Issue 1, pp. 44-48 (5 pages)
Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or the subspace dimensionality of clusters. Second, the clustering results are often sensitive to input parameters. In this paper, a fast algorithm of subspace clustering using attribute clustering is proposed to overcome these limitations. This algorithm first filters out redundant attributes by computing the Gini coefficient. To evaluate the correlation of every two non-redundant attributes, the relation matrix of non-redundant attributes is constructed based on the relation function of two dimensional united Gini coefficients. After applying an overlapping clustering algorithm on the relation matrix, the candidate of all interesting subspaces is achieved. Finally, all subspace clusters can be derived by clustering on interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves a significant gain of runtime and quality to find subspace clusters, but also is insensitive to input parameters.
Keywords: subspace clustering; high dimensional data; attribute clustering
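The first step of the abstract above, filtering attributes with a Gini coefficient, can be sketched roughly as follows. The listing does not define the coefficient, so this block assumes the impurity-style form 1 - sum(p_i^2) computed over an equal-width histogram of each attribute, and it assumes that near-uniform attributes (high values) are the ones to discard. The names `gini_index` and `filter_attributes`, the bin count and the threshold are all hypothetical.

```python
from collections import Counter

def gini_index(values, bins=10):
    """Gini index 1 - sum(p_i^2) over an equal-width histogram of one attribute.

    Assumed definition; the paper may use a different form.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute falls into a single bin
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def filter_attributes(rows, bins=10, threshold=0.85):
    """Return indices of attributes whose Gini index is below a hypothetical cut-off."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if gini_index(col, bins) < threshold]
```

With ten bins a perfectly uniform attribute scores 0.9, the maximum for that bin count, so a cut-off slightly below that value keeps only attributes whose values concentrate in a few bins.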