1,080 articles found
Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
1
Authors: Meiyin Wang, Wanzhou Ye 《Advances in Pure Mathematics》 2024, No. 4, pp. 214-227 (14 pages)
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy samples under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The result shows that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
Keywords: high-dimensional covariance matrix; missing data; sub-Gaussian noise; optimal estimation
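The hard thresholding idea named in the abstract above can be illustrated with a minimal sketch. This is not the paper's estimator: the noise correction of the generalized sample covariance is omitted, missing entries are assumed to be marked with `np.nan`, and the function names `pairwise_covariance` and `hard_threshold` as well as the threshold `lam` are hypothetical choices made for illustration.

```python
import numpy as np

def pairwise_covariance(X):
    """Sample covariance when np.nan marks a missing entry (assumption).

    Entry (j, k) is averaged only over the rows in which both
    coordinate j and coordinate k are observed.
    """
    observed = ~np.isnan(X)
    n_obs = observed.sum(axis=0)
    # Column means over observed entries only.
    means = np.where(observed, X, 0.0).sum(axis=0) / np.maximum(n_obs, 1)
    # Missing entries contribute zero to every cross product.
    centered = np.where(observed, X - means, 0.0)
    pair_counts = observed.T.astype(float) @ observed.astype(float)
    return (centered.T @ centered) / np.maximum(pair_counts, 1.0)

def hard_threshold(S, lam):
    """Zero every off-diagonal entry with |s_jk| < lam; keep the diagonal."""
    T = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

# Usage: 200 samples in 5 dimensions with about 10% of entries missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.1] = np.nan
sparse_estimate = hard_threshold(pairwise_covariance(X), lam=0.2)
```

In practice the threshold is usually tied to the sample size and dimension; the fixed value `0.2` here is only a placeholder.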
Subspace Clustering in High-Dimensional Data Streams:A Systematic Literature Review
2
Authors: Nur Laila Ab Ghani, Izzatdin Abdul Aziz, Said Jadid AbdulKadir 《Computers, Materials & Continua》 SCIE EI 2023, No. 5, pp. 4649-4668 (20 pages)
Clustering high dimensional data is challenging as data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
Keywords: clustering; subspace clustering; projected clustering; data stream; stream clustering; high dimensionality; evolving data stream; concept drift
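The abstract above mentions a time fading function without giving its form. A minimal sketch follows, assuming the exponential form 2^(-λ·Δt) that is a common choice in density-based stream clustering; the class `FadingMicroCluster`, its fields, and the parameter `decay_rate` are hypothetical names used only for illustration and are not taken from any of the reviewed algorithms.

```python
from dataclasses import dataclass, field

def fading_weight(elapsed, decay_rate):
    """Exponential fading factor 2^(-decay_rate * elapsed) (assumed form)."""
    return 2.0 ** (-decay_rate * elapsed)

@dataclass
class FadingMicroCluster:
    """Synopsis of a cluster whose old points lose influence over time."""
    decay_rate: float
    weight: float = 0.0
    linear_sum: list = field(default_factory=list)
    last_update: float = 0.0

    def insert(self, point, timestamp):
        # Fade the stored statistics by the time elapsed since the last update,
        # then add the new point with full weight 1.
        fade = fading_weight(timestamp - self.last_update, self.decay_rate)
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(point)
        self.linear_sum = [fade * s + x for s, x in zip(self.linear_sum, point)]
        self.weight = fade * self.weight + 1.0
        self.last_update = timestamp

    def center(self):
        # Weighted mean of the points seen so far.
        return [s / self.weight for s in self.linear_sum]

# Usage: the second point arrives one time unit after the first.
mc = FadingMicroCluster(decay_rate=1.0)
mc.insert([2.0, 4.0], timestamp=0.0)
mc.insert([4.0, 8.0], timestamp=1.0)
```

A micro-cluster whose faded weight drops below a chosen threshold can then be treated as an outlier candidate and discarded, which is one way such algorithms follow an evolving stream implicitly.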
Similarity measurement method of high-dimensional data based on normalized net lattice subspace (cited by: 4)
3
Authors: 李文法, Wang Gongming, Li Ke, Huang Su 《High Technology Letters》 EI CAS 2017, No. 2, pp. 179-184 (6 pages)
The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality of high-dimensional data. The reason is that data differences in sparse and noisy dimensions account for a large proportion of the similarity, leading to dissimilarities between any results. A similarity measurement method for high-dimensional data based on normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding interval. Only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used, and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with the dimensionality and is approximately two or three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0,1], which suits similarity analysis after dimensionality reduction.
Keywords: high-dimensional data; the curse of dimensionality; similarity; normalization; subspace; NPsim
CABOSFV algorithm for high dimensional sparse data clustering (cited by: 7)
4
Authors: Sen Wu, Xuedong Gao (Management School, University of Science and Technology Beijing, Beijing 100083, China) 《Journal of University of Science and Technology Beijing》 CSCD 2004, No. 3, pp. 283-288 (6 pages)
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high dimensional clustering of binary sparse data. This algorithm compresses the data effectively by using a tool, the 'Sparse Feature Vector', thus reducing the data scale enormously, and can obtain the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV is of low computational complexity. The algorithm finds clusters in high dimensional large datasets efficiently and handles noise effectively.
Keywords: clustering; data mining; sparse; high dimensionality
Similarity measure design for high dimensional data (cited by: 3)
5
Authors: LEE Sang-hyuk, YAN Sun, JEONG Yoon-su, SHIN Seung-soo 《Journal of Central South University》 SCIE EI CAS 2014, No. 9, pp. 3534-3540 (7 pages)
Information analysis of high dimensional data was carried out through the application of a similarity measure. High dimensional data were considered as an atypical structure. Additionally, overlapped and non-overlapped data were introduced, and similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, overlapped data could be compared using the conventional similarity measure, while similarity analysis of non-overlapped data provided the clue to solving the similarity of high dimensional data. The high dimensional data analysis was designed with consideration of neighborhood information. Conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were evaluated for how similar they are to financial fraud. Calculation results show that the actual fraud has a rather high similarity measure compared to the average, from a minimum of 0.0609 to a maximum of 0.1667.
Keywords: high dimensional data; similarity measure; difference; neighborhood information; financial fraud
CSFW-SC: Cuckoo Search Fuzzy-Weighting Algorithm for Subspace Clustering Applying to High-Dimensional Clustering (cited by: 1)
6
Authors: WANG Jindong, HE Jiajing, ZHANG Hengwei, YU Zhiyong 《China Communications》 SCIE CSCD 2015, No. S2, pp. 55-63 (9 pages)
To address the issue that traditional clustering methods are not appropriate for high-dimensional data, a cuckoo search fuzzy-weighting algorithm for subspace clustering is presented on the basis of the existing soft subspace clustering algorithm. In the proposed algorithm, a novel objective function is first designed by considering the fuzzy-weighted within-cluster compactness and the between-cluster separation, and by loosening the constraints on the dimension weight matrix. Then gradual membership and an improved cuckoo search, a global search strategy, are introduced to optimize the objective function and search for subspace clusters, giving novel learning rules for clustering. Finally, the performance of the proposed algorithm on the clustering analysis of various low and high dimensional datasets is experimentally compared with that of several competitive subspace clustering algorithms. Experimental studies demonstrate that the proposed algorithm can obtain better performance than most of the existing soft subspace clustering algorithms.
Keywords: high-dimensional data; clustering; soft subspace; cuckoo search; fuzzy clustering
Dimensionality Reduction of High-Dimensional Highly Correlated Multivariate Grapevine Dataset
7
Authors: Uday Kant Jha, Peter Bajorski, Ernest Fokoue, Justine Vanden Heuvel, Jan van Aardt, Grant Anderson 《Open Journal of Statistics》 2017, No. 4, pp. 702-717 (16 pages)
Viticulturists traditionally have a keen interest in studying the relationship between the biochemistry of grapevines' leaves/petioles and their associated spectral reflectance in order to understand the fruit ripening rate, water status, nutrient levels, and disease risk. In this paper, we use imaging spectroscopy (hyperspectral) reflectance data, for the reflective 330-2510 nm wavelength region (986 total spectral bands), to assess vineyard nutrient status; this constitutes a high dimensional dataset with a covariance matrix that is ill-conditioned. The identification of the variables (wavelength bands) that contribute useful information for nutrient assessment and prediction plays a pivotal role in multivariate statistical modeling. In recent years, researchers have successfully developed many continuous, nearly unbiased, sparse and accurate variable selection methods to overcome this problem. This paper compares four regularized methods and one functional regression method: Elastic Net, Multi-Step Adaptive Elastic Net, Minimax Concave Penalty, iterative Sure Independence Screening, and Functional Data Analysis for wavelength variable selection. Thereafter, the predictive performance of these regularized sparse models is enhanced using stepwise regression. This comparative study of regression methods using a high-dimensional and highly correlated grapevine hyperspectral dataset revealed that Elastic Net variable selection yields the best predictive ability.
Keywords: high-dimensional data; multi-step adaptive elastic net; minimax concave penalty; sure independence screening; functional data analysis
Constructing Three-Dimension Space Graph for Outlier Detection Algorithms in Data Mining (cited by: 1)
8
Authors: ZHANG Jing (1, 2), SUN Zhi-hui (1). 1. Department of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China; 2. Department of Electricity and Information Engineering, Jiangsu University, Zhenjiang 212001, Jiangsu, China 《Wuhan University Journal of Natural Sciences》 EI CAS 2004, No. 5, pp. 585-589 (5 pages)
Outlier detection has very important applied value in the data mining literature. Different outlier detection algorithms based on distinct theories have different definitions and mining processes. By analyzing the criteria and theory of the existing outlier detection algorithms, a three-dimensional space graph for constructing applied algorithms and an improved GridOf algorithm were proposed. Key words: outlier detection; three-dimensional space graph; data mining. CLC number: TP 311.13; TP 391. Foundation item: Supported by the National Natural Science Foundation of China (70371015). Biography: ZHANG Jing (1975-), female, Ph.D., lecturer; research direction: data mining and knowledge discovery.
Keywords: outlier detection; three-dimensional space graph; data mining
Variance Estimation for High-Dimensional Varying Index Coefficient Models
9
Authors: Miao Wang, Hao Lv, Yicun Wang 《Open Journal of Statistics》 2019, No. 5, pp. 555-570 (16 pages)
This paper studies the re-adjusted cross-validation method and a semiparametric regression model called the varying index coefficient model. We use the profile spline modal estimator method to estimate the coefficients of the parametric part of the Varying Index Coefficient Model (VICM), while the unknown function part is expanded using B-splines. Moreover, we combine the above two estimation methods under the assumption of high-dimensional data. The results of data simulation and empirical analysis show that for the varying index coefficient model, the re-adjusted cross-validation method is better in terms of accuracy and stability than traditional methods based on ordinary least squares.
Keywords: high-dimensional data; refitted cross-validation; varying index coefficient models; variance estimation
New Clustering Method in High-Dimensional Space Based on Hypergraph-Models (cited by: 1)
10
Authors: 陈建斌, 王淑静, 宋瀚涛 《Journal of Beijing Institute of Technology》 EI CAS 2006, No. 2, pp. 156-161 (6 pages)
To overcome the limitation of traditional clustering algorithms, which fail to produce meaningful clusters in high-dimensional, sparse and binary-valued data sets, a new method based on a hypergraph model is proposed. The hypergraph model maps the relationships present in the original data in high dimensional space into a hypergraph. A hyperedge represents the similarity of the attribute-value distribution between two points. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. The quality of the clustering result can be evaluated by applying the intra-cluster singularity value. Analysis and experimental results have demonstrated that this approach is applicable and effective in wide-ranging schemes.
Keywords: high-dimensional clustering; hypergraph model; data mining
A Comparative Study on Two Techniques of Reducing the Dimension of Text Feature Space
11
Authors: Yin Zhonghang, Wang Yongcheng, Cai Wei, Diao Qian (School of Electronic & Information Technology, Shanghai Jiaotong University, Shanghai 200030, P.R. China) 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2002, No. 1, pp. 87-92 (6 pages)
With the development of large scale text processing, the dimension of text feature space has become larger and larger, which has added a lot of difficulties to natural language processing. How to reduce the dimension has become a practical problem in the field. Here we present two clustering methods, i.e. concept association and concept abstract, to achieve the goal. The first refers to keyword clustering based on the co-occurrence of keywords in the same text, and the second refers to that in the same category. Then we compare the difference between them. Our experimental results show that they are efficient at reducing the dimension of text feature space.
Keywords: text data mining
Distribution/correlation-free test for two-sample means in high-dimensional functional data with eigenvalue decay relaxed
12
Authors: Kaijie Xue 《Science China Mathematics》 SCIE CSCD 2023, No. 10, pp. 2337-2346 (10 pages)
We propose a methodology for testing two-sample means in high-dimensional functional data that requires no decaying pattern on eigenvalues of the functional data. To the best of our knowledge, we are the first to consider and address such a problem. To be specific, we devise a confidence region for the mean curve difference between two samples, which directly establishes a rigorous inferential procedure based on the multiplier bootstrap. In addition, the proposed test permits the functional observations in each sample to have mutually different distributions and arbitrary correlation structures, which is regarded as the desired property of distribution/correlation-free, leading to a more challenging scenario for theoretical development. Other desired properties include the allowance for highly unequal sample sizes, exponentially growing data dimension in sample sizes and consistent power behavior under fairly general alternatives. The proposed test is shown to be uniformly convergent to the prescribed significance, and its finite sample performance is evaluated via a simulation study and an application to electroencephalography data.
Keywords: high dimension; functional data; eigenvalue decay relaxed; multiplier bootstrap; distribution/correlation-free
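Research Progress and Prospects of Domestic Innovation Districts under the Transformation of High-Tech Zones: Based on the CiteSpace Bibliometric Software
13
Authors: 叶小群, 蒋竹林 《湖南城市学院学报(自然科学版)》 CAS 2023, No. 6, pp. 36-42 (7 pages)
Innovation districts are an important spatial form for the future innovation-oriented and integrated development of cities, and they have attracted wide attention for their systematic facilities, diverse functions, and comprehensive services. High-tech zones, whose main function is the development of high-tech industries, still lag far behind innovation districts with comprehensive functions; a systematic comparative analysis of the two helps to better understand the development basis of domestic innovation districts and to promote the transformation of high-tech zones into innovation districts. Taking CNKI core journals as the literature source, this paper uses the CiteSpace bibliometric software to visualize and compare the publication years, authors, institutions, and keywords of domestic literature on high-tech zones and innovation districts from 2000 to 2021. The results show that, in terms of development and evolution, academic research, and research hotspots, research on high-tech zones is relatively systematic and rich, research on innovation districts is relatively insufficient, and domestic research on both has some shortcomings. On this basis, the paper reviews the field in terms of research lineage, research methods, research content, and disciplines, and proposes research prospects for high-tech zone transformation and innovation district construction from four aspects: multi-scale, prioritized, region-wide, and multi-perspective.
Keywords: innovation district; high-tech zone; CiteSpace; innovation space; data visualization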
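A Method for Solving the Regulation Boundary of Industrial Park Virtual Power Plants Considering Power Network Constraints
14
Authors: 廖思阳, 贺聪, 李玲芳, 徐箭, 孙元章, 柯德平 《电力系统自动化》 EI CSCD PKU Core 2024, No. 18, pp. 66-75 (10 pages)
Building a new type of power system dominated by renewable energy urgently requires tapping flexible load-side regulation resources to participate in grid regulation. Industrial parks containing energy-intensive loads such as electrolytic aluminum and submerged arc furnaces have good regulation potential. However, when the power network constraints inside the park are considered, solving for the exact regulation boundary faces the difficulties of high-dimensional variables and nonlinear constraints, and existing methods cannot balance computational efficiency and accuracy well. To address this, the paper abstracts the problem as the projection of a high-dimensional nonlinear state space onto the coupled P-Q plane: projection models for the regulation boundary are established under the linearized and the nonlinear safe-operation constraints of the industrial park respectively, and a novel high-dimensional state-space projection algorithm obtains the exact projection of the regulation boundary of the industrial-park virtual power plant through a two-step vertex "search-mapping" procedure. Case study results show that the regulation boundary obtained by the proposed method can be fully characterized by a set of linear inequalities and is fully compatible with existing dispatch systems. Comparisons with the method of superimposing the regulation capabilities of flexible resources and with the traditional sampling method verify the feasibility of the proposed method and its high accuracy and efficiency.
Keywords: industrial park; virtual power plant; regulation boundary; power network constraints; high-dimensional state-space projection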
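Generalized Estimating Equation Analysis of Longitudinal Multicategorical Data
15
Authors: 尹长明, 代文昊, 尹露阳 《应用数学》 PKU Core 2024, No. 1, pp. 251-257 (7 pages)
Generalized estimating equations (GEE) are a common method for analyzing longitudinal data. For a one-dimensional response variable, XIE and YANG (2003) and WANG (2011) studied the asymptotic properties of the GEE estimator when the covariate dimension is fixed and when it tends to infinity, respectively. This paper studies GEE modeling of longitudinal multicategorical data and the asymptotic properties of the GEE estimator. When the number of categories is greater than two, the dimension of the response variable is greater than one, so the paper extends the corresponding results in the literature.
Keywords: categorical data; longitudinal data; generalized estimating equations; high-dimensional covariates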
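A Survey of Differential Privacy Algorithms and Applications for High-Dimensional Data Publishing
16
Authors: 龙春, 秦泽秀, 李丽莎, 李婧, 杨帆, 魏金侠, 付豫豪 《农业大数据学报》 2024, No. 2, pp. 170-184 (15 pages)
With the further development of big data and machine learning, handling high-dimensional data that has tens to hundreds of feature dimensions, complex structure and relationships, and rich semantic information has become a challenge. How to use such high-dimensional data safely while ensuring that personal privacy is not leaked is now an important topic. A review of the literature shows that surveys of differential privacy itself are plentiful, whereas surveys of differential privacy algorithms and applications for high-dimensional data publishing are scarce. This paper therefore surveys the application of differential privacy to high-dimensional data, examines the strengths and weaknesses of different methods for protecting the privacy of high-dimensional data, and points out directions for future research on differential privacy algorithms for high-dimensional data publishing, so as to better meet the challenges of privacy protection and data analysis. The paper first introduces the principles and properties of differential privacy and summarizes current research on the technique itself. It then analyzes the application of differential privacy in high-dimensional settings from the two perspectives of dimensionality reduction and data synthesis, discusses the problems and challenges that differential privacy faces, and proposes preliminary solutions. Finally, it proposes possible future research directions to promote technical exchange and further breakthroughs in applying differential privacy to high-dimensional data.
Keywords: differential privacy; high-dimensional data; perturbation mechanism; privacy allocation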
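An Algorithm for Mining Abnormal Traffic Time Points in Unstructured High-Dimensional Big Data
17
Authors: 解海燕, 李杰, 赵国栋 《计算机仿真》 2024, No. 7, pp. 474-478 (5 pages)
Unstructured data is high-dimensional, and each sample contains a very large number of features. The resulting curse of dimensionality makes it difficult to reduce dimensionality while still extracting effective features, which degrades the accuracy of mining abnormal time points in big data traffic. A new method based on space mapping is therefore proposed for mining abnormal traffic time points in unstructured high-dimensional big data. A sparse regression model is built from the geometric features of the approximate solution set, and the sparse projection matrix that maps the high-dimensional target space to a low-dimensional target subspace is solved. A high-density set is selected according to the density distribution as the candidate set of cluster centers, from which the initial cluster centers are determined. A pruning algorithm is applied to each cluster formed, a candidate set of time points is selected, and a second check of the candidate set yields the abnormal traffic time points. Experimental results show that dimensionality reduction effectively improves the accuracy of abnormal traffic mining, and that the proposed method mines abnormal time points more accurately and in less time than the compared methods.
Keywords: unstructured data; high-dimensional big data; traffic; abnormal time point; mining method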
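A High-Dimensional Data Classification Strategy Based on Dimensionality-Reduction Dictionary Learning
18
Authors: 李巧君, 李江岱, 王爱菊 《计算机应用与软件》 PKU Core 2024, No. 9, pp. 329-338 (10 pages)
To address the high-dimensional data and nonlinearity problems in dictionary learning, a high-dimensional data classification strategy based on dimensionality-reduction dictionary learning is proposed. In the dimensionality reduction stage, an autoencoder learns a nonlinear mapping that reduces dimensionality while preserving the nonlinear structure of the high-dimensional data. In the dictionary learning stage, label embedding is used to impose local constraints. During learning, the decomposable nonlinear local structure is preserved, the discriminative power between classes is enhanced, and the mapping function and the dictionary are optimized simultaneously. Experimental results on several benchmark datasets show that the proposed method effectively solves the high-dimensional data and nonlinearity problems in dictionary learning.
Keywords: dictionary learning; high-dimensional data; local constraint; autoencoder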
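An Outlier Detection System for High-Dimensional Computer Network Data Based on Local Information Entropy
19
Authors: 谭印, 苏雯洁 《现代电子技术》 PKU Core 2024, No. 10, pp. 91-95 (5 pages)
Outlier detection can reveal anomalies in a computer network in time, providing important clues for risk warning and control. To this end, an outlier detection system for high-dimensional computer network data based on local information entropy is designed. In the high-dimensional data acquisition module, the Wireshark tool captures the raw high-dimensional data packets of the computer network; in the high-dimensional data storage module, MySQL, ZooKeeper, and Redis databases are set up to store the captured packets. In the outlier detection module, a micro-cluster partitioning algorithm divides the stored packets into several micro-clusters; the local information entropy of each micro-cluster is then computed to determine whether it contains outliers; the outliers within the micro-clusters are then mined according to their degree of deviation; finally, the high-dimensional data visualization module presents the detection results. Experiments show that the designed system can effectively capture and partition high-dimensional computer network data, and can detect outliers in it both effectively and efficiently.
Keywords: computer network; high-dimensional data; outlier detection; local information entropy; Wireshark tool; micro-cluster partitioning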
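Feature Extraction of Motion Behavior Sensor Data in Finite-Dimensional Space
20
Authors: 卢瑛 《信息技术》 2024, No. 7, pp. 115-120 (6 pages)
Rising dimensionality increases the complexity of motion behavior sensor data and causes its distribution feature space to expand without bound, so a feature extraction method for motion behavior sensor data based on finite-dimensional space is proposed. Association rule item mining analysis is used to compute the fuzziness of the data and determine the finite spatial region of the motion behavior. In the finite-dimensional space, an adaptive optimization method computes the feature quantization parameters of the sensor data. The feature attributes of the motion behavior sensor data are detected, the fused mapping output of the data distribution is computed, and a motion behavior feature extraction model is built. Experimental results show that the proposed method clusters motion data well in space, confines the data to a finite-dimensional space, and keeps feature extraction accuracy above 95% throughout.
Keywords: finite-dimensional space; motion behavior; sensor data; association rule item mining; feature extraction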