Journal articles
801 articles found
Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
1
Authors: Meiyin Wang, Wanzhou Ye. Advances in Pure Mathematics, 2024, No. 4, pp. 214-227 (14 pages)
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently high-dimensional and noisy. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices from missing and noisy samples under the norm. First, a model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and the minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, a numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
Keywords: high-dimensional covariance matrix; missing data; sub-Gaussian noise; optimal estimation
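The following minimal Python sketch illustrates the hard thresholding estimator described above. It rests on several assumptions:

- missing entries are marked as NaN;
- the "generalized sample covariance" is the pairwise-complete covariance;
- the threshold has the common form lam * sqrt(log p / n_min).

The names `generalized_sample_cov`, `hard_threshold_cov` and `lam` are hypothetical. The sketch does not correct for the additive sub-Gaussian noise, and it does not reproduce the paper's exact threshold or norm.

```python
import numpy as np

def generalized_sample_cov(X):
    """Pairwise-complete sample covariance; NaN marks a missing entry."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                          # n x p mask of observed entries
    counts = observed.sum(axis=0)                    # observed count per variable
    means = np.where(observed, X, 0.0).sum(axis=0) / np.maximum(counts, 1)
    centered = np.where(observed, X - means, 0.0)    # missing entries contribute 0
    mask = observed.astype(float)
    pair_counts = mask.T @ mask                      # n_ij: rows where i and j are both observed
    S = centered.T @ centered / np.maximum(pair_counts, 1.0)
    return S, pair_counts

def hard_threshold_cov(X, lam=2.0):
    """Zero out off-diagonal entries below lam * sqrt(log p / n_min) (assumed threshold form)."""
    S, pair_counts = generalized_sample_cov(X)
    p = S.shape[0]
    n_min = max(pair_counts.min(), 1.0)
    tau = lam * np.sqrt(np.log(p) / n_min)
    T = np.where(np.abs(S) >= tau, S, 0.0)
    np.fill_diagonal(T, np.diag(S))                  # variances are never thresholded
    return T
```

Dividing each entry by its own pair count `n_ij` keeps every entry on the same scale even when variables have different missing rates.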
Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review
2
Authors: Nur Laila Ab Ghani, Izzatdin Abdul Aziz, Said Jadid AbdulKadir. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 4649-4668 (20 pages)
Clustering high-dimensional data is challenging because higher dimensionality increases the distances between data points, producing sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data: it finds the relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams, which are not only high-dimensional but also unbounded and evolving. This necessitates subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have reviewed the literature on data stream clustering, there is currently no review dedicated to subspace clustering algorithms for high-dimensional data streams. This article therefore systematically reviews the existing literature on subspace clustering in high-dimensional streaming environments. The review follows a systematic methodology and includes 18 articles in the final analysis. The analysis focuses on two research questions: the general clustering process, and how algorithms deal with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it can distinguish true clusters from outliers using projected microclusters. Most algorithms adapt implicitly to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
Keywords: clustering; subspace clustering; projected clustering; data stream; stream clustering; high dimensionality; evolving data stream; concept drift
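The time fading function mentioned in the review can be illustrated with a DenStream-style micro-cluster whose statistics decay as 2^(-lam * elapsed time). This is a hypothetical example: the class, method and parameter names are ours and are not taken from any reviewed algorithm. The `preferred_dims` method shows how a projected micro-cluster might keep only its low-variance dimensions.

```python
import numpy as np

class FadingMicroCluster:
    """Micro-cluster summary whose weight decays as 2**(-lam * elapsed_time)."""

    def __init__(self, point, t, lam=0.01):
        x = np.asarray(point, dtype=float)
        self.lam = lam
        self.w = 1.0            # faded weight (effective number of points)
        self.ls = x.copy()      # faded linear sum of points
        self.ss = x ** 2        # faded per-dimension sum of squares
        self.t_last = t

    def _fade(self, t):
        decay = 2.0 ** (-self.lam * (t - self.t_last))
        self.w *= decay
        self.ls = self.ls * decay
        self.ss = self.ss * decay
        self.t_last = t

    def insert(self, point, t):
        self._fade(t)
        x = np.asarray(point, dtype=float)
        self.w += 1.0
        self.ls = self.ls + x
        self.ss = self.ss + x ** 2

    def center(self):
        return self.ls / self.w

    def variance(self):
        c = self.center()
        return np.maximum(self.ss / self.w - c ** 2, 0.0)

    def preferred_dims(self, delta):
        """Dimensions with low spread; a projected micro-cluster keeps only these."""
        return np.flatnonzero(self.variance() <= delta)
```

Old points lose influence through the decay factor alone, without being stored. This is also why an isolated outlier can temporarily distort a summary until it fades.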
Experimental Investigation of a Fixed-geometry Two-dimensional Mixed-compression Supersonic Inlet with Sweep-forward High-light and Bleed Slot in an Inverted "X"-type Layout (cited 9 times)
3
Authors: Wan Dawei, Guo Rongwei. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2007, No. 4, pp. 304-312 (9 pages)
A fixed-geometry two-dimensional mixed-compression supersonic inlet with sweep-forward high-light and a bleed slot in an inverted "X"-type layout was tested in a wind tunnel. The results indicate the following.

(1) As the free-stream Mach number increases, the total pressure recovery decreases. The mass flow ratio increases to a maximum at the design point and then decreases.

(2) When the angle of attack α is less than 6°, the total pressure recovery of both side inlets tends to decrease, but it is higher on the leeward inlet than on the windward inlet. The mass flow ratio of the leeward inlet first increases and then falls, while that of the windward inlet declines slowly, so the total mass flow of both sides remains almost constant.

(3) As α rises from 6° to 9°, both the total pressure recovery and the mass flow ratio of the leeward inlet fall quickly. The windward inlet instead shows a decrease in total pressure recovery and an increase in mass flow ratio.

(4) Comparing the velocity and back-pressure characteristics of the inlet with and without a bleed slot shows two effects of the bleed slot: it widens the inlet's stable operating range, and it greatly improves performance at high Mach numbers.

In addition, the paper presents an example showing how this type of inlet is designed.
Keywords: aerospace propulsion system; supersonic inlet; two-dimensional mixed-compression; experimental investigation; bleed slot; "X"-type; sweep-forward high-light
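The textbook normal-shock relation shows why total pressure recovery falls as the free-stream Mach number rises. A single normal shock is the limiting case for any inlet, so the relation serves as a reference point only. It is not a model of the mixed-compression inlet tested in the paper, which uses oblique shocks before a terminal shock.

The sketch assumes a calorically perfect gas with gamma = 1.4. Under that assumption the ratio is about 0.72 at Mach 2.

```python
def normal_shock_recovery(mach, gamma=1.4):
    """Total pressure ratio p02/p01 across a normal shock at upstream Mach number mach >= 1."""
    m2 = mach ** 2
    a = ((gamma + 1) * m2 / ((gamma - 1) * m2 + 2)) ** (gamma / (gamma - 1))
    b = ((gamma + 1) / (2 * gamma * m2 - (gamma - 1))) ** (1 / (gamma - 1))
    return a * b

for M in (1.5, 2.0, 2.5, 3.0):
    print(f"M = {M}: p02/p01 = {normal_shock_recovery(M):.4f}")
```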
CABOSFV algorithm for high dimensional sparse data clustering (cited 7 times)
4
Authors: Sen Wu, Xuedong Gao (Management School, University of Science and Technology Beijing, Beijing 100083, China). Journal of University of Science and Technology Beijing (CSCD), 2004, No. 3, pp. 283-288 (6 pages)
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high-dimensional clustering of binary sparse data. The algorithm compresses the data effectively using a tool called the 'Sparse Feature Vector', thereby reducing the data scale enormously, and obtains the clustering result with only one scan of the data. Both theoretical analysis and empirical tests showed that CABOSFV has low computational complexity. The algorithm finds clusters in high-dimensional large datasets efficiently and handles noise effectively.
Keywords: clustering; data mining; sparse; high dimensionality
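The one-pass idea behind CABOSFV can be sketched as follows. This is a hypothetical illustration, not the published algorithm:

- each row is the set of attributes equal to 1;
- each cluster keeps only its size, the attributes common to all members, and the attributes held by any member;
- the dissimilarity formula and `threshold` are our assumptions.

```python
def sfv_cluster(rows, threshold=0.5):
    """One-pass clustering of binary sparse rows, each given as the set of attributes equal to 1.

    Each cluster keeps a compact summary (size, common attributes, all attributes),
    loosely modeled on the Sparse Feature Vector idea; the dissimilarity is an assumption.
    """
    clusters = []
    for idx, row in enumerate(rows):
        row = frozenset(row)
        best, best_d = None, None
        for c in clusters:
            common = c["common"] & row          # attributes shared by every member
            union = c["union"] | row            # attributes held by any member
            n = c["n"] + 1
            d = float("inf") if not common else (len(union) - len(common)) / (n * len(common))
            if d <= threshold and (best_d is None or d < best_d):
                best, best_d = c, d
        if best is None:
            clusters.append({"n": 1, "common": row, "union": row, "members": [idx]})
        else:
            best["n"] += 1
            best["common"] = best["common"] & row
            best["union"] = best["union"] | row
            best["members"].append(idx)
    return [c["members"] for c in clusters]
```

Each object is read exactly once, and clusters are compared through their summaries rather than their members. This mirrors the "one data scan" property the abstract claims.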
Similarity measurement method of high-dimensional data based on normalized net lattice subspace (cited 4 times)
5
Authors: Li Wenfa (李文法), Wang Gongming, Li Ke, Huang Su. High Technology Letters (EI, CAS), 2017, No. 2, pp. 179-184 (6 pages)
The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality in high-dimensional data. The reason is that differences in sparse and noisy dimensions make up a large proportion of the similarity, so the similarities between any pair of results become hard to distinguish. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in each dimension are mapped onto the corresponding interval. Only components that fall in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the proposed method increases with dimensionality and is approximately two to three orders of magnitude higher than that of the conventional methods. In addition, the similarity of this method ranges over [0,1] in every dimensionality, which makes it suitable for similarity analysis after dimensionality reduction.
Keywords: high-dimensional data; curse of dimensionality; similarity; normalization; subspace; NPsim
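Below is a minimal sketch of the interval-based similarity described above. It assumes that:

- each dimension's range [lo, hi] is split into k equal intervals;
- a dimension contributes 1 - |x - y| / (hi - lo) only when the two components fall in the same or adjacent intervals;
- the sum is divided by the total number of dimensions, so the result lies in [0, 1].

The function name and the per-dimension contribution are assumptions, not the published NPsim definition.

```python
import numpy as np

def lattice_similarity(x, y, lo, hi, k=10):
    """Similarity in [0, 1] that ignores dimensions where x and y fall in distant intervals."""
    x, y, lo, hi = (np.asarray(v, dtype=float) for v in (x, y, lo, hi))
    width = (hi - lo) / k                                   # k equal intervals per dimension
    ix = np.clip(np.floor((x - lo) / width), 0, k - 1)      # interval index of each component
    iy = np.clip(np.floor((y - lo) / width), 0, k - 1)
    close = np.abs(ix - iy) <= 1                            # same or adjacent interval
    contrib = 1.0 - np.abs(x - y) / (hi - lo)
    return float(np.where(close, contrib, 0.0).sum() / x.size)
```

A far-apart (for example, noisy) dimension contributes zero instead of a large distance. This prevents such dimensions from swamping the score. The sketch assumes hi > lo in every dimension.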
Similarity measure design for high dimensional data (cited 3 times)
6
Authors: LEE Sang-hyuk, YAN Sun, JEONG Yoon-su, SHIN Seung-soo. Journal of Central South University (SCIE, EI, CAS), 2014, No. 9, pp. 3534-3540 (7 pages)
Information analysis of high-dimensional data was carried out by applying a similarity measure. High-dimensional data were treated as an atypical structure. Overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. As a result, the similarity of overlapped data could be computed with the conventional similarity measure. The similarity analysis of non-overlapped data provided the clue to computing the similarity of high-dimensional data. The similarity measure for high-dimensional data was designed using neighborhood information, and both conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud across multidimensional datasets. In an illustrative example, the similarity of financial fraud with respect to age, gender, qualification, and job was presented. The proposed measure was then used to evaluate how similar high-dimensional personal data were to financial fraud. The calculation results show that actual fraud cases have a rather high similarity measure compared with the average, ranging from a minimum of 0.0609 to a maximum of 0.1667.
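Keywords: similarity measure; high-dimensional data; design; data information; calculation results; similarity analysis; multidimensional datasets; fraud cases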
A nearest neighbor search algorithm of high-dimensional data based on sequential NPsim matrix
7
Authors: Li Wenfa (李文法), Wang Gongming, Ma Nan, Liu Hongzhe. High Technology Letters (EI, CAS), 2016, No. 3, pp. 241-247 (7 pages)
Problems in similarity measurement and index tree construction affect the performance of nearest neighbor search on high-dimensional data. The equidistance problem is solved by using the NPsim function to calculate similarity, and a sequential NPsim matrix is built to improve indexing performance. Combining these innovations, a nearest neighbor search algorithm for high-dimensional data based on the sequential NPsim matrix is proposed. It is compared with nearest neighbor search algorithms based on the KD-tree and the SR-tree on the Munsell spectral data set. Experimental results show that the similarity of the proposed algorithm is better than that of the other algorithms, and that its search speed is thousands of times faster. In addition, the slow construction of the sequential NPsim matrix can be accelerated by parallel computing.
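Keywords: search algorithm; high-dimensional data; M-matrix; nearest neighbor; sequence; similarity solution; search performance; search speed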
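A sequential similarity matrix of the kind described above can be sketched as a per-row ordering of all other points by decreasing similarity. The k nearest neighbors of a stored point are then simply the first k entries of its row. Some caveats:

- the similarity function is passed in as a parameter, and NPsim itself is not reproduced;
- the function names are hypothetical.

Each row can be built independently, which is what makes parallel construction straightforward.

```python
import numpy as np

def build_sequential_matrix(X, sim):
    """Row i lists all other points ordered by decreasing similarity to point i."""
    n = len(X)
    S = np.array([[sim(X[i], X[j]) for j in range(n)] for i in range(n)], dtype=float)
    np.fill_diagonal(S, -np.inf)                     # a point is never its own neighbor
    order = np.argsort(-S, axis=1, kind="stable")    # most similar first
    return order[:, :-1]                             # drop the last column (the point itself)

def knn(M, i, k):
    """k most similar stored points to point i: a lookup, no search at query time."""
    return M[i, :k]
```

The quadratic construction cost is paid once, which matches the abstract's remark that construction is slow but search is fast.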
Dimensionality Reduction of High-Dimensional Highly Correlated Multivariate Grapevine Dataset
8
Authors: Uday Kant Jha, Peter Bajorski, Ernest Fokoue, Justine Vanden Heuvel, Jan van Aardt, Grant Anderson. Open Journal of Statistics, 2017, No. 4, pp. 702-717 (16 pages)
Viticulturists have traditionally been keenly interested in the relationship between the biochemistry of grapevine leaves/petioles and their spectral reflectance, in order to understand fruit ripening rate, water status, nutrient levels, and disease risk. In this paper, we use imaging spectroscopy (hyperspectral) reflectance data over the reflective 330 - 2510 nm wavelength region (986 spectral bands in total) to assess vineyard nutrient status. This constitutes a high-dimensional dataset with an ill-conditioned covariance matrix. Identifying the variables (wavelength bands) that contribute useful information for nutrient assessment and prediction plays a pivotal role in multivariate statistical modeling. In recent years, researchers have developed many continuous, nearly unbiased, sparse, and accurate variable selection methods to overcome this problem. This paper compares four regularized regression methods and one functional regression method for wavelength variable selection: Elastic Net, Multi-Step Adaptive Elastic Net, Minimax Concave Penalty, iterative Sure Independence Screening, and Functional Data Analysis. The predictive performance of these regularized sparse models is then enhanced using stepwise regression. This comparative study on a high-dimensional, highly correlated grapevine hyperspectral dataset revealed that Elastic Net variable selection yields the best predictive ability.
Keywords: high-dimensional data; multi-step adaptive Elastic Net; minimax concave penalty; sure independence screening; functional data analysis
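Here is a small coordinate-descent sketch in NumPy that makes Elastic Net variable selection concrete. It assumes standardized predictors and a centered response, and it uses the common objective

(1/2n) * ||y - Xb||^2 + alpha * (l1_ratio * ||b||_1 + (1 - l1_ratio)/2 * ||b||^2).

The function names and defaults are ours. The paper's tuning and its subsequent stepwise regression step are not shown. Selected wavelength bands would be the indices with nonzero coefficients.

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=500, tol=1e-8):
    """Cyclic coordinate descent for the Elastic Net objective above."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b                                   # current residual
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            rho_j = X[:, j] @ r / n + col_sq[j] * old   # partial-residual correlation
            new = soft_threshold(rho_j, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            if new != old:
                r -= X[:, j] * (new - old)          # keep the residual in sync
                b[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return b
```

The ridge term in the denominator is what keeps the updates stable when bands are highly correlated. This is the usual argument for preferring Elastic Net over the plain Lasso on spectra.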
CSFW-SC: Cuckoo Search Fuzzy-Weighting Algorithm for Subspace Clustering Applying to High-Dimensional Clustering (cited 1 time)
9
Authors: WANG Jindong, HE Jiajing, ZHANG Hengwei, YU Zhiyong. China Communications (SCIE, CSCD), 2015, No. S2, pp. 55-63 (9 pages)
To address the problem that traditional clustering methods are not suited to high-dimensional data, a cuckoo search fuzzy-weighting algorithm for subspace clustering is presented, building on existing soft subspace clustering algorithms. The algorithm first designs a novel objective function that:

- accounts for fuzzy-weighted within-cluster compactness and between-cluster separation;
- loosens the constraints on the dimension weight matrix.

Gradual membership and an improved cuckoo search (a global search strategy) are then introduced to optimize the objective function and search for subspace clusters, yielding novel learning rules for clustering. Finally, the clustering performance of the proposed algorithm on various low- and high-dimensional datasets is compared experimentally with that of several competitive subspace clustering algorithms. The experiments demonstrate that the proposed algorithm performs better than most existing soft subspace clustering algorithms.
Keywords: high-dimensional data; clustering; soft subspace; cuckoo search; fuzzy clustering
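The cuckoo search component can be illustrated with the standard Lévy-flight version. It uses Mantegna's algorithm for the step length and abandons a fraction `pa` of the worst nests each iteration. This is a generic minimizer sketch, not the paper's improved variant or its fuzzy-weighted objective. The function names and parameter defaults are assumptions, and abandoned nests are re-sampled uniformly here for simplicity.

```python
import math
import numpy as np

def levy_step(rng, size, beta=1.5):
    """Levy-distributed step lengths via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, dim, lo, hi, n_nests=15, pa=0.25, alpha=0.01, n_iter=300, seed=0):
    """Minimize f over the box [lo, hi]^dim."""
    rng = np.random.default_rng(seed)
    nests = rng.uniform(lo, hi, (n_nests, dim))
    fitness = np.array([f(x) for x in nests])
    n_drop = max(1, int(pa * n_nests))
    for _ in range(n_iter):
        best = nests[fitness.argmin()].copy()
        for i in range(n_nests):
            # Levy flight scaled by the distance to the current best nest
            cand = np.clip(nests[i] + alpha * levy_step(rng, dim) * (nests[i] - best), lo, hi)
            fc = f(cand)
            j = rng.integers(n_nests)                # compare with a randomly chosen nest
            if fc < fitness[j]:
                nests[j], fitness[j] = cand, fc
        worst = fitness.argsort()[-n_drop:]          # abandon the worst nests
        nests[worst] = rng.uniform(lo, hi, (n_drop, dim))
        fitness[worst] = [f(x) for x in nests[worst]]
    k = fitness.argmin()
    return nests[k], fitness[k]
```

In a soft subspace clustering setting, a nest would encode candidate cluster centers or dimension weights, and `f` would be the clustering objective.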
Variance Estimation for High-Dimensional Varying Index Coefficient Models
10
Authors: Miao Wang, Hao Lv, Yicun Wang. Open Journal of Statistics, 2019, No. 5, pp. 555-570 (16 pages)
This paper studies the re-adjusted (refitted) cross-validation method and a semiparametric regression model called the Varying Index Coefficient Model (VICM). We use the profile spline modal estimator to estimate the coefficients of the parametric part of the VICM, while the unknown functions are expanded in B-splines. We then combine these two estimation methods under the assumption of high-dimensional data. The results of data simulation and empirical analysis show that, for the varying index coefficient model, the re-adjusted cross-validation method is more accurate and stable than traditional methods based on ordinary least squares.
Keywords: high-dimensional data; refitted cross-validation; varying index coefficient models; variance estimation
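The refitted cross-validation idea can be sketched for an ordinary linear model:

1. select variables on one half of the data;
2. estimate the noise variance by refitting only those variables on the other half;
3. swap the halves and average the two estimates.

The marginal-correlation screen below is a stand-in selector, and the functions are hypothetical. The varying index coefficient structure with its B-spline expansion is not modeled.

```python
import numpy as np

def _screen(X, y, k):
    """Stand-in selector: the k columns with the largest absolute marginal correlation."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(-score)[:k]

def _refit_variance(X, y, cols):
    """OLS on the selected columns (plus intercept); residual variance with df correction."""
    Z = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid @ resid / (len(y) - Z.shape[1])

def rcv_variance(X, y, k=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2:]
    s1 = _refit_variance(X[b], y[b], _screen(X[a], y[a], k))   # select on a, refit on b
    s2 = _refit_variance(X[a], y[a], _screen(X[b], y[b], k))   # select on b, refit on a
    return 0.5 * (s1 + s2)
```

Refitting on the half that was not used for selection stops spuriously selected variables from absorbing noise. That absorption is what biases a naive single-sample variance estimate downward.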
Distribution/correlation-free test for two-sample means in high-dimensional functional data with eigenvalue decay relaxed
11
Author: Kaijie Xue. Science China Mathematics (SCIE, CSCD), 2023, No. 10, pp. 2337-2346 (10 pages)
We propose a methodology for testing two-sample means in high-dimensional functional data that requires no decaying pattern on the eigenvalues of the functional data. To the best of our knowledge, we are the first to consider and address this problem. Specifically, we devise a confidence region for the difference between the mean curves of two samples, which directly yields a rigorous inferential procedure based on the multiplier bootstrap.

In addition, the proposed test allows the functional observations in each sample to have mutually different distributions and arbitrary correlation structures. We call this desirable property distribution/correlation-free; it makes the theoretical development more challenging. The test has further desirable properties:

- it allows highly unequal sample sizes;
- it allows the data dimension to grow exponentially with the sample sizes;
- it has consistent power under fairly general alternatives.

The size of the proposed test is shown to converge uniformly to the prescribed significance level. Its finite-sample performance is evaluated through a simulation study and an application to electroencephalography data.
Keywords: high dimension; functional data; eigenvalue decay relaxed; multiplier bootstrap; distribution/correlation-free
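A sup-norm two-sample mean test calibrated by a Gaussian multiplier bootstrap can be sketched as follows. The sketch assumes each functional observation is recorded as a vector of values on a common grid, with rows as curves. The unscaled statistic, the function name and `n_boot` are assumptions for illustration; the paper's confidence-region construction may normalize differently.

```python
import numpy as np

def multiplier_bootstrap_test(X, Y, n_boot=2000, seed=0):
    """Sup-norm test of equal mean curves; rows of X and Y are curves on a common grid."""
    rng = np.random.default_rng(seed)
    n1, n2 = len(X), len(Y)
    T = np.max(np.abs(X.mean(axis=0) - Y.mean(axis=0)))
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    e1 = rng.standard_normal((n_boot, n1))        # Gaussian multipliers, sample 1
    e2 = rng.standard_normal((n_boot, n2))        # independent multipliers, sample 2
    boot = np.max(np.abs(e1 @ Xc / n1 - e2 @ Yc / n2), axis=1)
    p_value = float(np.mean(boot >= T))
    return T, p_value
```

Each sample is centered and perturbed separately, with no pooling. As a result, the sketch needs no common covariance and tolerates unequal sample sizes.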
Generalized Estimating Equation Analysis of Longitudinal Multicategorical Data
12
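Authors: Yin Changming (尹长明), Dai Wenhao (代文昊), Yin Luyang (尹露阳). Mathematica Applicata (应用数学; Peking University core journal), 2024, No. 1, pp. 251-257 (7 pages)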
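The generalized estimating equation (GEE) is a common method for analyzing longitudinal data. When the response variable is one-dimensional, XIE and YANG (2003) and WANG (2011) studied the asymptotic properties of the GEE estimator when the covariate dimension is fixed and when it tends to infinity, respectively. This paper studies GEE modeling of longitudinal multicategorical data and the asymptotic properties of the GEE estimator. When the number of categories exceeds two, the response variable has dimension greater than one, so these results generalize those in the literature.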
Keywords: categorical data; longitudinal data; generalized estimating equation; high-dimensional covariates
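Consider the binary special case (two categories) with an independence working correlation. There the GEE reduces to the logistic score equations, and a robust sandwich covariance accounts for within-subject correlation. The sketch below makes those assumptions. The function name is hypothetical, and it does not cover the multicategorical case studied in the paper, where each response is a vector of category indicators.

```python
import numpy as np

def gee_logistic_independence(X, y, groups, n_iter=50, tol=1e-10):
    """Solve sum_i X_i'(y_i - mu_i) = 0 by Newton steps; return beta and sandwich covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        A = X.T @ (X * (mu * (1.0 - mu))[:, None])   # model-based information matrix
        step = np.linalg.solve(A, X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    A = X.T @ (X * (mu * (1.0 - mu))[:, None])
    B = np.zeros_like(A)
    for g in np.unique(groups):                       # empirical score covariance by subject
        m = groups == g
        u = X[m].T @ (y[m] - mu[m])
        B += np.outer(u, u)
    A_inv = np.linalg.inv(A)
    return beta, A_inv @ B @ A_inv
```

Summing the outer products by subject, rather than by observation, is what makes the covariance robust to an unspecified within-subject correlation.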
An Outlier Detection System for High-Dimensional Computer Network Data Based on Local Information Entropy
13
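Authors: Tan Yin (谭印), Su Wenjie (苏雯洁). Modern Electronics Technique (现代电子技术; Peking University core journal), 2024, No. 10, pp. 91-95 (5 pages)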
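Outlier detection can reveal anomalies in a computer network promptly, providing important clues for risk early warning and control. To this end, an outlier detection system for high-dimensional computer network data based on local information entropy is designed. The system has four modules:

- Data acquisition: the Wireshark tool captures raw high-dimensional network packets.
- Data storage: MySQL, ZooKeeper, and Redis databases store the captured packets.
- Outlier detection: a micro-cluster partitioning algorithm divides the stored packets into several micro-clusters. The local information entropy of each micro-cluster is then computed to determine whether it contains outliers, and the outliers within a micro-cluster are mined according to their degree of deviation.
- Visualization: the detection results are presented.

Experiments show that the system effectively captures and partitions high-dimensional computer network data and detects outliers in it effectively and efficiently.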
Keywords: computer network; high-dimensional data; outlier detection; local information entropy; Wireshark tool; micro-cluster partitioning
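The detection stage can be sketched as below: an entropy check per micro-cluster, then deviation-based outlier mining inside the flagged micro-clusters. The following are assumptions for illustration:

- the binning-based entropy;
- the mean-plus-z-standard-deviations deviation rule;
- the entropy threshold;
- all function names.

The micro-cluster partitioning itself is taken as given: the input is a dict of micro-clusters.

```python
import numpy as np

def cluster_entropy(points, bins=4):
    """Mean per-feature Shannon entropy (bits) after equal-width binning."""
    ent = []
    for col in np.asarray(points, dtype=float).T:
        counts, _ = np.histogram(col, bins=bins)
        p = counts[counts > 0] / counts.sum()
        ent.append(-(p * np.log2(p)).sum())
    return float(np.mean(ent))

def deviation_outliers(points, z=2.0):
    """Indices whose distance to the micro-cluster centroid exceeds mean + z * std."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    return np.flatnonzero(d > d.mean() + z * d.std())

def detect(micro_clusters, entropy_threshold=1.0):
    """Search only micro-clusters whose local entropy is high (hypothetical rule)."""
    return {cid: deviation_outliers(pts) for cid, pts in micro_clusters.items()
            if cluster_entropy(pts) > entropy_threshold}
```

Skipping low-entropy micro-clusters is where the speed-up comes from: homogeneous groups are never scanned point by point.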
A High-Dimensional Data Publication Method under Local Differential Privacy
14
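Authors: Cai Mengnan (蔡梦男), Shen Guohua (沈国华), Huang Zhiqiu (黄志球), Yang Yang (杨阳). Computer Science (计算机科学; CSCD, Peking University core journal), 2024, No. 2, pp. 322-332 (11 pages)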
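High-dimensional data collected from large numbers of users are increasingly available. Because such data involve users' personal privacy, using them while protecting that privacy is highly challenging. This paper focuses on high-dimensional data publication under local differential privacy.

Existing solutions first build a probabilistic graphical model and generate a set of noisy low-dimensional marginal distributions of the input data. They then use these marginals to approximate the joint distribution of the input dataset and generate a synthetic dataset. However, existing methods have limitations in two places: computing the marginals of a large number of attribute pairs to build the graphical model, and computing joint distributions over large attribute subsets in that model.

To address this, PrivHDP (High-dimensional Data Publication Under Local Differential Privacy) is proposed. It works in three steps:

1. It perturbs user data with random sampling response instead of the traditional privacy-budget splitting strategy. An adaptive marginal computation method computes the pairwise marginals used to build a Markov network.
2. It replaces mutual information with a new measure of the correlation between attribute pairs. A threshold filtering technique based on high-pass filtering shrinks the search space during graph construction. Sufficient triangulation combined with the junction tree algorithm then yields a set of attribute subsets.
3. It computes joint distributions over the attribute subsets based on joint-distribution decomposition and redundancy elimination.

Experiments on four real datasets show that PrivHDP outperforms comparable algorithms in k-way query and SVM classification accuracy, verifying the usability and efficiency of the proposed method.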
Keywords: local differential privacy; high-dimensional data; data publication; marginal distribution; joint distribution
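Random sampling response can be illustrated with generalized randomized response (k-RR). Each user samples one attribute and reports it with the full budget eps, and the aggregator debiases the observed frequencies. This sketch assumes k possible values per attribute, encoded as 0..k-1. It covers only the perturbation and frequency-estimation step, not the Markov network or junction tree construction, and the function names are hypothetical.

```python
import numpy as np

def krr_perturb(values, k, eps, rng):
    """k-RR: keep the true value with probability p, else report one of the other k-1 values."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    keep = rng.random(len(values)) < p
    other = rng.integers(0, k - 1, len(values))
    other = other + (other >= values)          # shift so the true value is skipped
    return np.where(keep, values, other)

def krr_estimate(reports, k, eps):
    """Unbiased frequency estimate from k-RR reports."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    q = 1.0 / (np.exp(eps) + k - 1)
    observed = np.bincount(reports, minlength=k) / len(reports)
    return (observed - q) / (p - q)

def sample_and_report(data, k, eps, seed=0):
    """Each user perturbs one randomly sampled attribute with the full budget eps."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    attr = rng.integers(0, d, n)
    noisy = krr_perturb(data[np.arange(n), attr], k, eps, rng)
    return [krr_estimate(noisy[attr == j], k, eps) for j in range(d)]
```

Sampling one attribute per user avoids dividing eps by the number of attributes. The cost is that each attribute's estimate is based on only about n/d reports.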
A Subspace-Based I-nice Clustering Algorithm
15
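Authors: He Yifan (何一帆), He Yulin (何玉林), Cui Laizhong (崔来中), Huang Zhexue (黄哲学). Computer Science (计算机科学; CSCD, Peking University core journal), 2024, No. 6, pp. 153-160 (8 pages)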
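Subspace clustering of high-dimensional data is a hot research topic in unsupervised learning. Its difficulty lies in finding appropriate subspaces and the data clusters within them. Most existing subspace clustering algorithms suffer from high computational complexity and difficult parameter selection, for two reasons:

- the number of subspace combinations in high-dimensional data is very large, so the algorithms run for a very long time;
- different datasets and application scenarios require different parameter settings.

To address this, a subspace-based I-nice clustering algorithm (sub-I-nice for short) is proposed to identify the number of clusters within subspaces of high-dimensional data. It works in three steps:

1. It randomly partitions the original data dimensions into multiple dimension groups and generates subspace samples from these groups.
2. It clusters each subspace dataset with the recent I-niceMO algorithm.
3. It combines the base clustering results of all subspaces with a newly designed ball model.

The performance of sub-I-nice was verified in detail on noisy high-dimensional synthetic datasets. The experimental results show that sub-I-nice achieves better accuracy and robustness than three other representative clustering algorithms, confirming its soundness and effectiveness.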
Keywords: subspace clustering; I-nice clustering; high-dimensional data; unsupervised learning; ball model
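The first and last stages of sub-I-nice can be sketched as a random partition of dimensions into groups and an ensemble of base labelings. The co-association matrix below is a common ensemble device, used here as a stand-in for the paper's ball model. I-niceMO is not reproduced, and all names are hypothetical.

```python
import numpy as np

def random_dimension_groups(p, n_groups, seed=0):
    """Randomly partition dimension indices 0..p-1 into n_groups disjoint groups."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(p), n_groups)

def subspace_views(X, n_groups, seed=0):
    """One sub-dataset per dimension group; each would be clustered independently."""
    return [X[:, g] for g in random_dimension_groups(X.shape[1], n_groups, seed)]

def co_association(labelings):
    """Fraction of base clusterings that put samples i and j in the same cluster."""
    L = np.asarray(labelings)                    # shape (n_views, n_samples)
    return (L[:, :, None] == L[:, None, :]).mean(axis=0)
```

Each view has only a fraction of the dimensions, so the base clusterer works in a low-dimensional space. The ensemble step then reconciles views that see different subsets of features.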
An Angle-Based Graph Neural Network Method for Anomaly Detection in High-Dimensional Data
16
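Authors: Wang Jun (王俊), Lai Huixia (赖会霞), Wan Yue (万玥), Zhang Shi (张仕). Computer Engineering (计算机工程; CAS, CSCD, Peking University core journal), 2024, No. 3, pp. 156-165 (10 pages)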
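In high-dimensional data spaces, most data lie near the boundary of the space and are very sparsely distributed. The resulting "curse of dimensionality" prevents existing anomaly detection methods from guaranteeing detection accuracy. To solve this problem, an angle-based graph neural network anomaly detection method for high-dimensional data, A-GNN, is proposed. It works in three steps:

1. It augments the training data by uniformly sampling the data space and perturbing the initial training data.
2. It builds a k-nearest-neighbor graph of the training data. The variance of the distance-weighted angles to the k nearest neighbors serves as the initial anomaly factor of each graph node.
3. It trains a graph neural network model that exchanges information between nodes, so that neighboring nodes learn from each other and anomalies are assessed effectively.

A-GNN was compared experimentally with nine typical anomaly detection methods on six natural datasets. The results show that A-GNN achieved the highest AUC on five of the datasets and substantially improves anomaly detection accuracy on data of various dimensionalities. On some "truly high-dimensional" datasets, the AUC gain exceeds 40%. A-GNN was also compared with three kNN-based anomaly detection methods under different values of k. Because it exchanges information between graph nodes, A-GNN effectively avoids the influence of k on the detection results, making it more robust.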
Keywords: anomaly detection; angle-based anomaly assessment; graph neural network; high-dimensional data; k-nearest neighbors
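The initial anomaly factor described above can be sketched as a variance computed over pairs of each point's k nearest neighbors. The quantity whose variance is taken is the cosine of the angle between the two neighbors, divided by both distances, in the spirit of angle-based outlier detection. The exact weighting in A-GNN may differ, the function name is hypothetical, and the graph neural network stage is not shown.

```python
import numpy as np
from itertools import combinations

def knn_angle_factor(X, k=5):
    """Variance of distance-weighted angle terms over pairs of each point's k nearest neighbors.

    Small values mean all neighbors are seen under similar angles, i.e. the point lies
    outside the data; here this only plays the role of an initial anomaly factor.
    """
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.empty(len(X))
    for i in range(len(X)):
        nbrs = np.argsort(D[i])[1:k + 1]            # position 0 is the point itself
        terms = []
        for b, c in combinations(nbrs, 2):
            ab, ac = X[b] - X[i], X[c] - X[i]
            # cosine of the angle, down-weighted by both distances (ABOD-style term)
            terms.append(ab @ ac / ((ab @ ab) * (ac @ ac)))
        scores[i] = np.var(terms)
    return scores
```

Angles stay informative in high dimensions where raw distances tend to concentrate. This is the motivation for starting from an angle-based factor. The sketch assumes no duplicate points, which would divide by zero.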
Feature Selection for Mixed Diabetes Data Based on NMI-SC
17
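Authors: Zhu Panlei (朱潘蕾), Rong Zhijun (容芷君), Dan Binbin (但斌斌), Dai Chao (代超), Lü Sheng (吕生). Electronic Design Engineering (电子设计工程), 2024, No. 11, pp. 6-10 (5 pages)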
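Diabetes prediction accuracy suffers from high-dimensional mixed data. To address this problem, a diabetes feature selection method based on NMI-SC is proposed. It works in four steps:

1. Neighborhood mutual information (NMI) computes the joint probability density within the neighborhood radius of mixed-attribute features, from which a similarity matrix is built.
2. An undirected graph is constructed from the similarities between diabetes features.
3. Spectral clustering (SC) partitions the diabetes features into several groups of similar features, which clusters features that are related nonlinearly.
4. A representative feature is selected from each group according to its classification importance.

The selected subset was compared with the original feature set using the accuracy of a support vector machine classifier. After deleting 46 redundant features, the method improved accuracy by 13.07%. The experimental results show that the method effectively removes redundant features and yields a feature subset with excellent diabetes classification performance.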
Keywords: feature selection; mixed-data dimensionality reduction; neighborhood mutual information; spectral clustering
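The grouping step can be sketched as normalized spectral clustering on a feature-similarity matrix, followed by keeping the most important feature in each group. Some caveats:

- the matrix W is an input here; in the paper it comes from neighborhood mutual information;
- the small k-means with farthest-point initialization and the function names are assumptions;
- W is assumed to have positive row sums.

```python
import numpy as np

def spectral_feature_groups(W, n_groups, n_iter=100):
    """Normalized spectral clustering of features given a symmetric similarity matrix W."""
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                   # eigenvalues in ascending order
    U = vecs[:, :n_groups]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the spectral embedding
    # Farthest-point initialization, then plain k-means on the embedded rows.
    centers = [U[0]]
    for _ in range(1, n_groups):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        new = np.array([U[labels == g].mean(axis=0) if np.any(labels == g) else centers[g]
                        for g in range(n_groups)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def pick_representatives(labels, importance):
    """Index of the most important feature within each group."""
    importance = np.asarray(importance, dtype=float)
    return [int(np.flatnonzero(labels == g)[np.argmax(importance[labels == g])])
            for g in np.unique(labels)]
```

Keeping one feature per group removes the features that are redundant with it. This is the mechanism behind the abstract's "46 redundant features".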
A Differential Privacy Protection Method for High-Dimensional Data Based on Distributed Multi-Correlated Attributes
18
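Authors: Chu Zhiguang (褚治广), Li Junyan (李俊燕), Chen Hao (陈昊), Zhang Xing (张兴). Computer Engineering and Design (计算机工程与设计; Peking University core journal), 2024, No. 4, pp. 967-973 (7 pages)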
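Publishing high-dimensional data carries a risk of privacy leakage caused by multiple correlated attributes. To address this, a differentially private method for publishing high-dimensional data with multiple correlated attributes (HDMPDP) is proposed for distributed environments. The method has four parts:

- Dimensionality reduction: an efficient rough-set method based on distributed partitioning splits the feature attributes of complex high-dimensional data according to the data dimensionality. This reduces dimensionality while improving processing efficiency.
- Association analysis: attribute classification criteria are designed, and attribute information entropy is used to improve the association analysis method.
- Noise addition: noise is added to each resulting set of attributes separately, and the way noise is added is optimized to mitigate the privacy problems caused by correlated attributes.
- Publication: privacy-preserving data publication is implemented on the Spark distributed framework.

Experiments on high-dimensional data verify the effectiveness of the method and the security of its privacy protection.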
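Keywords: high-dimensional data; multi-correlated attributes; differential privacy; distributed; association analysis; rough set; privacy protection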
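The noise-addition step can be illustrated with the Laplace mechanism on per-attribute histograms. The total budget is split evenly across attribute groups and then across the attributes in each group (sequential composition). This is a generic sketch assuming categorical values 0..k-1 and add/remove-one neighboring datasets. The entropy helper shows how attribute information entropy could be computed. The rough-set reduction, the paper's noise optimization and Spark are not modeled, and all names are hypothetical.

```python
import numpy as np

def attribute_entropy(column):
    """Shannon entropy (bits) of one categorical attribute."""
    _, counts = np.unique(column, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def laplace_histogram(column, k, eps, rng):
    """Counts of values 0..k-1 plus Laplace(1/eps) noise (add/remove-one sensitivity 1)."""
    counts = np.bincount(column, minlength=k).astype(float)
    return counts + rng.laplace(0.0, 1.0 / eps, size=k)

def publish_groups(data, attr_groups, k, eps_total, seed=0):
    """Split eps_total evenly over groups, then over attributes in a group."""
    rng = np.random.default_rng(seed)
    eps_group = eps_total / len(attr_groups)
    out = {}
    for g in attr_groups:
        for a in g:
            out[a] = laplace_histogram(data[:, a], k, eps_group / len(g), rng)
    return out
```

Because one record affects every attribute's histogram, the per-attribute budgets must add up to the total. This budget accounting is why correlated attributes are costly to release separately.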
Iterative Adaptive Robust Variable Selection for Nonparametric Additive Models
19
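Authors: Zhu Nenghui (朱能辉), You Jinhong (尤进红), Xu Qunfang (徐群芳). Chinese Journal of Applied Probability and Statistics (应用概率统计; CSCD, Peking University core journal), 2024, No. 2, pp. 201-228 (28 pages)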
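This paper combines a robust loss function, B-spline approximation, and the adaptive group Lasso to study a high-dimensional additive model, in order to identify insignificant covariates in the "large p, small n" setting. Compared with the traditional least-squares adaptive group Lasso, the method is better at counteracting heavy-tailed errors and outliers. For convenience of proof, the paper further considers a more general weighted robust group Lasso estimator. Its weight vector plays a key role in proving that the proposed estimator has the model-selection oracle property and asymptotic normality. The robust group Lasso and the adaptive robust group Lasso can be regarded as special cases of the weighted robust group Lasso with different weight vectors. In practice, we use the robust group Lasso to obtain initial estimates that reduce the dimensionality of the problem. We then use the iterative adaptive robust group Lasso to select the nonzero components. Numerical results show that the proposed method performs well for medium-sized samples. An application to high-dimensional TRIM32 gene data illustrates the method.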
Keywords: adaptive group Lasso; high-dimensional data; nonparametric regression; oracle property; robust estimation
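The two-stage procedure (robust group Lasso, then adaptive reweighting) can be sketched with proximal gradient descent on the Huber loss. Each group of columns stands in for one covariate's B-spline basis. The following are assumptions:

- the Huber constant 1.345;
- the weight rule 1/||beta_g||;
- the step size;
- the function names.

The paper's specific robust loss and tuning are not reproduced, and only one reweighting pass is shown.

```python
import numpy as np

def huber_psi(r, delta):
    """Derivative of the Huber loss with respect to the residual."""
    return np.clip(r, -delta, delta)

def group_soft_threshold(v, t):
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def robust_group_lasso(X, y, groups, lam, weights=None, delta=1.345, n_iter=2000, tol=1e-8):
    """Proximal gradient for mean Huber loss + lam * sum_g w_g * ||beta_g||_2."""
    n, p = X.shape
    w = np.ones(len(groups)) if weights is None else np.asarray(weights, dtype=float)
    step = n / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ huber_psi(y - X @ beta, delta) / n
        z = beta - step * grad
        new = np.zeros(p)
        for g, idx in enumerate(groups):
            new[idx] = group_soft_threshold(z[idx], step * lam * w[g])
        done = np.max(np.abs(new - beta)) < tol
        beta = new
        if done:
            break
    return beta

def adaptive_weights(beta, groups, floor=1e-8):
    """w_g = 1 / ||beta_g||; groups that are exactly zero get a huge weight and stay zero."""
    return np.array([1.0 / max(np.linalg.norm(beta[idx]), floor) for idx in groups])
```

The clipped residual bounds each observation's influence on the gradient, which is how the Huber loss counters heavy-tailed errors. The adaptive weights then shrink large groups less than small ones.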
An Outlier Detection Algorithm for High-Dimensional Streaming Data
20
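Authors: Liang Changhao (梁昌好), Tong Yinghua (童英华), Feng Zhongling (冯忠岭). Computer Engineering and Design (计算机工程与设计; Peking University core journal), 2024, No. 5, pp. 1406-1412 (7 pages)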
The cumulative local outlier factor (C_LOF) algorithm effectively handles concept drift in data streams and overcomes the masking problem in outlier detection, but its time complexity is high on high-dimensional data. To address this, a cumulative local outlier factor algorithm based on projection indexed nearest neighbors (PINN_C_LOF) is proposed. A sliding window maintains the active data points. When new data arrive and old data expire, the projection indexed nearest neighbor (PINN) method incrementally updates the neighbors of the affected data points in the window. Experimental results show that, when detecting outliers in high-dimensional streaming data, PINN_C_LOF maintains detection accuracy while having markedly lower time complexity than C_LOF.
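Keywords: high-dimensional streaming data; outlier detection; cumulative local outlier factor; time complexity; projection indexed nearest neighbor; local outlier factor; Internet of Things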
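A simplified version of the sliding-window pipeline is sketched below. Random projections propose candidate neighbors, exact distances in the original space pick the k nearest, and the standard local outlier factor is computed for the newest point. The sketch differs from the paper in several ways:

- the window is recomputed from scratch at each arrival rather than updated incrementally;
- the cumulative part of C_LOF is omitted;
- the projection dimension `m`, the candidate factor `c` and the function names are hypothetical.

```python
import numpy as np
from collections import deque

def projected_knn(W, k, m=5, c=3, seed=0):
    """k nearest neighbors: candidates from a random projection, ranked by exact distance."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((W.shape[1], m)) / np.sqrt(m)   # random projection to m dims
    P = W @ R
    n = len(W)
    nbrs, dists = [], []
    for i in range(n):
        dp = np.linalg.norm(P - P[i], axis=1)
        dp[i] = np.inf                                       # never propose the point itself
        cand = np.argsort(dp)[: min(c * k, n - 1)]
        dt = np.linalg.norm(W[cand] - W[i], axis=1)          # exact distances in full space
        order = np.argsort(dt)[:k]
        nbrs.append(cand[order])
        dists.append(dt[order])
    return np.array(nbrs), np.array(dists)

def lof_scores(W, k=5):
    nbrs, dists = projected_knn(W, k)
    k_dist = dists[:, -1]                                    # distance to the k-th neighbor
    reach = np.maximum(dists, k_dist[nbrs])                  # reach-dist(p, o) = max(k-dist(o), d(p, o))
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)                 # local reachability density
    return lrd[nbrs].mean(axis=1) / lrd                      # LOF(p)

def stream_lof(stream, window=50, k=5):
    """LOF of each arriving point with respect to a sliding window of recent points."""
    buf = deque(maxlen=window)                               # oldest point expires automatically
    scores = []
    for x in stream:
        buf.append(np.asarray(x, dtype=float))
        scores.append(lof_scores(np.array(buf), k)[-1] if len(buf) > k else np.nan)
    return scores
```

Only the candidate set is compared in the full space, which is where the savings over exhaustive neighbor search come from. An incremental version would update only the neighbor lists affected by the arriving and expiring points.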