Journal Articles
2,540 articles found
1. Optimal Estimation of High-Dimensional Covariance Matrices with Missing and Noisy Data
Authors: Meiyin Wang, Wanzhou Ye. 《Advances in Pure Mathematics》, 2024, No. 4, pp. 214-227 (14 pages).
The estimation of covariance matrices is very important in many fields, such as statistics. In real applications, data are frequently affected by high dimensionality and noise. However, most relevant studies are based on complete data. This paper studies the optimal estimation of high-dimensional covariance matrices based on missing and noisy samples under the norm. First, the model with sub-Gaussian additive noise is presented. The generalized sample covariance is then modified to define a hard thresholding estimator, and its minimax upper bound is derived. After that, the minimax lower bound is derived, and it is concluded that the estimator presented in this article is rate-optimal. Finally, numerical simulation analysis is performed. The results show that for missing samples with sub-Gaussian noise, if the true covariance matrix is sparse, the hard thresholding estimator outperforms the traditional estimation method.
Keywords: high-dimensional covariance matrix; missing data; sub-Gaussian noise; optimal estimation
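As an illustration of the estimator family discussed here, a minimal hard-thresholding sketch in NumPy; the pairwise masked covariance and the user-chosen threshold `lam` are simple stand-ins for the paper's generalized sample covariance and theoretical threshold level:

```python
import numpy as np

def hard_threshold_cov(X, lam):
    """X: n x p sample matrix, with NaN marking missing entries."""
    # Pairwise covariance that ignores missing entries (a crude stand-in
    # for the paper's generalized sample covariance).
    S = np.ma.cov(np.ma.masked_invalid(X), rowvar=False).filled(0.0)
    T = np.where(np.abs(S) >= lam, S, 0.0)   # zero out small entries
    np.fill_diagonal(T, np.diag(S))          # diagonal is left unthresholded
    return T
```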
2. Enhancing Relational Triple Extraction in Specific Domains: Semantic Enhancement and Synergy of Large Language Models and Small Pre-Trained Language Models
Authors: Jiakai Li, Jianpeng Hu, Geng Zhang. 《Computers, Materials & Continua》 (SCIE, EI), 2024, No. 5, pp. 2481-2503 (23 pages).
In the process of constructing domain-specific knowledge graphs, the task of relational triple extraction plays a critical role in transforming unstructured text into structured information. Existing relational triple extraction models face multiple challenges when processing domain-specific data, including insufficient utilization of semantic interaction information between entities and relations, difficulties in handling challenging samples, and the scarcity of domain-specific datasets. To address these issues, our study introduces three innovative components: relation semantic enhancement, data augmentation, and a voting strategy, all designed to significantly improve the model's performance on domain-specific relational triple extraction tasks. We first propose an innovative attention interaction module. This method significantly enhances the semantic interaction capabilities between entities and relations by integrating semantic information from relation labels. Second, we propose a voting strategy that effectively combines the strengths of large language models (LLMs) and fine-tuned small pre-trained language models (SLMs) to reevaluate challenging samples, thereby improving the model's adaptability in specific domains. Additionally, we explore the use of LLMs for data augmentation, aiming to generate domain-specific datasets to alleviate the scarcity of domain data. Experiments conducted on three domain-specific datasets demonstrate that our model outperforms existing comparative models in several aspects, with F1 scores exceeding the state-of-the-art models by 2%, 1.6%, and 0.6%, respectively, validating the effectiveness and generalizability of our approach.
Keywords: relational triple extraction; semantic interaction; large language models; data augmentation; specific domains
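A minimal sketch of what such an LLM/SLM voting step might look like; `slm_predict`, `llm_predict`, and the confidence threshold are assumed interfaces for illustration, not the paper's implementation:

```python
# Hypothetical voting step: trust confident SLM extractions, and let an
# LLM re-judge the low-confidence ("challenging") ones.
def vote_on_triples(sentence, slm_predict, llm_predict, conf_threshold=0.7):
    triples = []
    for triple, confidence in slm_predict(sentence):
        if confidence >= conf_threshold:
            triples.append(triple)           # keep the fine-tuned SLM's call
        elif llm_predict(sentence, triple):  # LLM confirms the hard sample
            triples.append(triple)
    return triples
```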
3. Security Vulnerability Analyses of Large Language Models (LLMs) through Extension of the Common Vulnerability Scoring System (CVSS) Framework
Authors: Alicia Biju, Vishnupriya Ramesh, Vijay K. Madisetti. 《Journal of Software Engineering and Applications》, 2024, No. 5, pp. 340-358 (19 pages).
Large Language Models (LLMs) have revolutionized Generative Artificial Intelligence (GenAI) tasks, becoming an integral part of various applications in society, including text generation, translation, summarization, and more. However, their widespread usage emphasizes the critical need to enhance their security posture to ensure the integrity and reliability of their outputs and minimize harmful effects. Prompt injections and training data poisoning attacks are two of the most prominent vulnerabilities in LLMs, which could potentially lead to unpredictable and undesirable behaviors, such as biased outputs, misinformation propagation, and even malicious content generation. The Common Vulnerability Scoring System (CVSS) framework provides a standardized approach to capturing the principal characteristics of vulnerabilities, facilitating a deeper understanding of their severity within the security and AI communities. By extending the current CVSS framework, we generate scores for these vulnerabilities so that organizations can prioritize mitigation efforts, allocate resources effectively, and implement targeted security measures to defend against potential risks.
Keywords: Common Vulnerability Scoring System (CVSS); Large Language Models (LLMs); DALL-E; prompt injections; training data poisoning; CVSS metrics
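For reference, the CVSS v3.1 base-score arithmetic (scope unchanged) applied to a hypothetical prompt-injection vector; the numeric weights follow the public CVSS v3.1 specification, while the vector chosen here is an illustrative assumption, not the paper's scoring:

```python
import math

# Assumed vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N.
AV, AC, PR, UI = 0.85, 0.77, 0.85, 0.85   # Network / Low / None / None
C, I, A = 0.56, 0.56, 0.0                  # Conf: High, Integ: High, Avail: None

iss = 1 - (1 - C) * (1 - I) * (1 - A)      # impact sub-score
impact = 6.42 * iss                        # scope-unchanged impact
exploitability = 8.22 * AV * AC * PR * UI
# Simplified "round up to one decimal"; the spec defines a stricter Roundup.
base = 0.0 if impact <= 0 else math.ceil(min(impact + exploitability, 10) * 10) / 10
print(base)  # 9.1, which falls in the "Critical" severity band
```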
4. Observation points classifier ensemble for high-dimensional imbalanced classification (cited by 1)
Authors: Yulin He, Xu Li, Philippe Fournier-Viger, Joshua Zhexue Huang, Mianjie Li, Salman Salloum. 《CAAI Transactions on Intelligence Technology》 (SCIE, EI), 2023, No. 2, pp. 500-517 (18 pages).
In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems based on data processed using the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible. Second, a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space. Third, optimization of the observation point mappings is carried out to obtain a reliable assessment of the unknown samples. Exhaustive experiments have been conducted to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets. Experimental results show that (1) the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased; and (3) statistical analysis reveals that OPCE yields better HDIC performances on the selected data sets in comparison with eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm to deal with HDIC problems.
Keywords: classifier ensemble; feature transformation; high-dimensional data classification; imbalanced learning; observation point mechanism
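A minimal sketch of the preprocessing idea described above: MDS reduction followed by re-describing each sample by its distances to a few observation points. Using per-class centroids as observation points is an assumption; the paper optimises their placement.

```python
import numpy as np
from sklearn.manifold import MDS

def observation_features(X, y, n_components=3, random_state=0):
    # Distance-preserving reduction to a low-dimensional space.
    X_low = MDS(n_components=n_components,
                random_state=random_state).fit_transform(X)
    # Class centroids as stand-in observation points.
    points = [X_low[y == c].mean(axis=0) for c in np.unique(y)]
    # Each sample is re-described by its distances to the observation points.
    return np.column_stack([np.linalg.norm(X_low - p, axis=1) for p in points])
```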
5. A Length-Adaptive Non-Dominated Sorting Genetic Algorithm for Bi-Objective High-Dimensional Feature Selection
Authors: Yanlu Gong, Junhai Zhou, Quanwang Wu, MengChu Zhou, Junhao Wen. 《IEEE/CAA Journal of Automatica Sinica》 (SCIE, EI, CSCD), 2023, No. 9, pp. 1834-1844 (11 pages).
As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, in traditional EC-based methods, feature subsets are represented via a length-fixed individual encoding. This is ineffective for high-dimensional data, because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to explore promising search space adaptively. Moreover, a dominance-based local search method is employed for further improvement. Experimental results based on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
Keywords: bi-objective optimization; feature selection (FS); genetic algorithm; high-dimensional data; length-adaptive
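The bi-objective comparison underlying NSGA-style feature selection can be stated compactly; a minimal sketch, where each feature subset is scored as a (classification error, number of selected features) pair:

```python
def dominates(a, b):
    """a, b are (error_rate, n_selected_features) tuples.
    a dominates b if it is no worse on both objectives and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

assert dominates((0.08, 12), (0.10, 30))      # better on both objectives
assert not dominates((0.08, 40), (0.10, 30))  # a trade-off: neither dominates
```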
6. Similarity measurement method of high-dimensional data based on normalized net lattice subspace (cited by 4)
Authors: Li Wenfa (李文法), Wang Gongming, Li Ke, Huang Su. 《High Technology Letters》 (EI, CAS), 2017, No. 2, pp. 179-184 (6 pages).
The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality in high-dimensional data. The reason is that differences in sparse and noisy dimensions account for a large proportion of the measured similarity, making any two samples appear dissimilar. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding intervals. Only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with dimensionality and is approximately two to three orders of magnitude greater than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0, 1], which is suitable for similarity analysis after dimensionality reduction.
Keywords: high-dimensional data; curse of dimensionality; similarity; normalization; subspace; NPsim
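A sketch of the interval-lattice idea, under the assumptions that each dimension is already normalized to [0, 1] and split into `k` equal intervals, and that only components falling in the same or adjacent interval contribute; the per-dimension similarity used here is a stand-in, not the paper's NPsim formula:

```python
import numpy as np

def lattice_similarity(x, y, k=10):
    # Map each component onto its interval index in the lattice.
    bx, by = np.floor(x * k).astype(int), np.floor(y * k).astype(int)
    mask = np.abs(bx - by) <= 1           # same or adjacent interval only
    if not mask.any():
        return 0.0
    # Average per-dimension closeness over the contributing dimensions.
    return float(np.mean(1.0 - np.abs(x[mask] - y[mask])))
```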
7. Constructing Large Scale Cohort for Clinical Study on Heart Failure with Electronic Health Record in Regional Healthcare Platform: Challenges and Strategies in Data Reuse (cited by 2)
Authors: Daowen Liu, Liqi Lei, Tong Ruan, Ping He. 《Chinese Medical Sciences Journal》 (CAS, CSCD), 2019, No. 2, pp. 90-102 (13 pages).
Regional healthcare platforms collect clinical data from hospitals in specific areas for the purpose of healthcare management. It is a common requirement to reuse the data for clinical research. However, we have to face challenges like the inconsistency of terminology in electronic health records (EHR) and the complexities in data quality and data formats on a regional healthcare platform. In this paper, we propose a methodology and process for constructing large-scale cohorts, which form the basis of causality and comparative effectiveness relationships in epidemiology. We first constructed a Chinese terminology knowledge graph to deal with the diversity of vocabularies on the regional platform. Second, we built special disease case repositories (i.e., a heart failure repository) that utilize the graph to search for related patients and to normalize the data. Based on the requirements of the clinical research, which aimed to explore the effectiveness of taking statins on 180-day readmission in patients with heart failure, we built a large-scale retrospective cohort with 29,647 cases of heart failure patients from the heart failure repository. After propensity score matching, a study group (n=6,346) and a control group (n=6,346) with parallel clinical characteristics were acquired. Logistic regression analysis showed that taking statins had a negative correlation with 180-day readmission in heart failure patients. This paper presents the workflow and an application example of big data mining based on regional EHR data.
Keywords: electronic health records; clinical terminology knowledge graph; clinical special disease case repository; evaluation of data quality; large-scale cohort study
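A compressed sketch of the cohort-building step described above (propensity scores, then 1:1 nearest-neighbor matching); the column names, caliper value, and matching-with-replacement shortcut are assumptions, not the paper's pipeline:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def match_cohort(df, covariates, treatment="statin", caliper=0.05):
    """df: pandas DataFrame with binary `treatment` and covariate columns."""
    # Propensity score: probability of treatment given the covariates.
    ps = LogisticRegression(max_iter=1000).fit(
        df[covariates], df[treatment]).predict_proba(df[covariates])[:, 1]
    df = df.assign(ps=ps)
    treated, control = df[df[treatment] == 1], df[df[treatment] == 0]
    # 1:1 nearest-neighbor matching on the propensity score, with a caliper.
    nn = NearestNeighbors(n_neighbors=1).fit(control[["ps"]])
    dist, idx = nn.kneighbors(treated[["ps"]])
    keep = dist.ravel() <= caliper
    return treated[keep], control.iloc[idx.ravel()[keep]]
```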
8. Local and global approaches of affinity propagation clustering for large scale data (cited by 15)
Authors: Ding-yin Xia, Fei Wu, Xu-qing Zhan, Yue-ting Zhuang. 《Journal of Zhejiang University-Science A (Applied Physics & Engineering)》 (SCIE, EI, CAS, CSCD), 2008, No. 10, pp. 1373-1381 (9 pages).
Recently a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clusters sparsely related data by passing messages between data points. However, we often want to cluster large-scale data whose similarities are not sparse. This paper presents two variants of AP for grouping large-scale data with a dense similarity matrix. The local approach is partition affinity propagation (PAP) and the global method is landmark affinity propagation (LAP). PAP passes messages within subsets of the data first and then merges them after the initial iterations; it can effectively reduce the number of clustering iterations. LAP passes messages between landmark data points first and then clusters the non-landmark data points; it is a global approximation method to speed up clustering. Experiments are conducted on many datasets, such as random data points, manifold subspaces, images of faces, and Chinese calligraphy, and the results demonstrate that the two approaches are feasible and practicable.
Keywords: clustering; large-scale data; propagation method; computer technology
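A rough sketch of the landmark idea (LAP) under stated assumptions: run standard affinity propagation on a random landmark subset, then assign the remaining points to the nearest exemplar. The sampling and assignment rules here are stand-ins, not the authors' exact scheme.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def landmark_ap(X, n_landmarks=500, random_state=0):
    rng = np.random.default_rng(random_state)
    idx = rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)
    # Message passing only among the landmarks.
    ap = AffinityPropagation(random_state=random_state).fit(X[idx])
    exemplars = X[idx][ap.cluster_centers_indices_]
    # Assign every point (landmark or not) to its nearest exemplar.
    d = np.linalg.norm(X[:, None, :] - exemplars[None, :, :], axis=2)
    return d.argmin(axis=1)
```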
9. Scaling up Kernel Grower Clustering Method for Large Data Sets via Core-sets (cited by 2)
Authors: CHANG Liang, DENG Xiao-Ming, ZHENG Sui-Wu, WANG Yong-Qing. 《自动化学报》 (Acta Automatica Sinica) (EI, CSCD, Peking University Core), 2008, No. 3, pp. 376-382 (7 pages).
Kernel grower is a novel kernel clustering method recently proposed by Camastra and Verri. It shows good performance on a variety of data sets and compares favorably with popular clustering algorithms. However, the main drawback of the method is its weak scalability in handling large data sets, which greatly limits its applications. In this paper, we propose a scaled-up kernel grower method using core-sets, which is significantly faster than the original method for clustering large data and can handle very large data sets. Numerical experiments on benchmark data sets as well as synthetic data sets show the efficiency of the proposed method. The method is also applied to real image segmentation to illustrate its performance.
Keywords: large data sets; image segmentation; pattern recognition; core-sets; kernel clustering
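As background, a much-simplified sketch of the core-set device that enables such scaling: grow a small subset by repeatedly adding the point farthest from the current center, Badoiu-Clarkson style. This runs in input space for brevity and is not the authors' kernelized procedure.

```python
import numpy as np

def core_set_meb(X, eps=0.1, max_iter=100):
    core = [0]                               # start from an arbitrary point
    for _ in range(max_iter):
        center = X[core].mean(axis=0)        # crude center of the core-set
        d = np.linalg.norm(X - center, axis=1)
        far = int(d.argmax())                # farthest point from the center
        radius = np.linalg.norm(X[core] - center, axis=1).max()
        if d[far] <= (1 + eps) * max(radius, 1e-12):
            break                            # (1+eps)-approximation reached
        core.append(far)                     # grow the core-set
    return core
```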
10. Large-time Behavior of Solutions for Parabolic Conservation Laws with Large Initial Data
Authors: WANG Li-juan. 《Chinese Quarterly Journal of Mathematics》 (CSCD), 2012, No. 2, pp. 232-237 (6 pages).
In this paper, we study the large-time behavior of periodic solutions for parabolic conservation laws. There is no smallness assumption on the initial data. We first obtain the local existence of the solution by an iterative scheme, then we derive exponential decay estimates for the solution by the energy method and the maximum principle, and obtain the global solution at the same time.
Keywords: parabolic conservation law; periodic solution; large initial data; exponential decay
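For context, a generic scalar model of the type this abstract refers to; the exact equation and the form of the decay estimate are assumptions, since the abstract does not state them:

```latex
\begin{aligned}
&u_t + f(u)_x = \mu\, u_{xx}, \qquad u(x,0) = u_0(x), \qquad u_0(x+L) = u_0(x),\\
&\text{with an exponential decay estimate of the assumed form}\quad
\lVert u(\cdot,t) - \bar{u}_0 \rVert_{L^2} \le C e^{-\alpha t},
\end{aligned}
```

where \(\bar{u}_0\) denotes the mean of the periodic initial data over one period.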
11. Dimensionality Reduction of High-Dimensional Highly Correlated Multivariate Grapevine Dataset
Authors: Uday Kant Jha, Peter Bajorski, Ernest Fokoue, Justine Vanden Heuvel, Jan van Aardt, Grant Anderson. 《Open Journal of Statistics》, 2017, No. 4, pp. 702-717 (16 pages).
Viticulturists traditionally have a keen interest in studying the relationship between the biochemistry of grapevines' leaves/petioles and their associated spectral reflectance in order to understand the fruit ripening rate, water status, nutrient levels, and disease risk. In this paper, we use imaging spectroscopy (hyperspectral) reflectance data for the reflective 330 - 2510 nm wavelength region (986 total spectral bands) to assess vineyard nutrient status; this constitutes a high-dimensional dataset with a covariance matrix that is ill-conditioned. The identification of the variables (wavelength bands) that contribute useful information for nutrient assessment and prediction plays a pivotal role in multivariate statistical modeling. In recent years, researchers have successfully developed many continuous, nearly unbiased, sparse, and accurate variable selection methods to overcome this problem. This paper compares four regularized and one functional regression method: Elastic Net, Multi-Step Adaptive Elastic Net, Minimax Concave Penalty, iterative Sure Independence Screening, and Functional Data Analysis for wavelength variable selection. Thereafter, the predictive performance of these regularized sparse models is enhanced using stepwise regression. This comparative study of regression methods using a high-dimensional and highly correlated grapevine hyperspectral dataset revealed that Elastic Net variable selection yields the best predictive ability.
Keywords: high-dimensional data; Multi-Step Adaptive Elastic Net; Minimax Concave Penalty; Sure Independence Screening; Functional Data Analysis
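A minimal sketch of elastic-net wavelength selection of the kind compared above, assuming rows of `X` are spectra (e.g., 986 bands) and `y` is a measured nutrient level:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

def select_bands(X, y):
    # Cross-validated elastic net over a few L1/L2 mixing ratios.
    model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)
    selected = np.flatnonzero(model.coef_)   # wavelength bands that survive
    return model, selected
```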
12. Sample-data Decentralized Reliable H∞ Hyperbolic Control for Uncertain Fuzzy Large-scale Systems with Time-varying Delay (cited by 2)
Authors: LIU Xin-Rui, ZHANG Hua-Guang. 《自动化学报》 (Acta Automatica Sinica) (EI, CSCD, Peking University Core), 2009, No. 12, pp. 1534-1540 (7 pages).
This paper studies the problem of sampled-data reliable H∞ hyperbolic control for uncertain continuous-time fuzzy large-scale systems with time-varying delay. First, the fuzzy hyperbolic model (FHM) is used to model certain complex large-scale systems. Then, based on the Lyapunov direct method and the decentralized control theory of large-scale systems, linear matrix inequality (LMI)-based conditions are derived to guarantee H∞ performance not only when all control components are operating well, but also in the face of some possible actuator failures. Moreover, the exact failure parameters of the actuators are not required; only the lower and upper bounds of the failure parameters are needed. The conditions depend on the upper bound of the time delay and do not depend on the derivative of the time-varying delay, so the obtained results are less conservative. Finally, two examples are provided to illustrate the design procedure and its effectiveness.
Keywords: fuzzy hyperbolic model; linear matrix inequality; decentralized control theory; actuator
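As a toy illustration of LMI-based analysis (far simpler than the paper's delay-dependent reliable H∞ conditions), one can check quadratic stability of an assumed system matrix with CVXPY:

```python
import cvxpy as cp
import numpy as np

# Find P > 0 with A^T P + P A < 0 for an assumed stable test matrix A.
A = np.array([[-2.0, 1.0], [0.0, -1.5]])
P = cp.Variable((2, 2), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(2),
               A.T @ P + P @ A << -eps * np.eye(2)]
cp.Problem(cp.Minimize(0), constraints).solve()
print(P.value)   # a feasible Lyapunov matrix certifying stability
```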
13. Making Short-term High-dimensional Data Predictable
Authors: CHEN Luonan. 《Bulletin of the Chinese Academy of Sciences》, 2018, No. 4, pp. 243-244 (2 pages).
Making accurate forecasts or predictions is a challenging task in the big data era, in particular for datasets involving high-dimensional variables but short-term time series points, which are generally available from real-world systems. To address this issue, Prof. …
Keywords: RDE; making short-term high-dimensional data predictable
14. Interactive Generalization on Large-Scale Topographical Map Supported by a Database Platform
Authors: CAI Zhongliang, WU Hehai, DU Qingyun, LIAO Chujiang. 《Geo-Spatial Information Science》, 2003, No. 4, pp. 17-26 (10 pages).
This paper makes a study of interactive digital generalization, in which map generalization is divided into an intellective reasoning procedure and an operational procedure, done by human and computer, respectively. An interactive map generalization environment for large-scale topographic maps is then designed and realized. This research focuses on: ① the significance of researching an interactive map generalization environment, ② the features of large-scale topographic maps and interactive map generalization, and ③ the construction of a map generalization-oriented database platform.
Keywords: topographic map; database; digital; interactive; artificial intelligence
15. Efficient Hierarchical Structure of Wavelet-Based Compression for Large Volume Data Sets
Authors: Ke Yongzhen (柯永振), Zhang Jiawan (张加万), Sun Jizhou (孙济洲), Li Jiaming (李佳明). 《Transactions of Tianjin University》 (EI, CAS), 2006, No. 5, pp. 378-382 (5 pages).
With volume sizes increasing, it is necessary to develop a highly efficient compression algorithm that is suitable for progressive refinement between the data server and the browsing client. For three-dimensional large volume data, an efficient hierarchical algorithm based on wavelet compression is presented, using intra-band dependencies of wavelet coefficients. First, after applying blockwise hierarchical wavelet decomposition to the large volume data, a block significance map is obtained, using one bit to indicate the significance or insignificance of each block. Second, a coefficient block is further subdivided into eight sub-blocks if any significant coefficient exists in it, and the process is repeated, resulting in an incomplete octree. One bit indicates significance or insignificance, and only significant coefficients are stored in the data stream. Finally, the significant coefficients are quantized and compressed by arithmetic coding. The experimental results show that the proposed algorithm achieves good compression ratios and is suited for random access of data blocks. The results also show that the proposed algorithm can be applied to progressive transmission of 3D volume data.
Keywords: wavelet; compression; massive data; random access; octree
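A condensed sketch of the block significance map described above, assuming a cubic volume whose side is divisible by the block size and using PyWavelets for the 3D transform; the wavelet and threshold choices are illustrative:

```python
import numpy as np
import pywt

def block_significance(volume, block=16, threshold=1.0):
    """One bit per block: 1 if any detail coefficient is significant."""
    n = volume.shape[0] // block              # assume a cubic volume
    bits = np.zeros((n, n, n), dtype=np.uint8)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                cube = volume[i*block:(i+1)*block,
                              j*block:(j+1)*block,
                              k*block:(k+1)*block]
                coeffs = pywt.dwtn(cube, "haar")   # one-level 3D DWT
                details = [c for key, c in coeffs.items() if key != "aaa"]
                if any(np.abs(c).max() >= threshold for c in details):
                    bits[i, j, k] = 1              # block carries detail
    return bits
```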
16. Large Data Based Research on the Editing of Film Trailers
Authors: Yanqiu Tong, Yang Song. 《Journal of Sociology Study》, 2015, No. 1, pp. 23-28 (6 pages).
Keywords: editing techniques; trailers; film; timecode; large data; editing software; spoilers
17. Research on the Mode of Operation of Wisdom Logistics in the Large Data Background
Authors: GAO Lian-zhou. 《International Journal of Technology Management》, 2013, No. 12, pp. 6-8 (3 pages).
Keywords: logistics activities; modern logistics enterprises; mode of operation; rapid response capability; modern logistics technology; integrated logistics services; service enterprises; operation
18. Mining Frequent Closed Itemsets in Large High Dimensional Data
Authors: Yu Guangzhu (余光柱), Zeng Xianhui (曾宪辉), Shao Shihuang (邵世煌). 《Journal of Donghua University (English Edition)》 (EI, CAS), 2008, No. 4, pp. 416-424 (9 pages).
Large high-dimensional data have posed great challenges to existing algorithms for frequent itemset mining. To solve the problem, a hybrid method, consisting of a novel row enumeration algorithm and a column enumeration algorithm, is proposed. The intention of the hybrid method is to decompose the mining task into two subtasks and then choose appropriate algorithms to solve them respectively. The novel algorithm, i.e., Inter-transaction, is based on the characteristic that there are few common items between or among long transactions. In addition, an optimization technique is adopted to improve the performance of the intersection of bit-vectors. Experiments on synthetic data show that our method achieves high performance on large high-dimensional data.
Keywords: frequent closed itemsets; large high-dimensional data; hybrid method; computer program
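A small illustration of the bit-vector idea mentioned in the abstract, using Python integers as transaction bitmaps so that support counting reduces to a bitwise AND plus a popcount:

```python
def bitmap(transactions, item):
    """Bit i of the result is set iff transaction i contains `item`."""
    bits = 0
    for row, itemset in enumerate(transactions):
        if item in itemset:
            bits |= 1 << row
    return bits

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
both = bitmap(transactions, "a") & bitmap(transactions, "b")  # intersection
print(bin(both).count("1"))  # support of {a, b} = 2 transactions
```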
19. Study on Tourism Electronic Platform based on Large Data Background
Authors: Hengyuan Xie. 《International Journal of Technology Management》, 2014, No. 7, pp. 41-43 (3 pages).
Keywords: electronic platform; tourism; information release system; J2EE platform; application components; frequency conversion technology; XML documents; data storage
20. Understanding the Dynamics Location of Very Large Populations Interacted with Service Points
Authors: Rola Younis Masoud Mohammed, Mohammad Asif Salam. 《Open Journal of Modelling and Simulation》, 2023, No. 3, pp. 60-87 (28 pages).
This paper offers preliminary work on system dynamics and data mining tools. It tries to understand the dynamics of carrying out large-scale events, such as Hajj. The study looks at a large, recurring problem as a variable to consider, such as how the flow of people changes over time as well as how location interacts with placement. The predicted data are analyzed using Vensim PLE 32 modeling software, GIS Arc Map 10.2.1, and AnyLogic 7.3.1 software regarding the potential placement of temporal service points, taking into consideration three dynamic constraints and behavioral aspects: a large population, limitation in time, and space. This research proposes appropriate data analyses to ensure the optimal positioning of service points with limited time and space for large-scale events. The conceptual framework is the output of this study, and knowledge may be added to its insights based on the technique.
Keywords: Geographic Information Systems (GIS); large-scale events; Hajj pilgrimage; data mining tools; system dynamics; agent-based modeling; discrete-time events