Distance function selection in several clustering algorithms
1
Author: LU Yu. Journal of Chongqing University (CAS), 2004, Issue 1, pp. 47-50 (4 pages)
Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results were given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to reveal the shape of clusters. In practice, it is suggested to apply different distance functions separately, compare the clustering results, and pick out the "swing points"; such points may reveal more information to data analysts.
Keywords: distance function; clustering algorithms; k-means; dendrogram; data mining
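The abstract's idea of comparing assignments under different distance functions can be illustrated with a small sketch. The abstract does not name the three distance functions it studied, so the Euclidean, Manhattan, and Chebyshev distances used here are an assumption, and the function names, centers, and points are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference (L-infinity)."""
    return max(abs(x - y) for x, y in zip(a, b))

def nearest_center(point, centers, dist):
    """Index of the center closest to `point` under `dist`."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))

def swing_points(points, centers, metrics):
    """Points whose nearest center changes with the distance function."""
    swings = []
    for p in points:
        labels = {nearest_center(p, centers, d) for d in metrics}
        if len(labels) > 1:
            swings.append(p)
    return swings

# Hypothetical data: two fixed centers and three points to assign.
centers = [(0.0, 0.0), (5.0, 2.0)]
points = [(0.0, 1.0), (2.0, 2.0), (5.0, 3.0)]
metrics = [euclidean, manhattan, chebyshev]
print(swing_points(points, centers, metrics))  # [(2.0, 2.0)]
```

The point (2, 2) is closer to the first center under the Euclidean and Chebyshev distances but closer to the second under the Manhattan distance, which is the kind of disagreement the abstract suggests inspecting.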
Outlier detection based on multi-dimensional clustering and local density
2
Authors: SHOU Zhao-yu, LI Meng-ya, LI Si-min. Journal of Central South University (SCIE, EI, CAS, CSCD), 2017, Issue 6, pp. 1299-1306 (8 pages)
Outlier detection is an important task in data mining. In some sophisticated multidimensional datasets it is difficult to find the clustering centers and to measure the degree of deviation of each potential outlier. In this work, an effective outlier detection method based on multi-dimensional clustering and local density (ODBMCLD) is proposed. ODBMCLD first identifies the center objects by the local density peaks of the data objects and clusters the whole dataset around those center objects. Then, outlier objects belonging to different clusters are marked as candidates for abnormal data. Finally, the top N points among these candidates, those with the highest outlier factors, are chosen as the final anomalous objects. The feasibility and effectiveness of the method are verified by experiments.
Keywords: data mining; outlier detection; outlier detection method based on multi-dimensional clustering and local density (ODBMCLD) algorithm; deviation degree
Combined data mining techniques based patient data outlier detection for healthcare safety (cited by: 1)
3
Authors: Gebeyehu Belay Gebremeskel, Chai Yi, Zhongshi He, Dawit Haile. International Journal of Intelligent Computing and Cybernetics (EI), 2016, Issue 1, pp. 42-68 (27 pages)
Purpose: Among the growing number of data mining (DM) techniques, outlier detection has gained importance in many applications and attracted much attention in recent times. In the past, outlier detection research in safety care could be viewed as searching for needles in a haystack. However, outliers are not always erroneous. Therefore, the purpose of this paper is to investigate the role of outliers in healthcare services in general and in patient safety care in particular. Design/methodology/approach: The paper uses a combined DM technique (clustering and nearest neighbor) for outlier detection, which provides a clear understanding and meaningful insights for visualizing data behavior for healthcare safety. The outcomes, or the implicit knowledge, are vitally essential to a proper clinical decision-making process. The method matters for the semantics, and the novel treatment of patients' events and situations is shown to play a significant role in patient care safety and medication. Findings: The paper discusses a novel, integrated methodology that can be applied to the analysis of different biological data. It is presented as a set of integrated DM techniques whose performance is optimized for health and medical science. This integrated outlier detection method can be extended to search for valuable information and implicit knowledge based on selected patient factors. On this basis, outliers are detected as clusters and point events, and novel ideas are proposed to strengthen clinical services with regard to customer satisfaction. The work also serves as a baseline for further healthcare strategy development and research. Research limitations/implications: This paper focuses mainly on outlier detection. Outlier isolation, which is essential for investigating why an outlier occurred and communicating how to mitigate it, is not addressed. The research can therefore be extended to the hierarchy of patient problems. Originality/value: DM is a dynamic and successful gateway to discovering useful knowledge for enhancing healthcare performance and patient safety. Outlier detection based on clinical data is a basic task in achieving healthcare strategy. In this paper, the authors therefore focus on combined DM techniques for deep analysis of clinical data, which support an optimal level of clinical decision-making. Proper clinical decisions can be reached through attribute selection, which is important for identifying the influential factors or parameters of healthcare services. Using integrated clustering and nearest-neighbor techniques therefore gives a more acceptable search for outliers in such complex data, which could be fundamental to further situational analysis of healthcare and patient safety.
Keywords: data mining; clustering; healthcare mining algorithm; nearest neighbor; outlier detection
Study on the Grouping of Patients with Chronic Infectious Diseases Based on Data Mining
4
Author: Min Li. Journal of Biosciences and Medicines, 2019, Issue 11, pp. 119-135 (17 pages)
Objective: Following the RFM model of customer relationship management, data mining was used to group patients with chronic infectious diseases in order to explore the effect of customer segmentation on the management of patients with different characteristics. Methods: 170,246 outpatient records from January 2016 to July 2016 were extracted from the hospital management information system (HIS); 43,448 records remained after data cleaning. The K-means clustering algorithm was used to classify patients with chronic infectious diseases, and the C5.0 decision tree algorithm was then used to predict their treatment situation. Results: Male patients accounted for 58.7% and patients living in Shanghai for 85.6%. The average age was 45.88 years, and incidence was highest between 25 and 65 years. Patients were grouped into three clusters: 1) Cluster 1, important patients (4786 people, 11.72%, R = 2.89, F = 11.72, M = 84,302.95); 2) Cluster 2, major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39); 3) Cluster 3, potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09). When the C5.0 decision tree algorithm was used to predict the treatment situation of patients with chronic infectious diseases, the final treatment time (weeks) was an important predictor, and the accuracy, verified by the confusion model, was 99.94%. Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases, establish a chronic infectious disease and customer relationship management database, and take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up the construction of hospital information systems, establish a chronic infectious disease database, and strengthen the blocking of mother-to-child transmission in order to curb chronic infectious diseases effectively and reduce disease burden and mortality.
Keywords: data mining; K-means clustering algorithm; C5.0 decision tree algorithm; customer relationship management; patients with chronic infectious disease
Architecture of Integrated Data Clustering Machine
5
Author: ARIF Iqbal. Computer Aided Drafting, Design and Manufacturing, 2009, Issue 2, pp. 43-48 (6 pages)
Data clustering is a significant information retrieval technique in today's data-intensive society. Over the last few decades a vast variety of data clustering algorithms have been designed and implemented for almost all data types. The quality of the results of a cluster analysis depends mainly on the clustering algorithm used. The architecture of a versatile, less user-dependent, dynamic, and scalable data clustering machine is presented. The machine selects the best available data clustering algorithm for the analysis on the basis of the credentials of the data and previously used domain knowledge. The domain knowledge is updated on completion of each data analysis session.
Keywords: data mining; data clustering; data clustering algorithms; architecture; framework
Scaling up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique (cited by: 9)
6
Authors: Guan Ji hong (1), Zhou Shui geng (2), Bian Fu ling (3), He Yan xiang (1). Affiliations: 1. School of Computer, Wuhan University, Wuhan 430072, China; 2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China; 3. College of Remote Sensin. Wuhan University Journal of Natural Sciences (CAS), 2001, Issue Z1, pp. 467-473 (7 pages)
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces sampling inside DBSCAN, and the other applies a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
Keywords: spatial databases; data mining; clustering; sampling; DBSCAN algorithm
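To make the "sampling outside DBSCAN" idea concrete, here is a minimal sketch. It is not the SDBSCAN procedure from the paper, whose details the abstract does not give. It assumes one plausible scheme: run a plain DBSCAN on a random sample, then give each unsampled point the cluster of its nearest sampled core point if that core point lies within `eps`. All function names and parameters are hypothetical.

```python
import math
import random

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns (labels, set of core point indices)."""
    labels = [None] * len(points)
    core = set()
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        core.add(i)
        stack = list(neighbors)
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                core.add(j)
                stack.extend(j_neighbors)
    return labels, core

def sampled_dbscan(points, eps, min_pts, sample_size, rng):
    """Cluster a random sample, then label the remaining points (assumed scheme)."""
    idx = rng.sample(range(len(points)), sample_size)
    sample = [points[i] for i in idx]
    s_labels, core = dbscan(sample, eps, min_pts)
    labels = [NOISE] * len(points)
    for k, i in enumerate(idx):
        labels[i] = s_labels[k]
    core_pts = [(sample[k], s_labels[k]) for k in core]
    sampled = set(idx)
    for i, p in enumerate(points):
        if i in sampled:
            continue
        best = min(core_pts, key=lambda c: math.dist(p, c[0]), default=None)
        if best is not None and math.dist(p, best[0]) <= eps:
            labels[i] = best[1]
    return labels

# Hypothetical data: two well-separated 10 x 10 grids of points.
blob_a = [(0.1 * x, 0.1 * y) for x in range(10) for y in range(10)]
blob_b = [(10 + 0.1 * x, 10 + 0.1 * y) for x in range(10) for y in range(10)]
data = blob_a + blob_b
result = sampled_dbscan(data, eps=0.5, min_pts=3, sample_size=100, rng=random.Random(0))
print(sorted(set(result)))
```

Note that `min_pts` is applied to the sample, not to the full data, so in practice it would need to be scaled down with the sampling rate; the sketch leaves that choice to the caller.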
Linear manifold clustering for high dimensional data based on line manifold searching and fusing (cited by: 1)
7
Authors: 黎刚果, 王正志, 王晓敏, 倪青山, 强波. Journal of Central South University (SCIE, EI, CAS), 2010, Issue 5, pp. 1058-1069 (12 pages)
High dimensional data clustering, with the inherent sparsity of data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets, and then fuse some of the line manifold clusters to construct higher dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifold and to optimize line manifolds during the line manifold cluster searching procedure. The results obtained from experiments over real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method obtains high clustering accuracy for data sets with different sizes, manifold dimensions, and noise ratios, which confirms its anti-noise capability and high clustering accuracy for high dimensional data.
Keywords: linear manifold; subspace clustering; line manifold; data mining; data fusing; clustering algorithm
A new clustering algorithm for large datasets (cited by: 1)
8
Authors: 李清峰, 彭文峰. Journal of Central South University (SCIE, EI, CAS), 2011, Issue 3, pp. 823-829 (7 pages)
The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from other vertices. This algorithm makes use of the connection between clustering aggregation and the problem of correlation clustering. The best deterministic approximation algorithm was provided for the variation of the correlation clustering problem, and it was shown how sampling can be used to scale the algorithms for large datasets. An extensive empirical evaluation was given for the usefulness of the problem and the solutions. The results show that this method achieves more than 50% reduction in running time without sacrificing the quality of the clustering.
Keywords: data mining; Circle algorithm; clustering categorical data; clustering aggregation
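An automatic data mining method for medical three-dimensional force sensors under multi-level redundancy and strong interference
9
Authors: 岳根霞, 王剑, 刘金花. 《传感技术学报》 (CAS, CSCD, PKU Core), 2024, Issue 8, pp. 1383-1388 (6 pages)
Medical three-dimensional force sensors are easily affected by external factors such as electromagnetic fields, which produce large amounts of data with similar features, cause disordered output signals, and reduce the sensor's control accuracy and measurement speed. To address this, a data mining method for three-dimensional force sensors under multi-level redundancy and strong interference is proposed. Redundant sensor data are collected according to angle calibration theory. A similarity exponential function is introduced to compute a redundancy factor and obtain the activity of the redundant data, which completes the classification of redundant data. A difference-based denoising algorithm filters the redundant data, and a spectral clustering algorithm is used to construct the Laplacian matrix and remove redundant data, achieving automatic mining of three-dimensional force sensor data. Simulation results show that, under multi-level redundancy and strong interference, the proposed method attains a control accuracy of 96.54%, a measurement speed of 0.61 ms, and an energy consumption of 0.26 kcal. This indicates that the method offers high control accuracy, fast measurement, and good transmission performance, and can meet the force-feedback control requirements of robot-assisted surgery.
Keywords: three-dimensional force sensor; redundant data; data mining; angle calibration; exponential function; difference denoising; spectral clustering algorithm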
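An exploratory study of the characteristics of latent subgroups in chronic heart failure with qi deficiency and blood stasis syndrome
10
Authors: 杨帅, 凌艺月, 贾志山, 李小茜, 何建成, 姚磊, 曹雪滨. 《上海中医药杂志》 (CSCD), 2024, Issue 11, pp. 21-27 (7 pages)
Objective: To explore latent subgroups among patients with chronic heart failure (CHF) with qi deficiency and blood stasis syndrome, providing an objective basis for precise traditional Chinese medicine (TCM) syndrome differentiation and treatment of heart failure. Methods: Nineteen symptom/sign indicators and 21 biochemical indicators were collected from 126 patients with CHF with qi deficiency and blood stasis syndrome. K-medoids clustering was performed using R and Python, and the optimal number of clusters was determined by the silhouette coefficient. For between-group comparisons, the Kruskal-Wallis test was used for continuous variables, and Pearson's chi-squared test or Fisher's exact test for categorical variables. Results: The patients were clustered into three groups. Between-group comparison showed statistically significant differences (P<0.05) in five symptoms/signs (dyspnea, insomnia, jugular vein distension, poor appetite, and aversion to cold) and four clinical indicators (NYHA class, N-terminal pro-brain natriuretic peptide (NT-proBNP), hematocrit, and blood urea nitrogen). Group 1 had relatively low frequencies of all symptoms/signs and relatively mild clinical indicators overall. Group 2 was characterized by a higher frequency of jugular vein distension than the other groups, with thyroid-stimulating hormone, total cholesterol, and low-density lipoprotein tending to be higher and platelets tending to be lower. Group 3 was characterized by markedly higher rates of aversion to cold, poor appetite, and insomnia than the other groups, accompanied by elevated blood urea nitrogen and reduced hematocrit. Conclusion: CHF with qi deficiency and blood stasis syndrome has three latent subtypes: the basic qi deficiency and blood stasis type, qi deficiency and blood stasis with phlegm turbidity, and qi deficiency and blood stasis with yang deficiency.
Keywords: chronic heart failure; qi deficiency and blood stasis syndrome; phlegm turbidity; yang deficiency; clustering algorithm; data mining; syndrome differentiation and treatment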
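The step of choosing the number of clusters by the silhouette coefficient can be sketched as follows. This is only an illustration of the standard silhouette definition, not the authors' R/Python code. The K-medoids step itself is omitted, and the candidate labelings, data, and function name are hypothetical. The function assumes at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Hypothetical one-dimensional data with two obvious groups, and two
# candidate labelings standing in for clustering results with k=2 and k=3.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k = max(candidates, key=lambda k: silhouette_score(points, candidates[k]))
print(best_k)  # 2
```

In a real analysis each candidate labeling would come from running the clustering algorithm with that value of k; the selection rule stays the same.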
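Development and validation of the APM algorithm model for clustering patterns of urban leisure industries
11
Authors: 刘逸, 吴雪涵, 许汀汀. 《旅游学刊》 (CSSCI, PKU Core), 2024, Issue 4, pp. 40-52 (13 pages)
High-quality development of urban leisure-related industries is of practical importance for upgrading urban consumption and improving the quality of the living environment in China. However, existing research has not precisely captured the basic spatial distribution patterns and structure of urban leisure industries, which are massive in number and widely distributed, and existing spatial clustering algorithms are mostly suited to urban land-use analysis rather than to discretely distributed urban leisure industries. To this end, based on spatial point-of-interest (POI) data, the article develops algorithms such as a distance accessibility value and spatial cluster center points, and constructs a spatial algorithm model of clustering patterns for urban leisure and tourism industries (APM). In a case study of Guangzhou, the APM model captured 3170 urban leisure industry clusters, each bounded by a 500 m walking life circle, which validated the model's soundness and practical value. Overall, the APM algorithm captures the spatial structure of urban leisure business clusters well and clearly identifies the basic structure of cold and hot spots in urban leisure industry space. The cluster boundaries it produces coincide closely with actual road and building alignments, water boundaries, and district extents, and the clusters match real conditions, indicating credibility and validity. The study is a methodological innovation in research on leisure industry agglomeration mechanisms, advancing algorithm precision, practical application, and visualization efficiency. Compared with the Fishnet method, it identifies the boundaries of multiple leisure consumption districts within a city more precisely and captures urban leisure industry clusters efficiently. Compared with the co-location model, it can present a multi-category structure of urban leisure businesses, overcoming the limitation that existing research can capture only two-category business groupings.
Keywords: urban tourism and leisure; industrial agglomeration pattern; spatial data mining; clustering algorithm; POI; Guangzhou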
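Design of a local outlier data mining algorithm based on association rules
12
Author: 王玲风. 《佳木斯大学学报(自然科学版)》 (CAS), 2024, Issue 6, pp. 18-21 (4 pages)
Existing mining algorithms yield results with low correlation and mine inefficiently when applied to local discrete data. To address this, association rules are introduced in the design of a local outlier data mining algorithm. The local discrete data to be mined are first preprocessed, including data cleaning and data integration. For high-dimensional data within the local discrete data, a method based on attribute correlation analysis is proposed to perform clustering. The outlier factor and chain distance used in the mining algorithm are then determined. Finally, association rules are incorporated to mine the local discrete data in parallel. Comparative experiments show that the new algorithm produces results with higher correlation and mines more efficiently, giving it high application value.
Keywords: association rules; outlier; algorithm; mining; data; local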
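A human motion data mining algorithm based on wearable nano-biosensors
13
Authors: 马宪敏, 崔元全, 李放. 《智能计算机与应用》, 2024, Issue 8, pp. 220-224 (5 pages)
Current human motion data mining algorithms cannot collect and analyze real-time data, which leads to low mining accuracy and long mining times. To address this, a human motion data mining algorithm based on wearable nano-biosensors is proposed. First, wearable nano-biosensors collect human motion data; the collected data are converted into binary form and then cleaned and padded. Finally, the firefly algorithm is used to optimize the K-means clustering method, and the optimized K-means method clusters the cleaned and padded data. Experimental results show that the proposed algorithm achieves an average recall of 97.12% and an average mining accuracy of 98.42%, providing an important data basis for real-time monitoring and analysis of athletes' physiological indicators.
Keywords: nano-biosensor; human motion; data mining; data cleaning; K-means clustering algorithm; data collection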
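A data mining analysis of the patterns of syndrome differentiation and treatment for acute icteric hepatitis in the 《中国百年百名中医临床家丛书》
14
Authors: 陈敏, 谢军. 《中医临床研究》, 2024, Issue 5, pp. 63-68 (6 pages)
Objective: To use data mining to analyze the medication patterns of one hundred renowned Chinese medicine practitioners of the past century in treating acute icteric hepatitis. Methods: Case records of acute icteric hepatitis in the 《中国百年百名中医临床家丛书》 (1st edition) were collected, and prescriptions meeting the inclusion criteria were selected. The included prescriptions were uploaded to the Traditional Chinese Medicine Inheritance Support Platform (中医传承辅助平台) V2.5 to build a prescription database, and were analyzed with data mining techniques including frequency analysis, cluster analysis, and association rules. Results: Analysis of the 200 selected first-visit and follow-up prescriptions showed that the herbs these practitioners commonly used for acute icteric hepatitis include Yinchen (茵陈), Zhizi (栀子), Gancao (甘草), Fuling (茯苓), and Yujin (郁金). High-frequency herb pairs include Zhizi-Yinchen, Yujin-Yinchen, Dahuang-Yinchen, Fuling-Yinchen, and Zexie-Yinchen. New prescriptions include Zhuru-Fulingpi-Shichangpu-Banxia-Chenpi, Baimaogen-Chishao-Huanglian-Sangbaipi, Shenqu-Baidoukou-Foshou-Maiya-Meiguihua, Yinchen-Dahuang-Zhizi-Danggui, and Qingdai-Gouqizi-Chuipencao-Baijiangcao-Jianghuang. Conclusion: Data mining on the Traditional Chinese Medicine Inheritance Support Platform shows that these practitioners treated acute icteric hepatitis by draining dampness to relieve jaundice, reflecting the academic idea of treating the disease by "resolving the dampness pathogen and promoting urination", which is consistent with the TCM principle of treating both the manifestation and the root cause.
Keywords: acute icteric hepatitis; jaundice; data mining; association rules; clustering algorithm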
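A network anomaly data mining and classification method based on an improved K-means clustering algorithm
15
Author: 贺萌. 《无线互联科技》, 2024, Issue 18, pp. 119-122 (4 pages)
To address the high false negative and false positive rates in network anomaly data mining, the article proposes a mining and classification method based on an improved K-means clustering algorithm. A parallel frequent-itemset mining environment is built to accelerate data processing, and local outlier detection is used to remove anomalous values. K-means clustering is introduced to compute the maximum and minimum distances of the data, and a membership function is combined with a density-peak optimization algorithm to improve the selection of initial cluster centers and the adjustment of cluster boundaries, thereby improving anomaly identification accuracy and classification efficiency. Experimental results show that the method markedly improves clustering quality and performance.
Keywords: K-means clustering algorithm; network anomaly; data mining; data classification; outlier detection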
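Simulation of random big data mining based on an improved fuzzy clustering algorithm (cited by: 1)
16
Authors: 李萍, 刘金金. 《计算机仿真》, 2024, Issue 2, pp. 496-499, 521 (5 pages)
Big data mining is the process of extracting valuable information from large volumes of noisy, random, and fuzzy data. Because massive data are multidimensional, sparse, and dynamic, it is difficult to obtain their distribution characteristics accurately, and random mining is hard to carry out directly. A random big data mining method based on an improved fuzzy clustering algorithm is therefore proposed. A semantic concept tree model is built to obtain the feature distribution relationships of the data, and fuzzy semantic analysis is used to derive semantic similarity and correlation conditions, from which data features are extracted. The optimal number of clusters is determined first, and the improved fuzzy clustering algorithm then clusters the data, realizing random big data mining based on the improved fuzzy algorithm. Experimental results show that the method clusters big data well, with a random mining accuracy above 95%, which supports its effectiveness in application.
Keywords: improved fuzzy clustering algorithm; random big data mining; semantic concept tree; feature extraction; feature clustering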
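An improved CLARANS algorithm based on grid structure (cited by: 7)
17
Authors: 张书春, 孙秀英. 《计算机工程》 (CAS, CSCD), 2012, Issue 6, pp. 56-59 (4 pages)
To improve the accuracy and execution efficiency of the CLARANS algorithm, the idea of partitioning the data space used in grid clustering is combined with the statistical information grid algorithm to improve the selection of the initial node and neighbor nodes and the computation of the total swap cost. Experimental results show that, compared with CLARANS, the improved algorithm produces more accurate and more stable clustering results and substantially reduces execution time.
Keywords: CLARANS algorithm; statistical information grid algorithm; clustering; dissimilarity; data space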
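A data detection technique based on a fused and improved K-means clustering algorithm (cited by: 3)
18
Author: 郭克难. 《电子设计工程》, 2024, Issue 5, pp. 41-45 (5 pages)
Existing medical financial data analysis platforms are outdated, and the traditional K-means algorithm performs poorly for data processing on them. To address this, the paper designs an algorithm for detecting anomalous financial data. To overcome the poor classification quality and low runtime efficiency of traditional K-means, the algorithm uses the density-peak method to compute each sample point's local density and its distance to points of higher density, thereby improving the choice of cluster centers. PCA dimensionality reduction is also incorporated to remove redundant information and further improve efficiency. The LOF outlier detection algorithm is then applied to the clustered data to obtain the anomalous records. In experiments, the proposed algorithm achieved an average ARI of 0.844 on synthetic datasets and an accuracy of 79.2% on a real dataset, the best among all compared algorithms, indicating that it performs well and can detect anomalous financial data accurately.
Keywords: K-means clustering; density peak detection; principal component analysis; outlier detection algorithm; anomalous data detection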
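A two-pass CLARANS clustering algorithm based on grid structure (cited by: 2)
19
Authors: 苏勇, 黄烨, 周冬. 《计算机应用与软件》 (CSCD, PKU Core), 2013, Issue 3, pp. 287-290 (4 pages)
To address the low clustering efficiency of CLARANS and the dependence of its results on the initial nodes, a grid-based two-pass CLARANS algorithm (Twi-CLARANS) is proposed. A grid clustering algorithm first partitions the data space, and all data objects in dense grid cells are extracted and clustered with CLARANS in a first pass. The local optimum from the first pass is then used as the initial reference points for a second pass over the original data samples, which avoids losing outlier information as far as possible and keeps the clustering from becoming trapped in a local optimum. Experimental results show that, compared with CLARANS, Twi-CLARANS is more accurate and more efficient and preserves the completeness of the information.
Keywords: CLARANS algorithm; clustering; grid; data space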
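A K-means++ clustering method supporting differential privacy under the Spark framework
20
Authors: 石江南, 彭长根, 谭伟杰. 《信息安全研究》 (CSCD, PKU Core), 2024, Issue 8, pp. 712-718 (7 pages)
To address the tension between privacy and utility when differentially private clustering algorithms process massive data, a K-means++ clustering algorithm supporting differential privacy in a distributed environment is proposed. Using the in-memory computing engine Spark, the algorithm creates resilient distributed datasets and operates on the data with transformation and action operators. When selecting the initial centers and iteratively updating them, it combines the exponential mechanism and the Laplace mechanism to address sensitivity to the initial cluster centers and privacy leakage while reducing the perturbation applied to the data during computation. Based on the properties of differential privacy, the whole algorithm is proved theoretically to satisfy ε-differential privacy. Experimental results show that, while preserving the utility of the clustering results, the method provides strong privacy protection and runs efficiently.
Keywords: data mining; clustering algorithm; differential privacy; Spark framework; exponential mechanism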
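As a rough illustration of how the Laplace mechanism can perturb a centroid update, consider the single-machine sketch below. It is not the paper's Spark algorithm and it omits the exponential mechanism used for choosing initial centers. It assumes every coordinate has been scaled to [0, 1], splits the per-update budget `epsilon` evenly between the noisy sum and the noisy count, and expects a non-empty cluster. The function names and the budget split are hypothetical choices.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_centroid(cluster_points, epsilon, rng):
    """Differentially private centroid of one non-empty cluster.

    Assumes each coordinate lies in [0, 1]. Adding or removing one point then
    changes the count by at most 1 and the d-dimensional sum by at most d in
    L1 norm, which sets the two noise scales below.
    """
    d = len(cluster_points[0])
    eps_sum = eps_count = epsilon / 2.0  # assumed even budget split
    noisy_count = len(cluster_points) + laplace_noise(1.0 / eps_count, rng)
    noisy_count = max(noisy_count, 1.0)  # guard against tiny or negative counts
    centroid = []
    for j in range(d):
        coord_sum = sum(p[j] for p in cluster_points)
        coord_sum += laplace_noise(d / eps_sum, rng)
        # Clamping to the data range is post-processing and costs no privacy.
        centroid.append(min(1.0, max(0.0, coord_sum / noisy_count)))
    return centroid

# Hypothetical usage: one cluster of identical points, moderate privacy budget.
rng = random.Random(42)
cluster = [(0.2, 0.4)] * 1000
print(noisy_centroid(cluster, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier centroids, which is the privacy versus utility tension the abstract refers to; across iterations the per-update budgets add up under sequential composition.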