15 articles found
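Traditional and Stream Data Clustering Algorithms (cited by: 2)
1
Authors: Liu Xiaolu (刘晓璐), Wang Zhidong (王志栋), Shan Guangrong (单广荣). 《现代计算机》 (Modern Computer), 2020, No. 29, pp. 25-28 (4 pages)
In the big-data era of rapidly growing data volumes, clustering algorithms have become a research hotspot. The paper first introduces traditional clustering algorithms and stream data clustering algorithms; the latter can scan data quickly and group it into a set of clusters. It then surveys the partition-based traditional algorithms K-means, K-means++ and K-medoids alongside the stream-based STREAM algorithm; the hierarchy-based traditional algorithm BIRCH alongside the stream-based CluStream algorithm; the density-based traditional algorithm DBSCAN alongside the stream-based DenStream algorithm; and the grid-based traditional algorithm CLIQUE alongside the stream-based D-Stream algorithm.
Keywords: traditional clustering…; …data; …data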
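BIRCH and CluStream, both named in the abstract above, summarize groups of points with an additive clustering feature (N, LS, SS) rather than storing the points themselves. The following is a minimal sketch of that summary only, not the survey's own code; the class name `ClusteringFeature` and its method names are assumptions made for illustration, and CluStream's additional timestamp sums are left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusteringFeature:
    """Additive summary (N, LS, SS) of a group of d-dimensional points.

    Hypothetical illustration class; not taken from the surveyed paper.
    """
    dim: int
    n: int = 0                                # number of absorbed points
    ls: list = field(default_factory=list)    # per-dimension linear sum
    ss: float = 0.0                           # sum of squared norms

    def __post_init__(self):
        if not self.ls:
            self.ls = [0.0] * self.dim

    def add(self, point):
        """Absorb one point in O(d) time without storing it."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other):
        """Merge another summary; valid because all three fields are sums."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root-mean-square distance of the absorbed points from the centroid."""
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

Because every field is a plain sum, two summaries built on different parts of a stream can be merged without revisiting the data, which is what makes a single fast scan possible.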
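Anomaly Detection in Electric Power Big Data Based on a Stream Data Clustering Algorithm (cited by: 15)
2
Authors: Yu Xiaoqing (于小青), Qi Linhai (齐林海). 《电力信息与通信技术》 (Electric Power Information and Communication Technology), 2020, No. 3, pp. 8-14 (7 pages)
To address anomaly detection in electric power big data streams, this paper combines stream data clustering with power big data. Existing stream clustering algorithms cannot easily store all of the data and tend to lose data on power failure, and they require real-time responses from the offline-phase clustering algorithm. Starting from data integrity, data security and low time complexity, the paper improves the CluStream stream clustering algorithm and proposes a streaming K-means clustering algorithm. In the online phase, a Redis cluster buffers the stream, and a node time-decay strategy is designed to raise the proportion of useful messages among heartbeat messages. In the offline phase, the clustering algorithm is optimized by using the best-distance method to choose the initial cluster centers, which reduces the number of iterations. Finally, the proposed streaming K-means algorithm is applied to detecting abnormal electricity-use behavior; the experimental results show that it detects such behavior well.
Keywords: electric power big data; …data…; …K-means clustering…; abnormal user electricity consumption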
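The general idea of clustering a stream with K-means and flagging outlying readings can be sketched as follows. This is a generic sequential (MacQueen-style) K-means update, swapped in for illustration; it is not the paper's algorithm, and it does not reproduce the Redis buffering, the node time-decay strategy, or the best-distance initialization. The class name `OnlineKMeans`, the `threshold` parameter and the rule "flagged points do not move the center" are all assumptions.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OnlineKMeans:
    """Sequential K-means with a fixed distance threshold for anomalies.

    Hypothetical sketch: centers are supplied by the caller, and each center
    starts with a count of 1 as if it had already absorbed one point.
    """

    def __init__(self, centers, threshold):
        self.centers = [list(c) for c in centers]
        self.counts = [1] * len(self.centers)
        self.threshold = threshold

    def observe(self, point):
        """Return (nearest cluster index, is_anomaly) and update the model."""
        j = min(range(len(self.centers)),
                key=lambda i: dist(point, self.centers[i]))
        if dist(point, self.centers[j]) > self.threshold:
            return j, True                    # flagged points do not move the center
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]            # running-mean step size
        self.centers[j] = [c + eta * (x - c)
                           for c, x in zip(self.centers[j], point)]
        return j, False
```

Each reading is processed once in O(k·d) time and then discarded, which is the property that makes this family of methods suitable for streams.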
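A Grid- and MST-Based Clustering Algorithm for Mixed-Attribute Data Streams
3
Authors: Yu Zhijun (俞智君), Zhang Fengbin (张凤斌). 《电脑知识与技术》 (Computer Knowledge and Technology), 2010, No. 7, pp. 5220-5222 (3 pages)
Existing stream data clustering algorithms can often handle only streams with a single attribute type, or cannot find clusters of arbitrary shape. To address this, the paper proposes GTMS, a clustering algorithm for mixed-attribute data streams. It uses a grid and a minimum spanning tree (MST), and computes the similarity of mixed-type data with a method based on information gain and geometric adjacency. Experiments show that the algorithm handles mixed-attribute data streams effectively.
Keywords: …data…; mixed attributes; grid; minimum spanning tree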
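How a grid and a minimum spanning tree can combine to find clusters of arbitrary shape is sketched below for numeric attributes only. This is not GTMS: the information-gain similarity for mixed attributes is not reproduced, plain Euclidean distance between cell indices stands in for it, and the function names and the `cell_size`, `min_pts` and `max_gap` parameters are assumptions.

```python
import math
from collections import Counter


def dense_cells(points, cell_size, min_pts):
    """Map points to grid cells and keep cells holding at least min_pts points."""
    counts = Counter(tuple(math.floor(x / cell_size) for x in p) for p in points)
    return sorted(c for c, k in counts.items() if k >= min_pts)


def mst_edges(nodes):
    """Prim's algorithm, O(n^2); returns (weight, i, j) edges of the MST."""
    if not nodes:
        return []
    # best[i] = (cheapest known distance from the tree to node i, tree node)
    best = {i: (math.dist(nodes[0], nodes[i]), 0) for i in range(1, len(nodes))}
    edges = []
    while best:
        i = min(best, key=lambda k: best[k][0])
        w, parent = best.pop(i)
        edges.append((w, parent, i))
        for k in best:
            d = math.dist(nodes[i], nodes[k])
            if d < best[k][0]:
                best[k] = (d, i)
    return edges


def grid_mst_clusters(points, cell_size, min_pts, max_gap):
    """Cluster dense cells by cutting MST edges longer than max_gap (cell units)."""
    cells = dense_cells(points, cell_size, min_pts)
    parent = list(range(len(cells)))          # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    for w, i, j in mst_edges(cells):
        if w <= max_gap:                      # keep short edges, cut long ones
            parent[find(i)] = find(j)
    groups = {}
    for idx, cell in enumerate(cells):
        groups.setdefault(find(idx), []).append(cell)
    return sorted(groups.values())
```

Sparse cells are dropped as noise before the tree is built, so the quadratic MST step runs over cells rather than over raw stream points.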
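A Spark-Based Real-Time Method for Detecting Performance Anomalies in Cloud Data Centers
4
Authors: Cai Binlei (蔡斌雷), Guo Qin (郭芹). 《西安职业技术学院学报》 (Journal of Xi'an Vocational and Technical College), 2016, No. 3, pp. 1-5, 19 (6 pages)
To address the real-time and scalability limitations of current performance anomaly detection methods for data centers in cloud computing environments, the paper proposes Spark-ADOPD (Spark-based Anomaly Detection Over Performance Data In Realtime), a Spark-based real-time method for detecting performance anomalies in cloud data centers. The method designs a distributed, scalable, Spark-based stream data clustering algorithm that automatically classifies the performance data collected from the data center and builds a performance anomaly prediction model. It defines a similarity function and computes the similarity between continuously arriving performance data and the prediction model to uncover anomalous behavior, so that resource allocation can be adjusted dynamically. Experimental results show that Spark-ADOPD achieves good real-time performance and accuracy.
Keywords: anomaly detection; …data…; Spark; resource scheduling; data center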
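A Temporal-Locality-Aware Association Mining Algorithm for Block I/O (cited by: 2)
5
Authors: Huang Lifeng (黄立锋), Deng Yuhui (邓玉辉). 《小型微型计算机系统》 (Journal of Chinese Computer Systems), CSCD, PKU Core, 2015, No. 5, pp. 990-995 (6 pages)
Frequent associations among block I/O requests are common in storage systems. Such associations between data blocks matter for improving data layout and for optimizing prefetching policies. Traditional frequent-sequence mining algorithms ignore the temporal locality of the data and so cannot mine the frequent associations among block I/Os effectively. This paper proposes an improved, temporal-locality-aware Apriori algorithm that operates under an association-reinforcement window to mine frequent association sequences among block I/Os. It also mines sub-frequent association sequences, whose support falls below the threshold but which should not be ignored, to complement the frequent sequences. The algorithm is evaluated on three real traces. The results show that the improved Apriori algorithm is better suited to mining frequent and sub-frequent association sequences in block I/O streams, and that it remedies the weakness of traditional frequent-sequence mining on time-sensitive, stream-like data. It is also more time-efficient than the original Apriori algorithm.
Keywords: association-reinforcement window; block I/O association; frequent association sequences; sub-frequent association sequences; stream-like data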
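The role of a window in restricting candidates to temporally close blocks can be sketched as follows. This covers only the first Apriori level (pairs) over a plain sliding window; the paper's association-reinforcement window and its longer sequences are not reproduced, and the function name and the `window`, `min_support` and `sub_support` parameters are assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_block_pairs(trace, window, min_support, sub_support):
    """Count unordered block pairs that co-occur within `window` consecutive requests.

    Returns (frequent, sub_frequent): pairs with support >= min_support, and
    pairs with sub_support <= support < min_support.
    """
    counts = Counter()
    for start in range(max(len(trace) - window + 1, 0)):
        # Deduplicate inside the window so a repeated block counts once.
        blocks = sorted(set(trace[start:start + window]))
        counts.update(combinations(blocks, 2))
    frequent = {p: c for p, c in counts.items() if c >= min_support}
    sub_frequent = {p: c for p, c in counts.items()
                    if sub_support <= c < min_support}
    return frequent, sub_frequent
```

Pairs that never fall inside a common window are never counted at all, which is where the time saving over a locality-blind candidate generation comes from in this simplified setting.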
Clustering algorithm for multiple data streams based on spectral component similarity (cited by: 1)
6
Authors: Zou Lingjun (邹凌君), Chen Ling (陈崚), Tu Li (屠莉). Journal of Southeast University (English Edition), EI, CAS, 2008, No. 3, pp. 264-266 (3 pages)
A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams that show similar behavior with some unknown time delays. It uses the autoregressive (AR) modeling technique to measure correlations between data streams, and exploits estimated frequency spectra to extract the essential features of the streams. Each stream is represented as the sum of spectral components, and the correlation is measured component-wise. Each spectral component is described by four parameters, namely amplitude, phase, damping rate and frequency. The ε-lag-correlation between two spectral components is calculated, and the algorithm uses this information as the similarity measure when clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method is faster and achieves higher clustering quality than other similar methods.
Keywords: data streams; clustering; AR model; spectral component
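The notion of two streams being similar "up to an unknown time delay" can be illustrated with a direct time-domain search over lags. This is a deliberate simplification: the paper measures lag correlation on AR-derived spectral components, whereas the sketch below computes Pearson correlation on the raw samples. The function names and the `max_lag` parameter are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_lag(x, y, max_lag):
    """Return (lag, r) maximizing corr(x[t], y[t + lag]) over 0 <= lag <= max_lag."""
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        a, b = x[:len(x) - lag], y[lag:]
        m = min(len(a), len(b))
        if m < 2:                 # too little overlap to correlate
            break
        r = pearson(a[:m], b[:m])
        if r > best[1]:
            best = (lag, r)
    return best
```

A clustering step could then treat the maximal correlation as a similarity score between streams; the search costs O(n · max_lag) per pair, which is one motivation for compact spectral summaries instead.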
Data Flow & Transaction Mode Classification and An Explorative Estimation on Data Storage & Transaction Volume (cited by: 4)
7
Authors: Cai Yuezhou, Liu Yuexin. China Economist, 2022, No. 6, pp. 78-112 (35 pages)
The public has shown great interest in the data factor and data transactions, but current attention is overly focused on personal behavioral data and transactions happening at Data Exchanges. To deliver a complete picture of data flow and transaction, this paper presents a systematic overview of the flow and transaction of personal, corporate and public data on the basis of data factor classification from various perspectives. By utilizing various sources of information, this paper estimates the volume of data generation & storage and the volume & trend of data market transactions for major economies in the world, with the following findings: (i) Data classification is diverse due to a broad variety of application scenarios, and data transaction and profit distribution are complex due to the heterogeneous entities, ownerships, information density and other attributes of different data types. (ii) Global data transaction exhibits the characteristics of productization, servitization and a platform-based mode. (iii) For major economies, there is a commonly observed disequilibrium between data generation scale and storage scale, which is particularly striking for China. (iv) The global data market is in a nascent stage of rapid development with a transaction volume of about 100 billion US dollars, and China's data market is even more underdeveloped and accounts for only some 10% of the world total. All sectors of society should be fully aware of the diversity and complexity of data factor classification and data transactions, as well as the arduous and long-term nature of developing and improving the relevant institutional systems. Adapting to such features, efforts should be made to improve data classification, enhance computing infrastructure development, foster professional data transaction and development institutions, and perfect the data governance system.
Keywords: data factor; data classification; data transaction mode; data generation & storage volume; data transaction volume
THRFuzzy: Tangential holoentropy-enabled rough fuzzy classifier to classification of evolving data streams (cited by: 1)
8
Authors: Jagannath E. Nalavade, T. Senthil Murugan. Journal of Central South University, SCIE, EI, CAS, CSCD, 2017, No. 8, pp. 1789-1800 (12 pages)
Rapid developments in telecommunication, sensor data, financial applications, data stream analysis and similar fields increase the rate of data arrival, which makes data mining a vital process. The data analysis process consists of different tasks, among which data stream classification faces more challenges than other commonly used techniques. Even though classification is a continuous process, it requires a design that can adapt the classification model to concept change or to changes in the boundary between classes. Hence, we design a novel fuzzy classifier, THRFuzzy, to classify new incoming data streams. Rough set theory, together with the tangential holoentropy function, helps in designing the dynamic classification model. The classification approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of the proposed THRFuzzy method is verified on three datasets, namely the skin segmentation, localization, and breast cancer datasets, using accuracy and time as metrics and comparing against the HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier gives better classification results, achieving the highest accuracy in the least time among the compared classifiers.
Keywords: data stream classification; fuzzy rough set; tangential holoentropy; concept change
Logistic Regression for Evolving Data Streams Classification
9
Authors: Yin Zhiwu (尹志武), Huang Shangteng (黄上腾), Xue Guirong (薛贵荣). Journal of Shanghai Jiaotong University (Science), EI, 2007, No. 2, pp. 197-203 (7 pages)
Logistic regression is a fast classifier and can achieve higher accuracy on small training data. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties, this paper proposes an algorithm, called the evolutionary logistical regression classifier (ELRClass), for classifying evolving data streams. The algorithm applies logistic regression repeatedly to a sliding window of samples in order to update the existing classifier, to keep the classifier if its performance has deteriorated only because of bursty noise, or to construct a new classifier if a major concept drift is detected. Extensive experimental results demonstrate the effectiveness of the algorithm.
Keywords: classification; logistic regression; data stream mining
Influence of Geomorphology on Fish Fauna of a Small Mississippi Bluffline Stream
10
Authors: Scott Stephen Knight, Terry Douglas Welch. Journal of Environmental Science and Engineering (B), 2015, No. 4, pp. 169-176 (8 pages)
Fish were collected from 39 sites on the main channel and major tributaries of a highly erosive stream, Hotophia Creek. A total of 2,642 specimens representing 38 species were collected between 1986 and 2003. The bluntface shiner, Cyprinella camura, was the dominant species and, when grouped with other cyprinids, accounted for 38.0% of the total number collected. By weight, Lepisosteus oculatus, Lepomis megalotis, Ictiobus bubalus, and Lepomis macrochirus were the dominant species, accounting for 49.9% of the total catch. More diminutive species, such as cyprinids that might be subject to predation by large fish, were found more frequently in shallow channels. Fishes with specific habitat requirements, such as the pirate perch, were found in the middle group of sites, which were disturbed by erosion processes but still featured the necessary habitat. Sensitive or intolerant species such as the Yazoo darter, creek chubsucker and cyprinids in general were found more frequently in the undisturbed, habitat-complex channels. This study supports the hypothesis that geomorphological stream stages are associated with specific communities of fishes.
Keywords: stream classification; geomorphology; index of biotic integrity; ecology
A TCAM-based Two-dimensional Prefix Packet Classification Algorithm
11
Authors: Wang Zhiheng (王志恒), Liu Gang (刘刚), Bai Yingcai (白英彩). Journal of Donghua University (English Edition), EI, CAS, 2004, No. 1, pp. 39-45 (7 pages)
Packet classification (PC) has become the main method for supporting quality of service and security in network applications, and two-dimensional prefix packet classification (PPC) is the most popular form. This paper analyzes the problem of rule conflict and then presents a TCAM-based two-dimensional PPC algorithm. The algorithm uses the parallelism of TCAM to look up the longest prefix in one instruction cycle. It then uses a memory image and associated data structures to eliminate conflicts between rules and performs fast two-dimensional PPC. Compared with other algorithms, this algorithm has the lowest time complexity and a lower space complexity.
Keywords: Ternary Content Addressable Memory (TCAM); packet classification algorithm; two-dimensional prefix packet classification
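Ternary matching on a (source prefix, destination prefix) pair can be simulated in software as a list of value/mask rows where the first matching row wins. The sketch below is an illustration of that lookup only, not the paper's algorithm: the memory image and conflict-elimination structures are not reproduced, ordering rows by total prefix length is an assumed simplification, and the names `prefix_entry` and `Tcam` are hypothetical.

```python
def prefix_entry(src_prefix, src_len, dst_prefix, dst_len, action, width=32):
    """Build a ternary row (value, care mask, total prefix length, action)."""
    def mask(length):
        # `length` leading one-bits in a `width`-bit field
        return ((1 << length) - 1) << (width - length)
    value = ((src_prefix & mask(src_len)) << width) | (dst_prefix & mask(dst_len))
    care = (mask(src_len) << width) | mask(dst_len)
    return value, care, src_len + dst_len, action


class Tcam:
    """Software model of a TCAM: the first matching row is returned."""

    def __init__(self, entries):
        # Longer (more specific) prefixes are placed first; ties keep input order.
        self.rows = sorted(entries, key=lambda e: -e[2])

    def lookup(self, src, dst, width=32):
        key = (src << width) | dst
        for value, care, _, action in self.rows:   # done in parallel in hardware
            if (key & care) == value:              # "don't care" bits are masked out
                return action
        return None
```

Sorting by total length does not resolve every case: two rules can overlap with one longer in the source field and the other longer in the destination field, which is the kind of rule conflict the abstract refers to.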
Flow Label-Based IPv6 Packet Classification Algorithm with Dimension Reduction Capability
12
Authors: Huang Xiaohong (黄小红), Ma Yan (马严). China Communications, SCIE, CSCD, 2012, No. 5, pp. 1-9 (9 pages)
Traditional packet classification for IPv4 involves examining the standard 5-tuple of a packet header: source address, destination address, source port, destination port and protocol. With the introduction of the IPv6 flow label field, which labels the packets belonging to the same flow, packet classification can be resolved on 3 dimensions: flow label, source address and destination address. In this paper, we propose a novel approach to 3-tuple packet classification based on the flow label. In addition, by introducing a conversion engine that converts source-destination pairs into compound address prefixes, we put forward an algorithm with dimension reduction capability, called Reducing Dimension (RD), which combines heuristic tree search with the use of buckets. We also provide an improved version of RD, called Improved RD (IRD), which uses two mechanisms, path compression and a priority tag, to optimize performance. To evaluate our algorithm, extensive experiments have been conducted using a number of synthetically generated databases. In memory consumption, the two proposed algorithms use only around 3% of the memory of existing algorithms when the number of filters increases to 10 k. In average search time, the two proposed algorithms are more than four times faster than the others when the number of filters is 10 k. The results show that the proposed algorithm works well and, with its dimension reduction capability, outperforms many typical existing algorithms.
Keywords: IPv6; packet classification; flow label
Linear manifold clustering for high dimensional data based on line manifold searching and fusing (cited by: 1)
13
Authors: Li Gangguo (黎刚果), Wang Zhengzhi (王正志), Wang Xiaomin (王晓敏), Ni Qingshan (倪青山), Qiang Bo (强波). Journal of Central South University, SCIE, EI, CAS, 2010, No. 5, pp. 1058-1069 (12 pages)
High dimensional data clustering, with the inherent sparsity of data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets, and then fuse some of the line manifold clusters to construct higher dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifold and to optimize line manifolds during the line manifold cluster searching procedure. The results obtained from experiments on real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method is able to obtain high clustering accuracy for various data sets with different sizes, manifold dimensions and noise ratios, which confirms its anti-noise capability and high clustering accuracy for high dimensional data.
Keywords: linear manifold; subspace clustering; line manifold; data mining; data fusing; clustering algorithm
Building a Tree Adjusted Logistic Classification Model in Biomarker Data Analyses
14
Authors: Dion Chen. Journal of Mathematics and System Science, 2014, No. 6, pp. 433-438 (6 pages)
Researchers in bioinformatics, biostatistics and other related fields seek biomarkers for many purposes, including risk assessment, disease diagnosis and prognosis, which can be formulated as patient classification problems. In this paper, a new method that uses a tree regression to improve the logistic classification model is introduced for biomarker data analysis. The numerical results show that the linear logistic model can be significantly improved by a tree regression on the residuals. Although this research discusses the classification of binary responses, the idea is easy to extend to the classification of multinomial responses.
Keywords: bioinformatics; biomarker; tree regression; logistic model; classification
Image feature optimization based on nonlinear dimensionality reduction (cited by: 3)
15
Authors: Rong ZHU, Min YAO. Journal of Zhejiang University-Science A (Applied Physics & Engineering), SCIE, EI, CAS, CSCD, 2009, No. 12, pp. 1720-1737 (18 pages)
Image feature optimization is an important means of dealing with high-dimensional image data in image semantic understanding and its applications. We formulate image feature optimization as the establishment of a mapping between high- and low-dimensional space via a five-tuple model. Nonlinear dimensionality reduction based on manifold learning provides a feasible way of solving such a problem. We propose a novel globular neighborhood based locally linear embedding (GNLLE) algorithm using neighborhood update and an incremental neighbor search scheme, which not only can handle sparse datasets but also has strong anti-noise capability and good topological stability. Given that the distance measure adopted in nonlinear dimensionality reduction is usually based on pairwise similarity calculation, we also present a globular neighborhood and path clustering based locally linear embedding (GNPCLLE) algorithm based on path-based clustering. Because it takes full account of the correlations between image data, GNPCLLE can eliminate the distortion of the overall topological structure within the dataset on the manifold. Experimental results on two image sets show the effectiveness and efficiency of the proposed algorithms.
Keywords: image feature optimization; nonlinear dimensionality reduction; manifold learning; locally linear embedding (LLE)