Journal articles: 10 results found
A Survey of Online Learning for Streaming Data Classification (面向流数据分类的在线学习综述). Cited by: 25
1
Authors: 翟婷婷, 高阳, 朱俊武. Journal of Software (软件学报). Indexed in: EI, CSCD, PKU Core. 2020, No. 4, pp. 912-931 (20 pages).
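Streaming data classification aims to incrementally learn, from continuously arriving streaming data, a mapping function from input variables to class-label variables, so that test data arriving at any time can be classified accurately. As an incremental machine learning technique, the online learning paradigm is an effective tool for streaming data classification. This paper surveys the state of research on streaming data classification algorithms, mainly from the perspective of online learning. Specifically, it first introduces the basic framework and performance evaluation methods of online learning. It then focuses on current work on online learning algorithms for general streaming data, for high-dimensional streaming data (addressing the "curse of dimensionality") and for evolving streaming data (handling "concept drift"). Finally, it discusses the challenges that remain and the research directions that urgently need attention in the classification of high-dimensional and evolving streaming data.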
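Keywords: online learning; streaming data classification; curse of dimensionality; concept drift; sparse online learning; evolving classification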
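Online stream classifiers are commonly evaluated with the test-then-train (prequential) protocol. Each arriving example is first used to test the current model and only afterwards to update it. The sketch below shows this protocol with a plain perceptron as the online learner and labels in {-1, +1}. The learner choice and the function name `prequential_perceptron` are assumptions for illustration, not taken from the survey.

```python
from typing import Iterable, List, Tuple

def prequential_perceptron(stream: Iterable[Tuple[List[float], int]], lr: float = 1.0) -> float:
    """Test-then-train over (features, label) pairs with labels in {-1, +1}.

    Every example is predicted before the model sees its label, so the
    returned accuracy is measured only on examples not yet trained on.
    """
    w: List[float] = []
    b = 0.0
    correct = 0
    seen = 0
    for x, y in stream:
        if not w:
            w = [0.0] * len(x)  # size the weight vector from the first example
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        pred = 1 if score >= 0 else -1
        correct += pred == y  # test first...
        seen += 1
        if pred != y:  # ...then train: the perceptron updates only on mistakes
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
    return correct / seen if seen else 0.0
```

On a separable toy stream that alternates `([1.0, 0.0], 1)` and `([-1.0, 0.0], -1)` 50 times, the model makes one mistake and then classifies everything correctly. The prequential accuracy is therefore 0.99.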
A Streaming Data Classification Algorithm Based on Hidden Markov Models (基于隐马尔可夫模型的流数据分类算法)
2
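Authors: 潘怡, 何可可, 李国徽. Journal of Huazhong University of Science and Technology (Natural Science Edition) (华中科技大学学报(自然科学版)). Indexed in: EI, CAS, CSCD, PKU Core. 2014, No. 8, pp. 18-21 (4 pages).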
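To improve classification accuracy under periodic concept drift, a periodic streaming data classification algorithm based on a hidden Markov model (HMMSDC) is proposed. The algorithm uses the output of the actually observable sequence to build a transition-matrix probability model over the sequence of drifting concept states. It then predicts the state transition sequence from the probability density of the observations. When the prediction error exceeds a user-defined threshold, the algorithm updates and optimizes the transition-matrix parameters, so concept drift can be predicted effectively without re-learning historical concepts. In addition, the algorithm trains on the sample set with semi-supervised K-means learning. This lowers the cost of manually labeling examples and avoids the under-learning that a hidden Markov model suffers when labeled examples are scarce. Experimental results show that, compared with traditional ensemble classification algorithms, the new algorithm achieves better classification accuracy and timeliness on periodically drifting data.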
Keywords: data mining; streaming data classification; concept drift; hidden Markov model; semi-supervised learning
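Predicting the next concept state in an HMM rests on the forward (filtering) step. The current belief over concepts is propagated through the transition matrix and then reweighted by how likely the newest observation is under each concept. The sketch below shows only that textbook step, using hypothetical function names. HMMSDC's semi-supervised K-means training and its threshold-triggered parameter updates are not reproduced.

```python
from typing import List

Matrix = List[List[float]]

def predict_next(belief: List[float], trans: Matrix) -> List[float]:
    """Prior over the next concept: sum_i belief[i] * trans[i][j]."""
    n = len(belief)
    return [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]

def forward_step(belief: List[float], trans: Matrix, emit_prob: List[float]) -> List[float]:
    """One HMM filtering step.

    belief[i]    : P(current concept = i | observations so far)
    trans[i][j]  : P(next concept = j | current concept = i)
    emit_prob[j] : likelihood of the newest observation under concept j
    """
    predicted = predict_next(belief, trans)
    unnorm = [p * e for p, e in zip(predicted, emit_prob)]
    total = sum(unnorm)
    if total == 0.0:
        # The observation has zero likelihood under every concept; keep the prior prediction.
        return predicted
    return [u / total for u in unnorm]
```

Take a two-concept transition matrix that mostly alternates, such as `[[0.1, 0.9], [0.9, 0.1]]`. Then `predict_next([1.0, 0.0], trans)` returns `[0.1, 0.9]`: the model expects the other concept next, which is the kind of periodic pattern the abstract targets.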
Suspicious Financial Transaction Identification Based on Binary-SADT (基于Binary-SADT的可疑金融交易识别方法). Cited by: 1
3
Authors: 张成虎, 吴莹莹. Shanghai Finance (上海金融). Indexed in: CSSCI, PKU Core. 2012, No. 5, pp. 107-111, 119 (5 pages).
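Identifying suspicious financial transactions with static data mining currently suffers from low monitoring timeliness and incomplete data coverage. To address this, the paper analyzes the characteristics of suspicious financial transactions and proposes an identification algorithm based on streaming data classification mining, called Binary-SADT. The SADT algorithm handles concept drift in data stream mining dynamically. Binary-SADT builds on SADT by using a binary sort tree to process the continuous attributes in financial transaction data streams, and it constructs and promptly updates a classification model for identifying suspicious transactions. Theoretical analysis and experimental results show that the classification model built by the algorithm agrees with the characteristics of suspicious financial transactions summarized by industry experts, which verifies the feasibility and effectiveness of the algorithm.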
Keywords: streaming data classification; suspicious financial transactions; Binary-SADT algorithm; sliding window
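A binary sort tree keeps each continuous attribute's distinct values ordered as transactions arrive, so candidate split thresholds can be scanned in sorted order without re-sorting. The sketch below makes several assumptions for illustration that the Binary-SADT abstract does not specify: per-value class counts stored at each node, a Gini split criterion, and the names `ValueTree` and `best_threshold`. It also omits removing expired samples when the sliding window moves.

```python
from collections import Counter
from typing import Iterator, List, Optional, Tuple

class Node:
    __slots__ = ("value", "counts", "left", "right")

    def __init__(self, value: float) -> None:
        self.value = value
        self.counts: Counter = Counter()  # class label -> number of samples with this value
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None

class ValueTree:
    """Unbalanced binary search tree keyed by one continuous attribute."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, value: float, label: str) -> None:
        """Add one observation; duplicate values share a node."""
        if self.root is None:
            self.root = Node(value)
        node = self.root
        while node.value != value:
            if value < node.value:
                if node.left is None:
                    node.left = Node(value)
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(value)
                node = node.right
        node.counts[label] += 1

    def in_order(self) -> Iterator[Tuple[float, Counter]]:
        """Iterative in-order traversal: (value, class counts) in ascending value order."""
        stack: List[Node] = []
        node = self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.value, node.counts
            node = node.right

def gini(counts: Counter) -> float:
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def best_threshold(tree: ValueTree) -> Tuple[Optional[float], float]:
    """Best split of the form 'attribute <= threshold' by weighted Gini impurity."""
    items = list(tree.in_order())
    total: Counter = Counter()
    for _, c in items:
        total.update(c)
    n = sum(total.values())
    left: Counter = Counter()
    best, best_score = None, float("inf")
    for value, c in items[:-1]:  # splitting after the last value would leave the right side empty
        left.update(c)
        right = total - left
        nl = sum(left.values())
        score = (nl * gini(left) + (n - nl) * gini(right)) / n
        if score < best_score:
            best, best_score = value, score
    return best, best_score
```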
Clustering algorithm for multiple data streams based on spectral component similarity. Cited by: 1
4
Authors: 邹凌君, 陈崚, 屠莉. Journal of Southeast University (English Edition). Indexed in: EI, CAS. 2008, No. 3, pp. 264-266 (3 pages).
A new algorithm for clustering multiple data streams is proposed. The algorithm can effectively cluster data streams that show similar behavior with some unknown time delays. It uses the autoregressive (AR) modeling technique to measure correlations between data streams and exploits estimated frequency spectra to extract the essential features of streams. Each stream is represented as the sum of spectral components, and the correlation is measured component-wise. Each spectral component is described by four parameters: amplitude, phase, damping rate and frequency. The ε-lag-correlation between two spectral components is calculated, and the algorithm uses this information as the similarity measure for clustering data streams. Based on a sliding window model, the algorithm can continuously report the most recent clustering results and adjust the number of clusters. Experiments on real and synthetic streams show that the proposed clustering method is faster and achieves higher clustering quality than other similar methods.
Keywords: data streams; clustering; AR model; spectral component
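The idea of grouping streams that behave alike up to an unknown delay can be illustrated with a plain time-domain lagged correlation. This is a deliberately simpler stand-in, not the paper's method. The paper fits AR models and compares spectral components (amplitude, phase, damping rate, frequency) via ε-lag-correlation, none of which is reproduced here. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def best_lag_correlation(x: Sequence[float], y: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Return (lag, corr) maximizing corr(x[t], y[t + lag]) for 0 <= lag <= max_lag."""
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]  # x aligned with the delayed part of y
        b = y[lag:]
        m = min(len(a), len(b))
        if m < 2:
            break
        r = pearson(a[:m], b[:m])
        if r > best[1]:
            best = (lag, r)
    return best
```

If `y` is `x` delayed by 5 samples, the search recovers lag 5 with a correlation of about 1.0. A clustering step could then treat the two streams as similar despite the delay.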
An Experimental Analysis of Water and Soil Conservation Effected by Micro-landscape Structure
5
Authors: 汪洋, 郑威. Agricultural Science & Technology. Indexed in: CAS. 2012, No. 11, pp. 2442-2444, 2452 (4 pages).
[Objective] This comparative experiment explored the soil-loss control effects of different combinations of soil and vegetation types under cultivation, to provide a scientific basis for an upcoming ecological-recovery pilot project. [Method] The design took into account both the basic water-movement function of micro-landscape structures and different spatial combinations of landscape constituents. Combinations of multiple soil types, crop species and site conditions were designed at three experimental sites. [Result] Soil loss estimates in the South Wello experiments depended significantly on soil type, slope, vegetation and type of conservation structure. Grass cover greatly reduced soil loss, and legume cultivation performed better than cereal cultivation in controlling soil loss. [Conclusion] Based on analysis of the experimental data, a scientific reference is provided for agricultural planting and protective practices to alleviate water and soil loss in the Amhara Region, Ethiopia.
Keywords: landscape structure; runoff; water and soil conservation; site condition; experiment
Data Flow & Transaction Mode Classification and An Explorative Estimation on Data Storage & Transaction Volume. Cited by: 3
6
Authors: Cai Yuezhou, Liu Yuexin. China Economist. 2022, No. 6, pp. 78-112 (35 pages).
The public has shown great interest in the data factor and data transactions, but current attention focuses too heavily on personal behavioral data and on transactions at Data Exchanges. To deliver a complete picture of data flow and transactions, this paper presents a systematic overview of the flow and transaction of personal, corporate and public data, based on classifying the data factor from various perspectives. Drawing on various sources of information, it estimates the volume of data generation & storage and the volume & trend of data market transactions for the world's major economies. The findings are as follows. (i) Data classification is diverse because of the broad variety of application scenarios. Data transaction and profit distribution are complex because different data types differ in entities, ownership, information density and other attributes. (ii) Global data transactions show the characteristics of productization, servitization and a platform-based mode. (iii) Major economies commonly show a disequilibrium between data generation scale and storage scale, which is particularly striking for China. (iv) The global data market is at a nascent stage of rapid development, with a transaction volume of about 100 billion US dollars. China's data market is even less developed and accounts for only some 10% of the world total. All sectors of society should be fully aware of the diversity and complexity of data factor classification and data transactions, and of the arduous, long-term nature of developing and improving the relevant institutional systems. In response to these features, efforts should be made to improve data classification, strengthen computing infrastructure, foster professional data transaction and development institutions, and perfect the data governance system.
Keywords: data factor; data classification; data transaction mode; data generation & storage volume; data transaction volume
THRFuzzy: Tangential holoentropy-enabled rough fuzzy classifier to classification of evolving data streams. Cited by: 1
7
Authors: Jagannath E. Nalavade, T. Senthil Murugan. Journal of Central South University. Indexed in: SCIE, EI, CAS, CSCD. 2017, No. 8, pp. 1789-1800 (12 pages).
Rapid developments in telecommunication, sensor data, financial applications, data stream analysis and so on have increased the rate of data arrival, making data mining a vital process. Data analysis consists of several tasks, and among them data stream classification faces more challenges than other commonly used techniques. Classification of data streams is a continuous process and requires a design that can adapt the classification model to concept change or to changes in the boundaries between classes. Hence, we design a novel fuzzy classifier, called THRFuzzy, to classify newly arriving data streams. Rough set theory, together with a tangential holoentropy function, helps in designing the dynamic classification model. The classification approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of the proposed THRFuzzy method is verified on three datasets: skin segmentation, localization and breast cancer. Accuracy and time serve as the evaluation metrics, and THRFuzzy is compared with HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier outperforms the existing classifiers, achieving the highest accuracy with the least time consumption.
Keywords: data stream classification; fuzzy rough set; tangential holoentropy; concept change
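THRFuzzy generates its rules with kernel fuzzy c-means. As a reference point, the standard (non-kernel) FCM membership of a point in cluster j is u_j = 1 / Σ_k (d_j / d_k)^(2/(m-1)). Here d_j is the point's distance to the center of cluster j and m > 1 is the fuzzifier. The sketch below computes that formula with Euclidean distance. The kernel-induced distance and the tangential holoentropy membership update used by THRFuzzy are not shown, and the function name is hypothetical.

```python
import math
from typing import List, Sequence

def fcm_memberships(x: Sequence[float], centers: Sequence[Sequence[float]], m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means membership of point x in each cluster (sums to 1)."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    d = [math.dist(x, c) for c in centers]  # Euclidean distances (Python 3.8+)
    # A point that coincides with a center belongs fully to that cluster.
    for j, dj in enumerate(d):
        if dj == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(d))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]
```

For example, with m = 2, a point at distance 1 from one center and 3 from another gets memberships 0.9 and 0.1. The closer cluster dominates, but the assignment stays soft.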
Logistic Regression for Evolving Data Streams Classification
8
Authors: 尹志武, 黄上腾, 薛贵荣. Journal of Shanghai Jiaotong University (Science). Indexed in: EI. 2007, No. 2, pp. 197-203 (7 pages).
Logistic regression is a fast classifier and can achieve higher accuracy on small training data. Moreover, it can work on both discrete and continuous attributes with nonlinear patterns. Based on these properties of logistic regression, this paper proposes an algorithm, called the evolutionary logistical regression classifier (ELRClass), for classifying evolving data streams. The algorithm repeatedly applies logistic regression to a sliding window of samples for three purposes. It updates the existing classifier, keeps the classifier when its performance deteriorates because of bursts of noise, and constructs a new classifier when a major concept drift is detected. Extensive experimental results demonstrate the effectiveness of the algorithm.
Keywords: classification; logistic regression; data stream mining
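The sliding-window scheme described for ELRClass can be sketched as follows: refit logistic regression on recent samples, and build a fresh classifier when a major drift is suspected. The sketch rests on several assumptions. It fits with batch gradient descent, uses a hypothetical accuracy threshold `drift_acc` as the drift test, and uses hypothetical class and parameter names. ELRClass's separate handling of noise bursts (keeping the old classifier) is not reproduced.

```python
import math
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[List[float], int]  # (features, label in {0, 1})

def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_lr(data: List[Sample], w: List[float], epochs: int = 50, lr: float = 0.5) -> List[float]:
    """Batch gradient descent on log loss, warm-started from w; w[-1] is the bias."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

class SlidingWindowLR:
    """Hypothetical sketch: refit on a sliding window, rebuild from scratch on suspected drift."""

    def __init__(self, dim: int, window: int = 100, drift_acc: float = 0.6) -> None:
        self.w = [0.0] * (dim + 1)
        self.window: Deque[Sample] = deque(maxlen=window)
        self.drift_acc = drift_acc  # assumed drift test: batch accuracy below this value

    def predict(self, x: List[float]) -> int:
        return 1 if sum(wi * xi for wi, xi in zip(self.w, x)) + self.w[-1] >= 0 else 0

    def process_batch(self, batch: List[Sample]) -> bool:
        """Test the current model on a new batch, then update it. Returns True if rebuilt."""
        acc = sum(self.predict(x) == y for x, y in batch) / len(batch)
        if acc < self.drift_acc:
            # Suspected concept drift: drop old samples and fit a new classifier.
            self.window.clear()
            self.window.extend(batch)
            self.w = train_lr(list(self.window), [0.0] * len(self.w))
            return True
        # No drift: slide the window forward and refine the existing classifier.
        self.window.extend(batch)
        self.w = train_lr(list(self.window), self.w, epochs=10)
        return False
```

Suppose a stream switches from "label 1 when x > 0" to the opposite rule. The first batch of the new concept scores near-zero accuracy under the old model, which triggers a rebuild from the recent batch alone.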
Flow Label-Based IPv6 Packet Classification Algorithm with Dimension Reduction Capability
9
Authors: 黄小红, 马严. China Communications. Indexed in: SCIE, CSCD. 2012, No. 5, pp. 1-9 (9 pages).
Traditional packet classification for IPv4 involves examining the standard 5-tuple of a packet header: source address, destination address, source port, destination port and protocol. The IPv6 flow label field labels packets belonging to the same flow, so with its introduction, packet classification can be resolved on 3 dimensions: flow label, source address and destination address. In this paper, we propose a novel approach for 3-tuple packet classification based on the flow label. We also introduce a conversion engine that converts source-destination pairs into compound address prefixes. On this basis we put forward an algorithm with dimension reduction capability, called Reducing Dimension (RD), which combines heuristic tree search with the use of buckets. We further provide an improved version of RD, called Improved RD (IRD), which uses two mechanisms, path compression and priority tags, to optimize performance. To evaluate our algorithms, extensive experiments were conducted using a number of synthetically generated databases. When the number of filters increases to 10k, the two proposed algorithms consume only about 3% of the memory used by existing algorithms. At the same 10k filters, their average search time is more than four times faster than that of the others. The results show that the proposed algorithms work well and, with their dimension reduction capability, outperform many typical existing algorithms.
Keywords: IPv6; packet classification; flow label
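The 3-tuple match the abstract describes (flow label, source prefix, destination prefix) can be stated as a linear-search baseline using Python's standard `ipaddress` module. This sketch only defines what a match means. It does not implement RD/IRD's conversion engine, heuristic tree search, buckets, path compression or priority tags. The `Rule` layout and the lower-number-wins priority convention are assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rule:
    priority: int                   # assumption: lower number = higher priority
    flow_label: Optional[int]       # 20-bit IPv6 flow label; None acts as a wildcard
    src: ipaddress.IPv6Network      # source prefix
    dst: ipaddress.IPv6Network      # destination prefix
    action: str

def classify(rules: List[Rule], flow_label: int, src: str, dst: str) -> Optional[str]:
    """Return the action of the highest-priority rule matching all three fields, or None."""
    if not 0 <= flow_label < 2 ** 20:
        raise ValueError("IPv6 flow label is a 20-bit field")
    s, d = ipaddress.IPv6Address(src), ipaddress.IPv6Address(dst)
    matches = [
        r for r in rules
        if (r.flow_label is None or r.flow_label == flow_label)
        and s in r.src
        and d in r.dst
    ]
    return min(matches, key=lambda r: r.priority).action if matches else None
```

This baseline scans every rule per packet, so its cost grows linearly with the number of filters. That growth is what tree- and bucket-based schemes such as RD aim to avoid.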
A TCAM-based Two-dimensional Prefix Packet Classification Algorithm
10
Authors: 王志恒, 刘刚, 白英彩. Journal of Donghua University (English Edition). Indexed in: EI, CAS. 2004, No. 1, pp. 39-45 (7 pages).
Packet classification (PC) has become the main method for supporting the quality of service and security of network applications, and two-dimensional prefix packet classification (PPC) is a popular form of it. This paper analyzes the problem of rule conflicts and then presents a TCAM-based two-dimensional PPC algorithm. The algorithm exploits the parallelism of TCAM to look up the longest prefix in one instruction cycle. It then uses a memory image and associated data structures to eliminate conflicts between rules, and performs fast two-dimensional PPC. Compared with other algorithms, this algorithm has the lowest time complexity and a lower space complexity.
Keywords: Ternary Content Addressable Memory (TCAM); packet classification algorithm; two-dimensional prefix packet classification
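A TCAM compares a key against every stored (value, mask) entry in parallel and returns the first, highest-priority match in a single lookup. The software sketch below emulates that lookup sequentially. It builds two-dimensional entries by concatenating a source prefix and a destination prefix into one ternary word. It assumes the entries are already in priority order and does nothing about the rule conflicts that the paper's memory image resolves. All names are hypothetical.

```python
from typing import List, Optional, Tuple

def prefix_entry(prefix: str, width: int) -> Tuple[int, int]:
    """Turn a bit-string prefix such as '10*' into (value, mask) over `width` bits.

    A mask bit of 1 means 'must match'; 0 means 'don't care'.
    """
    bits = prefix.rstrip("*")
    shift = width - len(bits)
    value = int(bits, 2) << shift if bits else 0
    mask = ((1 << len(bits)) - 1) << shift
    return value, mask

def two_dim_entry(src: str, dst: str, width: int) -> Tuple[int, int]:
    """Concatenate source and destination prefixes into one 2*width-bit ternary word."""
    sv, sm = prefix_entry(src, width)
    dv, dm = prefix_entry(dst, width)
    return (sv << width) | dv, (sm << width) | dm

class SoftTCAM:
    """Sequential emulation of a TCAM: the first matching entry (lowest index) wins."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int, str]] = []

    def add(self, value: int, mask: int, result: str) -> None:
        self.entries.append((value, mask, result))

    def lookup(self, key: int) -> Optional[str]:
        # Hardware checks all entries at once; this loop checks them in priority order.
        for value, mask, result in self.entries:
            if (key & mask) == (value & mask):
                return result
        return None
```

For example, with 4-bit fields, entries `("10*", "01*")`, then `("1*", "*")`, then `("*", "*")` are added in that order. The key `(0b1011 << 4) | 0b0110` then hits the first entry, and the key `(0b1100 << 4) | 0b0110` falls through to the second.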