期刊文献+
共找到9,732篇文章
< 1 2 250 >
每页显示 20 50 100
A Robust Framework for Multimodal Sentiment Analysis with Noisy Labels Generated from Distributed Data Annotation
1
作者 Kai Jiang Bin Cao Jing Fan 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第6期2965-2984,共20页
Multimodal sentiment analysis utilizes multimodal data such as text,facial expressions and voice to detect people’s attitudes.With the advent of distributed data collection and annotation,we can easily obtain and sha... Multimodal sentiment analysis utilizes multimodal data such as text,facial expressions and voice to detect people’s attitudes.With the advent of distributed data collection and annotation,we can easily obtain and share such multimodal data.However,due to professional discrepancies among annotators and lax quality control,noisy labels might be introduced.Recent research suggests that deep neural networks(DNNs)will overfit noisy labels,leading to the poor performance of the DNNs.To address this challenging problem,we present a Multimodal Robust Meta Learning framework(MRML)for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously.Specifically,we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training.Besides,a multiple meta-learner(label corrector)strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels.We conducted experiments on three popular multimodal datasets to verify the superiority of ourmethod by comparing it with four baselines. 展开更多
关键词 distributed data collection multimodal sentiment analysis meta learning learn with noisy labels
下载PDF
GF-3 data real-time processing method based on multi-satellite distributed data processing system 被引量:5
2
作者 YANG Jun CAO Yan-dong +2 位作者 SUN Guang-cai XING Meng-dao GUO Liang 《Journal of Central South University》 SCIE EI CAS CSCD 2020年第3期842-852,共11页
Due to the limited scenes that synthetic aperture radar(SAR)satellites can detect,the full-track utilization rate is not high.Because of the computing and storage limitation of one satellite,it is difficult to process... Due to the limited scenes that synthetic aperture radar(SAR)satellites can detect,the full-track utilization rate is not high.Because of the computing and storage limitation of one satellite,it is difficult to process large amounts of data of spaceborne synthetic aperture radars.It is proposed to use a new method of networked satellite data processing for improving the efficiency of data processing.A multi-satellite distributed SAR real-time processing method based on Chirp Scaling(CS)imaging algorithm is studied in this paper,and a distributed data processing system is built with field programmable gate array(FPGA)chips as the kernel.Different from the traditional CS algorithm processing,the system divides data processing into three stages.The computing tasks are reasonably allocated to different data processing units(i.e.,satellites)in each stage.The method effectively saves computing and storage resources of satellites,improves the utilization rate of a single satellite,and shortens the data processing time.Gaofen-3(GF-3)satellite SAR raw data is processed by the system,with the performance of the method verified. 展开更多
关键词 synthetic aperture radar full-track utilization rate distributed data processing CS imaging algorithm field programmable gate array Gaofen-3
下载PDF
A Distributed Data Mining System Based on Multi-agent Technology 被引量:1
3
作者 郭黎明 张艳珍 《Journal of Donghua University(English Edition)》 EI CAS 2006年第6期80-83,共4页
Distributed Data Mining is expected to discover preciously unknown, implicit and valuable information from massive data set inherently distributed over a network. In recent years several approaches to distributed data... Distributed Data Mining is expected to discover preciously unknown, implicit and valuable information from massive data set inherently distributed over a network. In recent years several approaches to distributed data mining have been developed, but only a few of them make use of intelligent agents. This paper provides the reason for applying Multi-Agent Technology in Distributed Data Mining and presents a Distributed Data Mining System based on Multi-Agent Technology that deals with heterogeneity in such environment. Based on the advantages of both the CS model and agent-based model, the system is being able to address the specific concern of increasing scalability and enhancing performance. 展开更多
关键词 distributed data Mining MULTI-AGENT CORBA Client/Server.
下载PDF
Remote Control for the HT-7 Distributed Data Acquisition System
4
作者 岳冬利 罗家融 +1 位作者 王枫 朱琳 《Plasma Science and Technology》 SCIE EI CAS CSCD 2003年第4期1881-1886,共6页
HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been done in the machine since 1994, and lots of satisfactory results have been achieved in the fusion research fiel... HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been done in the machine since 1994, and lots of satisfactory results have been achieved in the fusion research field on HT-7 tokamak [1]. With the development of fusion research, remote control of experiment becomes more and more important to improve experimental efficiency and expand research results. This paper will describe a RCS (Remote Control System), the combined model of Browser/Server and Client/Server, based on Internet of HT-7 distributed data acquisition system (HT7DAS). By means of RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, opening, reliability and efficiency of HT7DAS. In the paper, the whole process of design along with implementation of the system and some key items are discussed in detail. The System has been successfully operated during HT-7 experiment in 2002 campaign period. 展开更多
关键词 TOKAMAK HT-7 distributed data acquisition
下载PDF
Refreshing File Aggregate of Distributed Data Warehouse in Sets of Electric Apparatus
5
作者 于宝琴 王太勇 +3 位作者 张君 周明 何改云 李国琴 《Transactions of Tianjin University》 EI CAS 2006年第3期174-179,共6页
Integrating heterogeneous data sources is a precondition to share data for enterprises. Highly-efficient data updating can both save system expenses, and offer real-time data. It is one of the hot issues to modify dat... Integrating heterogeneous data sources is a precondition to share data for enterprises. Highly-efficient data updating can both save system expenses, and offer real-time data. It is one of the hot issues to modify data rapidly in the pre-processing area of the data warehouse. An extract transform loading design is proposed based on a new data algorithm called Diff-Match,which is developed by utilizing mode matching and data-filtering technology. It can accelerate data renewal, filter the heterogeneous data, and seek out different sets of data. Its efficiency has been proved by its successful application in an enterprise of electric apparatus groups. 展开更多
关键词 distributed data warehouse Diff-Match algorithm KMP algorithm file aggregates extract transform loading
下载PDF
A New Approach for Knowledge Discovery in Distributed Databases Using Fragmented Data Storage Model
6
作者 Masoud Pesaran Behbahani Islam Choudhury Souheil Khaddaj 《Chinese Business Review》 2013年第12期834-845,共12页
Since the early 1990, significant progress in database technology has provided new platform for emerging new dimensions of data engineering. New models were introduced to utilize the data sets stored in the new genera... Since the early 1990, significant progress in database technology has provided new platform for emerging new dimensions of data engineering. New models were introduced to utilize the data sets stored in the new generations of databases. These models have a deep impact on evolving decision-support systems. But they suffer a variety of practical problems while accessing real-world data sources. Specifically a type of data storage model based on data distribution theory has been increasingly used in recent years by large-scale enterprises, while it is not compatible with existing decision-support models. This data storage model stores the data in different geographical sites where they are more regularly accessed. This leads to considerably less inter-site data transfer that can reduce data security issues in some circumstances and also significantly improve data manipulation transactions speed. The aim of this paper is to propose a new approach for supporting proactive decision-making that utilizes a workable data source management methodology. The new model can effectively organize and use complex data sources, even when they are distributed in different sites in a fragmented form. At the same time, the new model provides a very high level of intellectual management decision-support by intelligent use of the data collections through utilizing new smart methods in synthesizing useful knowledge. The results of an empirical study to evaluate the model are provided. 展开更多
关键词 data mining decision-support system distributed databases knowledge discovery in database (KDD)
下载PDF
Research and Application of Distributed Data Mining Method for Improving Rural Power Grid Enterprises in Production and Operation Status Evaluation
7
作者 Gao Xiu-yun Xiang Wen Fang Jun-long 《Journal of Northeast Agricultural University(English Edition)》 CAS 2019年第2期87-96,共10页
With the reform of rural network enterprise system,the speed of transfer property rights in rural power enterprises is accelerated.The evaluation of the operation and development status of rural power enterprises is d... With the reform of rural network enterprise system,the speed of transfer property rights in rural power enterprises is accelerated.The evaluation of the operation and development status of rural power enterprises is directly related to the future development and investment direction of rural power enterprises.At present,the evaluation of the production and operation of rural network enterprises and the development status of power network only relies on the experience of the evaluation personnel,sets the reference index,and forms the evaluation results through artificial scoring.Due to the strong subjective consciousness of the evaluation results,the practical guiding significance is weak.Therefore,distributed data mining method in rural power enterprises status evaluation was proposed which had been applied in many fields,such as food science,economy or chemical industry.The distributed mathematical model was established by using principal component analysis(PCA)and regression analysis.By screening various technical indicators and determining their relevance,the reference value of evaluation results was improved.Combined with statistical program for social sciences(SPSS)data analysis software,the operation status of rural network enterprises was evaluated,and the rationality,effectiveness and economy of the evaluation was verified through comparison with current evaluation results and calculation examples of actual grid operation data. 展开更多
关键词 RURAL power grid PRODUCTION and management distributed data mining STATISTICAL program for SOCIAL sciences(SPSS19)
下载PDF
Distributed Data Acquisition and Control by Software Bus 被引量:2
8
作者 Cecil Bruce-Boye Dmitry A.Kazakov 《自动化博览》 2004年第5期98-99,共2页
Increasing global competition forces manufacturers of products from alltechnical fields to guarantee a high product quality for a long period of time. At thesame time it is necessary to minimize production costs. In o... Increasing global competition forces manufacturers of products from alltechnical fields to guarantee a high product quality for a long period of time. At thesame time it is necessary to minimize production costs. In order to meet all theserequirements, on-line data acquisition and processing are of increasing importancein distributed automation systems. A software bus operating on industrial Ethernethas an ability to minimize operating costs by offering easy installation, scalability,high degree of reliability and remote monitoring and control. 展开更多
关键词 工业以太网 现场总线 CAN总线 OPC
下载PDF
The Design and Implementation of a Distributed Data Acquisition、Monitoring & Processing System (DDAMAP)
9
作者 Guoshun Zhou Hua Shen HuiQi Yan 《软件工程师》 2011年第2期123-127,共5页
This report presents the design and implementation of a Distributed Data Acquisition、 Monitoring and Processing System (DDAMAP)。It is assumed that operations of a factory are organized into two-levels: client machin... This report presents the design and implementation of a Distributed Data Acquisition、 Monitoring and Processing System (DDAMAP)。It is assumed that operations of a factory are organized into two-levels: client machines at plant-level collect real-time raw data from sensors and measurement instrumentations and transfer them to a central processor over the Ethernets, and the central processor handles tasks of real-time data processing and monitoring. This system utilizes the computation power of Intel T2300 dual-core processor and parallel computations supported by multi-threading techniques. Our experiments show that these techniques can significantly improve the system performance and are viable solutions to real-time high-speed data processing. 展开更多
关键词 软件 数据处理 传感器 仪表
下载PDF
Data complexity-based batch sanitization method against poison in distributed learning
10
作者 Silv Wang Kai Fan +2 位作者 Kuan Zhang Hui Li Yintang Yang 《Digital Communications and Networks》 SCIE CSCD 2024年第2期416-428,共13页
The security of Federated Learning(FL)/Distributed Machine Learning(DML)is gravely threatened by data poisoning attacks,which destroy the usability of the model by contaminating training samples,so such attacks are ca... The security of Federated Learning(FL)/Distributed Machine Learning(DML)is gravely threatened by data poisoning attacks,which destroy the usability of the model by contaminating training samples,so such attacks are called causative availability indiscriminate attacks.Facing the problem that existing data sanitization methods are hard to apply to real-time applications due to their tedious process and heavy computations,we propose a new supervised batch detection method for poison,which can fleetly sanitize the training dataset before the local model training.We design a training dataset generation method that helps to enhance accuracy and uses data complexity features to train a detection model,which will be used in an efficient batch hierarchical detection process.Our model stockpiles knowledge about poison,which can be expanded by retraining to adapt to new attacks.Being neither attack-specific nor scenario-specific,our method is applicable to FL/DML or other online or offline scenarios. 展开更多
关键词 distributed machine learning security Federated learning data poisoning attacks data sanitization Batch detection data complexity
下载PDF
Research on Tensor Multi-Clustering Distributed Incremental Updating Method for Big Data
11
作者 Hongjun Zhang Zeyu Zhang +3 位作者 Yilong Ruan Hao Ye Peng Li Desheng Shi 《Computers, Materials & Continua》 SCIE EI 2024年第10期1409-1432,共24页
The scale and complexity of big data are growing continuously,posing severe challenges to traditional data processing methods,especially in the field of clustering analysis.To address this issue,this paper introduces ... The scale and complexity of big data are growing continuously,posing severe challenges to traditional data processing methods,especially in the field of clustering analysis.To address this issue,this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update(BDTMCDIncreUpdate),which combines distributed computing,storage technology,and incremental update techniques to provide an efficient and effective means for clustering analysis.Firstly,the original dataset is divided into multiple subblocks,and distributed computing resources are utilized to process the sub-blocks in parallel,enhancing efficiency.Then,initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results.When new data arrives,incremental update technology is employed to update the core tensor and factor matrix,ensuring that the clustering model can adapt to changes in data.Finally,by combining the updated core tensor and factor matrix with historical computational results,refined clustering results are obtained,achieving real-time adaptation to dynamic data.Through experimental simulation on the Aminer dataset,the BDTMCDIncreUpdate method has demonstrated outstanding performance in terms of accuracy(ACC)and normalized mutual information(NMI)metrics,achieving an accuracy rate of 90%and an NMI score of 0.85,which outperforms existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios.Therefore,the BDTMCDIncreUpdate method offers an innovative solution to the field of big data analysis,integrating distributed computing,incremental updates,and tensor-based multi-clustering techniques.It not only improves the efficiency and scalability in processing large-scale high-dimensional datasets but also has been validated for its effectiveness and accuracy through experiments.This method shows great potential in real-world applications where dynamic data growth is common,and it is of significant importance for advancing the development of data analysis technology. 展开更多
关键词 TENSOR incremental update distributed clustering processing big data
下载PDF
Big Data Application Simulation Platform Design for Onboard Distributed Processing of LEO Mega-Constellation Networks
12
作者 Zhang Zhikai Gu Shushi +1 位作者 Zhang Qinyu Xue Jiayin 《China Communications》 SCIE CSCD 2024年第7期334-345,共12页
Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In exist... Due to the restricted satellite payloads in LEO mega-constellation networks(LMCNs),remote sensing image analysis,online learning and other big data services desirably need onboard distributed processing(OBDP).In existing technologies,the efficiency of big data applications(BDAs)in distributed systems hinges on the stable-state and low-latency links between worker nodes.However,LMCNs with high-dynamic nodes and long-distance links can not provide the above conditions,which makes the performance of OBDP hard to be intuitively measured.To bridge this gap,a multidimensional simulation platform is indispensable that can simulate the network environment of LMCNs and put BDAs in it for performance testing.Using STK's APIs and parallel computing framework,we achieve real-time simulation for thousands of satellite nodes,which are mapped as application nodes through software defined network(SDN)and container technologies.We elaborate the architecture and mechanism of the simulation platform,and take the Starlink and Hadoop as realistic examples for simulations.The results indicate that LMCNs have dynamic end-to-end latency which fluctuates periodically with the constellation movement.Compared to ground data center networks(GDCNs),LMCNs deteriorate the computing and storage job throughput,which can be alleviated by the utilization of erasure codes and data flow scheduling of worker nodes. 展开更多
关键词 big data application Hadoop LEO mega-constellation multidimensional simulation onboard distributed processing
下载PDF
An Adaptive Privacy Preserving Framework for Distributed Association Rule Mining in Healthcare Databases
13
作者 Hasanien K.Kuba Mustafa A.Azzawi +2 位作者 Saad M.Darwish Oday A.Hassen Ansam A.Abdulhussein 《Computers, Materials & Continua》 SCIE EI 2023年第2期4119-4133,共15页
It is crucial,while using healthcare data,to assess the advantages of data privacy against the possible drawbacks.Data from several sources must be combined for use in many data mining applications.The medical practit... It is crucial,while using healthcare data,to assess the advantages of data privacy against the possible drawbacks.Data from several sources must be combined for use in many data mining applications.The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures.Historically,numerous heuristics(e.g.,greedy search)and metaheuristics-based techniques(e.g.,evolutionary algorithm)have been created for the positive association rule in privacy preserving data mining(PPDM).When it comes to connecting seemingly unrelated diseases and drugs,negative association rules may be more informative than their positive counterparts.It is well-known that during negative association rules mining,a large number of uninteresting rules are formed,making this a difficult problem to tackle.In this research,we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users’privacy.The applied approach dynamically determines the transactions to be interrupted for information hiding,as opposed to predefining them.This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining,one that is based on the Tabu-genetic optimization paradigm.Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets.Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions,as measured by the indicator of hiding failure. 展开更多
关键词 distributed data mining evolutionary computation sanitization process healthcare informatics
下载PDF
Design and Implementation of Key-Value Database for Ship Virtual Test Platform Based on Distributed System
14
作者 Kejia Zhang Qingyu Meng +2 位作者 Haiwei Pan Maocai Yuan Baoying Ma 《国际计算机前沿大会会议论文集》 EI 2023年第1期109-123,共15页
The virtual test platform is a vital tool for ship simulation and testing.However,the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types,such as relational d... The virtual test platform is a vital tool for ship simulation and testing.However,the numerical pool ship virtual test platform is a complex system that comprises multiple heterogeneous data types,such as relational data,files,text,images,and animations.The analysis,evaluation,and decision-making processes heavily depend on data,which continue to increase in size and complexity.As a result,there is an increasing need for a distributed database system to manage these data.In this paper,we propose a Key-Value database based on a distributed system that can operate on any type of data,regardless of its size or type.This database architecture supports class column storage and load balancing and optimizes the efficiency of I/O bandwidth and CPU resource utilization.Moreover,it is specif-ically designed to handle the storage and access of largefiles.Additionally,we propose a multimodal data fusion mechanism that can connect various descrip-tions of the same substance,enabling the fusion and retrieval of heterogeneous multimodal data to facilitate data analysis.Our approach focuses on indexing and storage,and we compare our solution with Redis,MongoDB,and MySQL through experiments.We demonstrate the performance,scalability,and reliability of our proposed database system while also analysing its architecture’s defects and providing optimization solutions and future research directions.In conclu-sion,our database system provides an efficient and reliable solution for the data management of the virtual test platform of numerical pool ships. 展开更多
关键词 Key-Value databases multimodal data fusion heterogeneous data distributed systems columnar-like storage INDEXING
原文传递
Finer topographic data improves distribution modeling of Picea crassifolia in the northern Qilian Mountains
15
作者 ZHANG Xiang GAO Linlin +3 位作者 LUO Yu YUAN Yiyun MA Baolong DENG Yang 《Journal of Mountain Science》 SCIE CSCD 2024年第10期3306-3317,共12页
The Qilian Mountains, a national key ecological function zone in Western China, play a pivotal role in ecosystem services. However, the distribution of its dominant tree species, Picea crassifolia (Qinghai spruce), ha... The Qilian Mountains, a national key ecological function zone in Western China, play a pivotal role in ecosystem services. However, the distribution of its dominant tree species, Picea crassifolia (Qinghai spruce), has decreased dramatically in the past decades due to climate change and human activity, which may have influenced its ecological functions. To restore its ecological functions, reasonable reforestation is the key measure. Many previous efforts have predicted the potential distribution of Picea crassifolia, which provides guidance on regional reforestation policy. However, all of them were performed at low spatial resolution, thus ignoring the natural characteristics of the patchy distribution of Picea crassifolia. Here, we modeled the distribution of Picea crassifolia with species distribution models at high spatial resolutions. For many models, the area under the receiver operating characteristic curve (AUC) is larger than 0.9, suggesting their excellent precision. The AUC of models at 30 m is higher than that of models at 90 m, and the current potential distribution of Picea crassifolia is more closely aligned with its actual distribution at 30 m, demonstrating that finer data resolution improves model performance. Besides, for models at 90 m resolution, annual precipitation (Bio12) played the paramount influence on the distribution of Picea crassifolia, while the aspect became the most important one at 30 m, indicating the crucial role of finer topographic data in modeling species with patchy distribution. The current distribution of Picea crassifolia was concentrated in the northern and central parts of the study area, and this pattern will be maintained under future scenarios, although some habitat loss in the central parts and gain in the eastern regions is expected owing to increasing temperatures and precipitation. Our findings can guide protective and restoration strategies for the Qilian Mountains, which would benefit regional ecological balance. 展开更多
关键词 Species distribution modeling Picea crassifolia High resolution topographic data Climate change Qilian Mountains Nature Reserve Climate scenarios
下载PDF
Content-Related Repairing of Inconsistencies in Distributed Data
16
作者 Yue-Feng Du De-Rong Shen +2 位作者 Tie-Zheng Nie Yue Kou Ge Yu 《Journal of Computer Science & Technology》 SCIE EI CSCD 2016年第4期741-758,共18页
Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies while they may ignore some potential inconsistencies without considering the content relationship of data. Content-re... Conditional functional dependencies (CFDs) are a critical technique for detecting inconsistencies while they may ignore some potential inconsistencies without considering the content relationship of data. Content-related conditional functional dependencies (CCFDs) are a type of special CFDs, which combine content-related CFDs and detect potential inconsistencies by putting content-related data together. In the process of cleaning inconsistencies, detection and repairing are interactive: 1) detection catches inconsistencies, 2) repairing corrects caught inconsistencies while may bring new incon- sistencies. Besides, data are often fragmented and distributed into multiple sites. It consequently costs expensive shipment for inconsistencies cleaning. In this paper, our aim is to repair inconsistencies in distributed content-related data. We propose a framework consisting of an inconsistencies detection method and an inconsistencies repairing method, which work iteratively. The detection method marks the violated CCFDs for computing the inconsistencies which should be repaired preferentially. Based on the repairing-cost model presented in this paper, we prove that the minimum-cost repairing using CCFDs is NP-complete. Therefore, the repairing method heuristically repairs the inconsistencies with minimum cost. To improve the efficiency and accuracy of repairing, we propose distinct values and rules sequences. Distinct values make less data shipments than real data for communication. Rules sequences determine appropriate repairing sequences to avoid some incorrect repairs. Our solution is proved to be more effective than CFDs by empirical evaluation on two real-life datasets. 展开更多
关键词 data quality management distributed consistency content relativity consistency repairing
原文传递
Distributed Computation Models for Data Fusion System Simulation
17
作者 张岩 曾涛 +1 位作者 龙腾 崔智社 《Journal of Beijing Institute of Technology》 EI CAS 2001年第3期291-297,共7页
An attempt has been made to develop a distributed software infrastructure model for onboard data fusion system simulation, which is also applied to netted radar systems, onboard distributed detection systems and advan... An attempt has been made to develop a distributed software infrastructure model for onboard data fusion system simulation, which is also applied to netted radar systems, onboard distributed detection systems and advanced C3I systems. Two architectures are provided and verified: one is based on pure TCP/IP protocol and C/S model, and implemented with Winsock, the other is based on CORBA (common object request broker architecture). The performance of data fusion simulation system, i.e. reliability, flexibility and scalability, is improved and enhanced by two models. The study of them makes valuable explore on incorporating the distributed computation concepts into radar system simulation techniques. 展开更多
关键词 radar system computer network data fusion SIMULATION distributed computation
下载PDF
Wide Area Analytics for Geographically Distributed Datacenters 被引量:1
18
作者 Siqi Ji Baochun Li 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2016年第2期125-135,共11页
Big data analytics, the process of organizing and analyzing data to get useful information, is one of the primary uses of cloud services today. Traditionally, collections of data are stored and processed in a single d... Big data analytics, the process of organizing and analyzing data to get useful information, is one of the primary uses of cloud services today. Traditionally, collections of data are stored and processed in a single datacenter. As the volume of data grows at a tremendous rate, it is less efficient for only one datacenter to handle such large volumes of data from a performance point of view. Large cloud service providers are deploying datacenters geographically around the world for better performance and availability. A widely used approach for analytics of gee-distributed data is the centralized approach, which aggregates all the raw data from local datacenters to a central datacenter. However, it has been observed that this approach consumes a significant amount of bandwidth, leading to worse performance. A number of mechanisms have been proposed to achieve optimal performance when data analytics are performed over geo-distributed datacenters. In this paper, we present a survey on the representative mechanisms proposed in the literature for wide area analytics. We discuss basic ideas, present proposed architectures and mechanisms, and discuss several examples to illustrate existing work. We point out the limitations of these mechanisms, give comparisons, and conclude with our thoughts on future research directions. 展开更多
关键词 big data ANALYTICS geo-distributed datacenters
原文传递
MA-IDS: A Distributed Intrusion Detection System Based on Data Mining
19
作者 SUNJian-hua JINHai CHENHao HANZong-fen 《Wuhan University Journal of Natural Sciences》 CAS 2005年第1期111-114,共4页
Aiming at the shortcomings in intrusion detection systems (IDSs) used incommercial and research fields, we propose the MA-IDS system, a distributed intrusion detectionsystem based on data mining. In this model, misuse... Aiming at the shortcomings in intrusion detection systems (IDSs) used incommercial and research fields, we propose the MA-IDS system, a distributed intrusion detectionsystem based on data mining. In this model, misuse intrusion detection system CM1DS) and anomalyintrusion de-lection system (AIDS) are combined. Data mining is applied to raise detectionperformance, and distributed mechanism is employed to increase the scalability and efficiency. Host-and network-based mining algorithms employ an improved. Bayes-ian decision theorem that suits forreal security environment to minimize the risks incurred by false decisions. We describe the overallarchitecture of the MA-IDS system, and discuss specific design and implementation issue. 展开更多
关键词 intrusion detection data mining distributed system
下载PDF
A Distributed Covert Channel of the Packet Ordering Enhancement Model Based on Data Compression
20
作者 Lejun Zhang Xiaoyan Hu +5 位作者 Zhijie Zhang Weizheng Wang Tianwen Huang Donghai Guan Chunhui Zhao Seokhoon Kim 《Computers, Materials & Continua》 SCIE EI 2020年第9期2013-2030,共18页
Covert channel of the packet ordering is a hot research topic.Encryption technology is not enough to protect the security of both sides of communication.Covert channel needs to hide the transmission data and protect c... Covert channel of the packet ordering is a hot research topic.Encryption technology is not enough to protect the security of both sides of communication.Covert channel needs to hide the transmission data and protect content of communication.The traditional methods are usually to use proxy technology such as tor anonymous tracking technology to achieve hiding from the communicator.However,because the establishment of proxy communication needs to consume traffic,the communication capacity will be reduced,and in recent years,the tor technology often has vulnerabilities that led to the leakage of secret information.In this paper,the covert channel model of the packet ordering is applied into the distributed system,and a distributed covert channel of the packet ordering enhancement model based on data compression(DCCPOEDC)is proposed.The data compression algorithms are used to reduce the amount of data and transmission time.The distributed system and data compression algorithms can weaken the hidden statistical probability of information.Furthermore,they can enhance the unknowability of the data and weaken the time distribution characteristics of the data packets.This paper selected a compression algorithm suitable for DCCPOEDC and analyzed DCCPOEDC from anonymity,transmission efficiency,and transmission performance.According to the analysis results,it can be seen that DCCPOEDC optimizes the covert channel of the packet ordering,which saves the transmission time and improves the concealment compared with the original covert channel. 展开更多
关键词 Covert channels information hiding data compression distributed system
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部