Continuous response of range query on steaming data provides useful information for many practical applications as well as the risk of privacy disclosure.The existing research on differential privacy streaming data pu...Continuous response of range query on steaming data provides useful information for many practical applications as well as the risk of privacy disclosure.The existing research on differential privacy streaming data publication mostly pay close attention to boosting query accuracy,but pay less attention to query efficiency,and ignore the effect of timeliness on data weight.In this paper,we propose an effective algorithm of differential privacy streaming data publication under exponential decay mode.Firstly,by introducing the Fenwick tree to divide and reorganize data items in the stream,we achieve a constant time complexity for inserting a new item and getting the prefix sum.Meanwhile,we achieve time complicity linear to the number of data item for building a tree.After that,we use the advantage of matrix mechanism to deal with relevant queries and reduce the global sensitivity.In addition,we choose proper diagonal matrix further improve the range query accuracy.Finally,considering about exponential decay,every data item is weighted by the decay factor.By putting the Fenwick tree and matrix optimization together,we present complete algorithm for differentiate private real-time streaming data publication.The experiment is designed to compare the algorithm in this paper with similar algorithms for streaming data release in exponential decay.Experimental results show that the algorithm in this paper effectively improve the query efficiency while ensuring the quality of the query.展开更多
Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL...Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL)models find helpful in the detection and classification of anomalies.This article designs an oversampling with an optimal deep learning-based streaming data classification(OS-ODLSDC)model.The aim of the OSODLSDC model is to recognize and classify the presence of anomalies in the streaming data.The proposed OS-ODLSDC model initially undergoes preprocessing step.Since streaming data is unbalanced,support vector machine(SVM)-Synthetic Minority Over-sampling Technique(SVM-SMOTE)is applied for oversampling process.Besides,the OS-ODLSDC model employs bidirectional long short-term memory(Bi LSTM)for AD and classification.Finally,the root means square propagation(RMSProp)optimizer is applied for optimal hyperparameter tuning of the Bi LSTM model.For ensuring the promising performance of the OS-ODLSDC model,a wide-ranging experimental analysis is performed using three benchmark datasets such as CICIDS 2018,KDD-Cup 1999,and NSL-KDD datasets.展开更多
In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations’ enterprise systems. This phenomenon provides organizations with unprecedente...In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations’ enterprise systems. This phenomenon provides organizations with unprecedented opportunities to tap into big data to mine valuable business intelligence. However, traditional business analytics methods may not be able to cope with the flood of big data. The main contribution of this paper is the illustration of the development of a novel big data stream analytics framework named BDSASA that leverages a probabilistic language model to analyze the consumer sentiments embedded in hundreds of millions of online consumer reviews. In particular, an inference model is embedded into the classical language modeling framework to enhance the prediction of consumer sentiments. The practical implication of our research work is that organizations can apply our big data stream analytics framework to analyze consumers’ product preferences, and hence develop more effective marketing and production strategies.展开更多
Opinion (sentiment) analysis on big data streams from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews provides many organizations in every field with o...Opinion (sentiment) analysis on big data streams from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews provides many organizations in every field with opportunities to discover valuable intelligence from the massive user generated text streams. However, the traditional content analysis frameworks are inefficient to handle the unprecedentedly big volume of unstructured text streams and the complexity of text analysis tasks for the real time opinion analysis on the big data streams. In this paper, we propose a parallel real time sentiment analysis system: Social Media Data Stream Sentiment Analysis Service (SMDSSAS) that performs multiple phases of sentiment analysis of social media text streams effectively in real time with two fully analytic opinion mining models to combat the scale of text data streams and the complexity of sentiment analysis processing on unstructured text streams. We propose two aspect based opinion mining models: Deterministic and Probabilistic sentiment models for a real time sentiment analysis on the user given topic related data streams. Experiments on the social media Twitter stream traffic captured during the pre-election weeks of the 2016 Presidential election for real-time analysis of public opinions toward two presidential candidates showed that the proposed system was able to predict correctly Donald Trump as the winner of the 2016 Presidential election. The cross validation results showed that the proposed sentiment models with the real-time streaming components in our proposed framework delivered effectively the analysis of the opinions on two presidential candidates with average 81% accuracy for the Deterministic model and 80% for the Probabilistic model, which are 1% - 22% improvements from the results of the existing literature.展开更多
The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improv...The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improved the system conversion rata to 200?MHz and reduced the speed of data transporting and storing to 50?MHz. The high speed HDPLD and ECL logic parts were used to control system timing and the memory address. The multi layer print board and the shield were used to decrease interference produced by the high speed circuit. The system timing was designed carefully. The interleaving/multiplexing technique could improve the system conversion rata greatly while reducing the speed of external digital interfaces greatly. The design resolved the difficulties in high speed system effectively. The experiment proved the data acquisition system is stable and accurate.展开更多
In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and...In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that only sources or destinations with a lot of flows can be sampled probabilistically using the SDSD algorithm. The SDSD algorithm uses both the IP table and the flow bloom filter (BF) data structures to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, then all its subsequent flows will be recorded into the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm , and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that the most relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas the results of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm is also better than previous algorithms in terms of memory consumption.展开更多
Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great s...Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great significance for exploiting and protecting the ocean.We used hourly mean wave height,temperature,and pressure real-time observation data taken in the Xiaomaidao station(in Qingdao,China)from June 1,2017,to May 31,2018,to explore the data quality using eight quality control methods,and to discriminate the most effective method for Xiaomaidao station.After using the eight quality control methods,the percentages of the mean wave height,temperature,and pressure data that passed the tests were 89.6%,88.3%,and 98.6%,respectively.With the marine disaster(wave alarm report)data,the values failed in the test mainly due to the influence of aging observation equipment and missing data transmissions.The mean wave height is often affected by dynamic marine disasters,so the continuity test method is not effective.The correlation test with other related parameters would be more useful for the mean wave height.展开更多
In the era of Big Data, typical architecture of distributed real-time stream processing systems is the combination of Flume, Kafka, and Storm. As a kind of distributed message system, Kafka has the characteristics of ...In the era of Big Data, typical architecture of distributed real-time stream processing systems is the combination of Flume, Kafka, and Storm. As a kind of distributed message system, Kafka has the characteristics of horizontal scalability and high throughput, which is manly deployed in many areas in order to address the problem of speed mismatch between message producers and consumers. When using Kafka, we need to quickly receive data sent by producers. In addition, we need to send data to consumers quickly. Therefore, the performance of Kafka is of critical importance to the performance of the whole stream processing system. In this paper, we propose the improved design of real-time stream processing systems, and focus on improving the Kafka's data loading process.We use Kafka cat to transfer data from the source to Kafka topic directly, which can reduce the network transmission. We also utilize the memory file system to accelerate the process of data loading, which can address the bottleneck and performance problems caused by disk I/O. Extensive experiments are conducted to evaluate the performance, which show the superiority of our improved design.展开更多
The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information ...The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information flow.To ensure effective transmission of wide-frequency electrical information by the communication protocol of a WAMS,this study performs real-time traffic monitoring and analysis of the data network of a power information system,and establishes corresponding network optimization strategies to solve existing transmission problems.This study utilizes the traffic analysis results obtained using the current real-time dynamic monitoring system to design an optimization strategy,covering the optimization in three progressive levels:the underlying communication protocol,source data,and transmission process.Optimization of the system structure and scheduling optimization of data information are validated to be feasible and practical via tests.展开更多
The co-frequency vibration fault is one of the common faults in the operation of rotating equipment,and realizing the real-time diagnosis of the co-frequency vibration fault is of great significance for monitoring the...The co-frequency vibration fault is one of the common faults in the operation of rotating equipment,and realizing the real-time diagnosis of the co-frequency vibration fault is of great significance for monitoring the health state and carrying out vibration suppression of the equipment.In engineering scenarios,co-frequency vibration faults are highlighted by rotational frequency and are difficult to identify,and existing intelligent methods require more hardware conditions and are exclusively time-consuming.Therefore,Lightweight-convolutional neural networks(LW-CNN)algorithm is proposed in this paper to achieve real-time fault diagnosis.The critical parameters are discussed and verified by simulated and experimental signals for the sliding window data augmentation method.Based on LW-CNN and data augmentation,the real-time intelligent diagnosis of co-frequency is realized.Moreover,a real-time detection method of fault diagnosis algorithm is proposed for data acquisition to fault diagnosis.It is verified by experiments that the LW-CNN and sliding window methods are used with high accuracy and real-time performance.展开更多
Data stream clustering is integral to contemporary big data applications.However,addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research.This paper aims...Data stream clustering is integral to contemporary big data applications.However,addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research.This paper aims to elevate the efficiency and precision of data stream clustering,leveraging the TEDA(Typicality and Eccentricity Data Analysis)algorithm as a foundation,we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm.The original TEDA algorithm,grounded in the concept of“Typicality and Eccentricity Data Analytics”,represents an evolving and recursive method that requires no prior knowledge.While the algorithm autonomously creates and merges clusters as new data arrives,its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data.This work presents the NS-TEDA(Neighbor Search Based Typicality and Eccentricity Data Analysis)algorithm by incorporating a KD-Tree(K-Dimensional Tree)algorithm integrated with the Scapegoat Tree.Upon arrival,this ensures that new data points interact solely with clusters in very close proximity.This significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating the merging of clusters with high overlap to some extent.We apply the NS-TEDA algorithm to several well-known datasets,comparing its performance with other data stream clustering algorithms and the original TEDA algorithm.The results demonstrate that the proposed algorithm achieves higher accuracy,and its runtime exhibits almost linear dependence on the volume of data,making it more suitable for large-scale data stream analysis research.展开更多
In petroleum engineering,real-time lithology identification is very important for reservoir evaluation,drilling decisions and petroleum geological exploration.A lithology identification method while drilling based on ...In petroleum engineering,real-time lithology identification is very important for reservoir evaluation,drilling decisions and petroleum geological exploration.A lithology identification method while drilling based on machine learning and mud logging data is studied in this paper.This method can effectively utilize downhole parameters collected in real-time during drilling,to identify lithology in real-time and provide a reference for optimization of drilling parameters.Given the imbalance of lithology samples,the synthetic minority over-sampling technique(SMOTE)and Tomek link were used to balance the sample number of five lithologies.Meanwhile,this paper introduces Tent map,random opposition-based learning and dynamic perceived probability to the original crow search algorithm(CSA),and establishes an improved crow search algorithm(ICSA).In this paper,ICSA is used to optimize the hyperparameter combination of random forest(RF),extremely random trees(ET),extreme gradient boosting(XGB),and light gradient boosting machine(LGBM)models.In addition,this study combines the recognition advantages of the four models.The accuracy of lithology identification by the weighted average probability model reaches 0.877.The study of this paper realizes high-precision real-time lithology identification method,which can provide lithology reference for the drilling process.展开更多
Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of i...Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of influencing factors,the prediction time scale of existing studies is rough.Therefore,this study focuses on the development of a real-time prediction model by coupling the spatio-temporal correlation with external load through autoencoder network(ATENet)based on structural health monitoring(SHM)data.An autoencoder mechanism is performed to acquire the high-level representation of raw monitoring data at different spatial positions,and the recurrent neural network is applied to understanding the temporal correlation from the time series.Then,the obtained temporal-spatial information is coupled with dynamic loads through a fully connected layer to predict structural performance in next 12 h.As a case study,the proposed model is formulated on the SHM data collected from a representative underwater shield tunnel.The robustness study is carried out to verify the reliability and the prediction capability of the proposed model.Finally,the ATENet model is compared with some typical models,and the results indicate that it has the best performance.ATENet model is of great value to predict the realtime evolution trend of tunnel structure.展开更多
With the continual growth of the variety and complexity of network crime means, the traditional packet feature matching cannot detect all kinds of intrusion behaviors completely. It is urgent to reassemble network str...With the continual growth of the variety and complexity of network crime means, the traditional packet feature matching cannot detect all kinds of intrusion behaviors completely. It is urgent to reassemble network stream to perform packet processing at a semantic level above the network layer. This paper presents an efficient TCP stream reassembly mechanism for real-time processing of high-speed network traffic. By analyzing the characteristics of network stream in high-speed network and TCP connection establishment process, several polices for designing the reassembly mechanism are built. Then, the reassembly implementation is elaborated in accordance with the policies. Finally, the reassembly mechanism is compared with the traditional reassembly mechanism by the network traffic captured in a typical gigabit gateway. Experiment results illustrate that the reassembly mechanism is efficient and can satisfy the real-time property requirement of traffic analysis system in high-speed network.展开更多
Considering the increasing use of information technology with established standards, such as TCP/IP and XML in modem industrial automation, we present a high cost performance solution with FPGA (field programmable ga...Considering the increasing use of information technology with established standards, such as TCP/IP and XML in modem industrial automation, we present a high cost performance solution with FPGA (field programmable gate array) implementation of a novel reliable real-time data transfer system based on EPA (Ethemet for plant automation) protocol and IEEE 1588 standard. This combination can provide more predictable and real-time communication between automation equipments and precise synchronization between devices. The designed EPA system has been verified on Xilinx Spartan3 XC3S1500 and it consumed 75% of the total slices. The experimental results show that the novel industrial control system achieves high synchronization precision and provides a 1.59-ps standard deviation between the master device and the slave ones. Such a real-time data transfer system is an excellent candidate for automation equipments which require precise synchronization based on Ethemet at a comparatively low price.展开更多
Big data streams started becoming ubiquitous in recent years,thanks to rapid generation of massive volumes of data by different applications.It is challenging to apply existing data mining tools and techniques directl...Big data streams started becoming ubiquitous in recent years,thanks to rapid generation of massive volumes of data by different applications.It is challenging to apply existing data mining tools and techniques directly in these big data streams.At the same time,streaming data from several applications results in two major problems such as class imbalance and concept drift.The current research paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection(MOMBD-CDD)method on High-Dimensional Streaming Data.The presented MOMBD-CDD model has different operational stages such as pre-processing,CDD,and classification.MOMBD-CDD model overcomes class imbalance problem by Synthetic Minority Over-sampling Technique(SMOTE).In order to determine the oversampling rates and neighboring point values of SMOTE,Glowworm Swarm Optimization(GSO)algorithm is employed.Besides,Statistical Test of Equal Proportions(STEPD),a CDD technique is also utilized.Finally,Bidirectional Long Short-Term Memory(Bi-LSTM)model is applied for classification.In order to improve classification performance and to compute the optimum parameters for Bi-LSTM model,GSO-based hyperparameter tuning process is carried out.The performance of the presented model was evaluated using high dimensional benchmark streaming datasets namely intrusion detection(NSL KDDCup)dataset and ECUE spam dataset.An extensive experimental validation process confirmed the effective outcome of MOMBD-CDD model.The proposed model attained high accuracy of 97.45%and 94.23%on the applied KDDCup99 Dataset and ECUE Spam datasets respectively.展开更多
This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extract...This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extraction transformation loading) tier,data processing tier,data storage tier and data display tier,achieving long-term store,real-time analysis and inquiry for massive data.Finally,a real dataset cluster is simulated,which are made up of 39 nodes including 2 master nodes and 37 data nodes,and performing function tests of data importing module and real-time query module,and performance tests of HDFS's I/O,the MapReduce cluster,batch-loading and real-time query of massive data.The test results indicate that this platform achieves high performance in terms of response time and linear scalability.展开更多
Recently, use of mobile communicational devices in field data collection is increasing such as smart phones and cellular phones due to emergence of embedded Global Position System GPS and Wi-Fi Internet access. Accura...Recently, use of mobile communicational devices in field data collection is increasing such as smart phones and cellular phones due to emergence of embedded Global Position System GPS and Wi-Fi Internet access. Accurate timely and handy field data collection is required for disaster management and emergency quick responses. In this article, we introduce web-based GIS system to collect the field data by personal mobile phone through Post Office Protocol POP3 mail server. The main objective of this work is to demonstrate real-time field data collection method to the students using their mobile phone to collect field data by timely and handy manners, either individual or group survey in local or global scale research.展开更多
A DMVOCC-MVDA (distributed multiversion optimistic concurrency control with multiversion dynamic adjustment) protocol was presented to process mobile distributed real-time transaction in mobile broadcast environment...A DMVOCC-MVDA (distributed multiversion optimistic concurrency control with multiversion dynamic adjustment) protocol was presented to process mobile distributed real-time transaction in mobile broadcast environments. At the mobile hosts, all transactions perform local pre-validation. The local pre-validation process is carried out against the committed transactions at the server in the last broadcast cycle. Transactions that survive in local pre-validation must be submitted to the server for local final validation. The new protocol eliminates conflicts between mobile read-only and mobile update transactions, and resolves data conflicts flexibly by using multiversion dynamic adjustment of serialization order to avoid unnecessary restarts of transactions. Mobile read-only transactions can be committed with no-blocking, and respond time of mobile read-only transactions is greatly shortened. The tolerance of mobile transactions of disconnections from the broadcast channel is increased. In global validation mobile distributed transactions have to do check to ensure distributed serializability in all participants. The simulation results show that the new concurrency control protocol proposed offers better performance than other protocols in terms of miss rate, restart rate, commit rate. Under high work load (think time is ls) the miss rate of DMVOCC-MVDA is only 14.6%, is significantly lower than that of other protocols. The restart rate of DMVOCC-MVDA is only 32.3%, showing that DMVOCC-MVDA can effectively reduce the restart rate of mobile transactions. And the commit rate of DMVOCC-MVDA is up to 61.2%, which is obviously higher than that of other protocols.展开更多
This paper focuses on the time efficiency for machine vision and intelligent photogrammetry, especially high accuracy on-board real-time cloud detection method. With the development of technology, the data acquisition...This paper focuses on the time efficiency for machine vision and intelligent photogrammetry, especially high accuracy on-board real-time cloud detection method. With the development of technology, the data acquisition ability is growing continuously and the volume of raw data is increasing explosively. Meanwhile, because of the higher requirement of data accuracy, the computation load is also becoming heavier. This situation makes time efficiency extremely important. Moreover, the cloud cover rate of optical satellite imagery is up to approximately 50%, which is seriously restricting the applications of on-board intelligent photogrammetry services. To meet the on-board cloud detection requirements and offer valid input data to subsequent processing, this paper presents a stream-computing of high accuracy on-board real-time cloud detection solution which follows the “bottom-up” understanding strategy of machine vision and uses multiple embedded GPU with significant potential to be applied on-board. Without external memory, the data parallel pipeline system based on multiple processing modules of this solution could afford the “stream-in, processing, stream-out” real-time stream computing. In experiments, images of GF-2 satellite are used to validate the accuracy and performance of this approach, and the experimental results show that this solution could not only bring up cloud detection accuracy, but also match the on-board real-time processing requirements.展开更多
基金This work is supported,in part,by the National Natural Science Foundation of China under grant numbers 61300026in part,by the Natural Science Foundation of Fujian Province under grant numbers 2017J01754, 2018J01797.
文摘Continuous response of range query on steaming data provides useful information for many practical applications as well as the risk of privacy disclosure.The existing research on differential privacy streaming data publication mostly pay close attention to boosting query accuracy,but pay less attention to query efficiency,and ignore the effect of timeliness on data weight.In this paper,we propose an effective algorithm of differential privacy streaming data publication under exponential decay mode.Firstly,by introducing the Fenwick tree to divide and reorganize data items in the stream,we achieve a constant time complexity for inserting a new item and getting the prefix sum.Meanwhile,we achieve time complicity linear to the number of data item for building a tree.After that,we use the advantage of matrix mechanism to deal with relevant queries and reduce the global sensitivity.In addition,we choose proper diagonal matrix further improve the range query accuracy.Finally,considering about exponential decay,every data item is weighted by the decay factor.By putting the Fenwick tree and matrix optimization together,we present complete algorithm for differentiate private real-time streaming data publication.The experiment is designed to compare the algorithm in this paper with similar algorithms for streaming data release in exponential decay.Experimental results show that the algorithm in this paper effectively improve the query efficiency while ensuring the quality of the query.
文摘Recently,anomaly detection(AD)in streaming data gained significant attention among research communities due to its applicability in finance,business,healthcare,education,etc.The recent developments of deep learning(DL)models find helpful in the detection and classification of anomalies.This article designs an oversampling with an optimal deep learning-based streaming data classification(OS-ODLSDC)model.The aim of the OSODLSDC model is to recognize and classify the presence of anomalies in the streaming data.The proposed OS-ODLSDC model initially undergoes preprocessing step.Since streaming data is unbalanced,support vector machine(SVM)-Synthetic Minority Over-sampling Technique(SVM-SMOTE)is applied for oversampling process.Besides,the OS-ODLSDC model employs bidirectional long short-term memory(Bi LSTM)for AD and classification.Finally,the root means square propagation(RMSProp)optimizer is applied for optimal hyperparameter tuning of the Bi LSTM model.For ensuring the promising performance of the OS-ODLSDC model,a wide-ranging experimental analysis is performed using three benchmark datasets such as CICIDS 2018,KDD-Cup 1999,and NSL-KDD datasets.
文摘In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations’ enterprise systems. This phenomenon provides organizations with unprecedented opportunities to tap into big data to mine valuable business intelligence. However, traditional business analytics methods may not be able to cope with the flood of big data. The main contribution of this paper is the illustration of the development of a novel big data stream analytics framework named BDSASA that leverages a probabilistic language model to analyze the consumer sentiments embedded in hundreds of millions of online consumer reviews. In particular, an inference model is embedded into the classical language modeling framework to enhance the prediction of consumer sentiments. The practical implication of our research work is that organizations can apply our big data stream analytics framework to analyze consumers’ product preferences, and hence develop more effective marketing and production strategies.
文摘Opinion (sentiment) analysis on big data streams from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews provides many organizations in every field with opportunities to discover valuable intelligence from the massive user generated text streams. However, the traditional content analysis frameworks are inefficient to handle the unprecedentedly big volume of unstructured text streams and the complexity of text analysis tasks for the real time opinion analysis on the big data streams. In this paper, we propose a parallel real time sentiment analysis system: Social Media Data Stream Sentiment Analysis Service (SMDSSAS) that performs multiple phases of sentiment analysis of social media text streams effectively in real time with two fully analytic opinion mining models to combat the scale of text data streams and the complexity of sentiment analysis processing on unstructured text streams. We propose two aspect based opinion mining models: Deterministic and Probabilistic sentiment models for a real time sentiment analysis on the user given topic related data streams. Experiments on the social media Twitter stream traffic captured during the pre-election weeks of the 2016 Presidential election for real-time analysis of public opinions toward two presidential candidates showed that the proposed system was able to predict correctly Donald Trump as the winner of the 2016 Presidential election. The cross validation results showed that the proposed sentiment models with the real-time streaming components in our proposed framework delivered effectively the analysis of the opinions on two presidential candidates with average 81% accuracy for the Deterministic model and 80% for the Probabilistic model, which are 1% - 22% improvements from the results of the existing literature.
文摘The interleaving/multiplexing technique was used to realize a 200?MHz real time data acquisition system. Two 100?MHz ADC modules worked parallelly and every ADC plays out data in ping pang fashion. The design improved the system conversion rata to 200?MHz and reduced the speed of data transporting and storing to 50?MHz. The high speed HDPLD and ECL logic parts were used to control system timing and the memory address. The multi layer print board and the shield were used to decrease interference produced by the high speed circuit. The system timing was designed carefully. The interleaving/multiplexing technique could improve the system conversion rata greatly while reducing the speed of external digital interfaces greatly. The design resolved the difficulties in high speed system effectively. The experiment proved the data acquisition system is stable and accurate.
基金The National Basic Research Program of China(973Program)(No.2009CB320505)the Natural Science Foundation of Jiangsu Province(No. BK2008288)+1 种基金the Excellent Young Teachers Program of Southeast University(No.4009001018)the Open Research Program of Key Laboratory of Computer Network of Guangdong Province (No. CCNL200706)
文摘In order to improve the precision of super point detection and control measurement resource consumption, this paper proposes a super point detection method based on sampling and data streaming algorithms (SDSD), and proves that only sources or destinations with a lot of flows can be sampled probabilistically using the SDSD algorithm. The SDSD algorithm uses both the IP table and the flow bloom filter (BF) data structures to maintain the IP and flow information. The IP table is used to judge whether an IP address has been recorded. If the IP exists, then all its subsequent flows will be recorded into the flow BF; otherwise, the IP flow is sampled. This paper also analyzes the accuracy and memory requirements of the SDSD algorithm , and tests them using the CERNET trace. The theoretical analysis and experimental tests demonstrate that the most relative errors of the super points estimated by the SDSD algorithm are less than 5%, whereas the results of other algorithms are about 10%. Because of the BF structure, the SDSD algorithm is also better than previous algorithms in terms of memory consumption.
基金Supported by the National Key Research and Development Program of China(Nos.2016YFC1402000,2018YFC1407003,2017YFC1405300)
文摘Offshore waters provide resources for human beings,while on the other hand,threaten them because of marine disasters.Ocean stations are part of offshore observation networks,and the quality of their data is of great significance for exploiting and protecting the ocean.We used hourly mean wave height,temperature,and pressure real-time observation data taken in the Xiaomaidao station(in Qingdao,China)from June 1,2017,to May 31,2018,to explore the data quality using eight quality control methods,and to discriminate the most effective method for Xiaomaidao station.After using the eight quality control methods,the percentages of the mean wave height,temperature,and pressure data that passed the tests were 89.6%,88.3%,and 98.6%,respectively.With the marine disaster(wave alarm report)data,the values failed in the test mainly due to the influence of aging observation equipment and missing data transmissions.The mean wave height is often affected by dynamic marine disasters,so the continuity test method is not effective.The correlation test with other related parameters would be more useful for the mean wave height.
基金supported by the Research Fund of National Key Laboratory of Computer Architecture under Grant No.CARCH201501the Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No.2016A09
文摘In the era of Big Data, typical architecture of distributed real-time stream processing systems is the combination of Flume, Kafka, and Storm. As a kind of distributed message system, Kafka has the characteristics of horizontal scalability and high throughput, which is manly deployed in many areas in order to address the problem of speed mismatch between message producers and consumers. When using Kafka, we need to quickly receive data sent by producers. In addition, we need to send data to consumers quickly. Therefore, the performance of Kafka is of critical importance to the performance of the whole stream processing system. In this paper, we propose the improved design of real-time stream processing systems, and focus on improving the Kafka's data loading process.We use Kafka cat to transfer data from the source to Kafka topic directly, which can reduce the network transmission. We also utilize the memory file system to accelerate the process of data loading, which can address the bottleneck and performance problems caused by disk I/O. Extensive experiments are conducted to evaluate the performance, which show the superiority of our improved design.
文摘The application and development of a wide-area measurement system(WAMS)has enabled many applications and led to several requirements based on dynamic measurement data.Such data are transmitted as big data information flow.To ensure effective transmission of wide-frequency electrical information by the communication protocol of a WAMS,this study performs real-time traffic monitoring and analysis of the data network of a power information system,and establishes corresponding network optimization strategies to solve existing transmission problems.This study utilizes the traffic analysis results obtained using the current real-time dynamic monitoring system to design an optimization strategy,covering the optimization in three progressive levels:the underlying communication protocol,source data,and transmission process.Optimization of the system structure and scheduling optimization of data information are validated to be feasible and practical via tests.
基金Supported by National Natural Science Foundation of China(Grant Nos.51875031,52242507)Beijing Municipal Natural Science Foundation of China(Grant No.3212010)Beijing Municipal Youth Backbone Personal Project of China(Grant No.2017000020124 G018).
文摘The co-frequency vibration fault is one of the common faults in the operation of rotating equipment,and realizing the real-time diagnosis of the co-frequency vibration fault is of great significance for monitoring the health state and carrying out vibration suppression of the equipment.In engineering scenarios,co-frequency vibration faults are highlighted by rotational frequency and are difficult to identify,and existing intelligent methods require more hardware conditions and are exclusively time-consuming.Therefore,Lightweight-convolutional neural networks(LW-CNN)algorithm is proposed in this paper to achieve real-time fault diagnosis.The critical parameters are discussed and verified by simulated and experimental signals for the sliding window data augmentation method.Based on LW-CNN and data augmentation,the real-time intelligent diagnosis of co-frequency is realized.Moreover,a real-time detection method of fault diagnosis algorithm is proposed for data acquisition to fault diagnosis.It is verified by experiments that the LW-CNN and sliding window methods are used with high accuracy and real-time performance.
基金This research was funded by the National Natural Science Foundation of China(Grant No.72001190)by the Ministry of Education’s Humanities and Social Science Project via the China Ministry of Education(Grant No.20YJC630173)by Zhejiang A&F University(Grant No.2022LFR062).
文摘Data stream clustering is integral to contemporary big data applications.However,addressing the ongoing influx of data streams efficiently and accurately remains a primary challenge in current research.This paper aims to elevate the efficiency and precision of data stream clustering,leveraging the TEDA(Typicality and Eccentricity Data Analysis)algorithm as a foundation,we introduce improvements by integrating a nearest neighbor search algorithm to enhance both the efficiency and accuracy of the algorithm.The original TEDA algorithm,grounded in the concept of“Typicality and Eccentricity Data Analytics”,represents an evolving and recursive method that requires no prior knowledge.While the algorithm autonomously creates and merges clusters as new data arrives,its efficiency is significantly hindered by the need to traverse all existing clusters upon the arrival of further data.This work presents the NS-TEDA(Neighbor Search Based Typicality and Eccentricity Data Analysis)algorithm by incorporating a KD-Tree(K-Dimensional Tree)algorithm integrated with the Scapegoat Tree.Upon arrival,this ensures that new data points interact solely with clusters in very close proximity.This significantly enhances algorithm efficiency while preventing a single data point from joining too many clusters and mitigating the merging of clusters with high overlap to some extent.We apply the NS-TEDA algorithm to several well-known datasets,comparing its performance with other data stream clustering algorithms and the original TEDA algorithm.The results demonstrate that the proposed algorithm achieves higher accuracy,and its runtime exhibits almost linear dependence on the volume of data,making it more suitable for large-scale data stream analysis research.
基金supported by CNPC-CZU Innovation Alliancesupported by the Program of Polar Drilling Environmental Protection and Waste Treatment Technology (2022YFC2806403)。
文摘In petroleum engineering,real-time lithology identification is very important for reservoir evaluation,drilling decisions and petroleum geological exploration.A lithology identification method while drilling based on machine learning and mud logging data is studied in this paper.This method can effectively utilize downhole parameters collected in real-time during drilling,to identify lithology in real-time and provide a reference for optimization of drilling parameters.Given the imbalance of lithology samples,the synthetic minority over-sampling technique(SMOTE)and Tomek link were used to balance the sample number of five lithologies.Meanwhile,this paper introduces Tent map,random opposition-based learning and dynamic perceived probability to the original crow search algorithm(CSA),and establishes an improved crow search algorithm(ICSA).In this paper,ICSA is used to optimize the hyperparameter combination of random forest(RF),extremely random trees(ET),extreme gradient boosting(XGB),and light gradient boosting machine(LGBM)models.In addition,this study combines the recognition advantages of the four models.The accuracy of lithology identification by the weighted average probability model reaches 0.877.The study of this paper realizes high-precision real-time lithology identification method,which can provide lithology reference for the drilling process.
基金This work is supported by the National Natural Science Foundation of China(Grant No.51991392)Key Deployment Projects of Chinese Academy of Sciences(Grant No.ZDRW-ZS-2021-3-3)the Second Tibetan Plateau Scientific Expedition and Research Program(STEP)(Grant No.2019QZKK0904).
文摘Predicting the mechanical behaviors of structure and perceiving the anomalies in advance are essential to ensuring the safe operation of infrastructures in the long run.In addition to the incomplete consideration of influencing factors,the prediction time scale of existing studies is rough.Therefore,this study focuses on the development of a real-time prediction model by coupling the spatio-temporal correlation with external load through autoencoder network(ATENet)based on structural health monitoring(SHM)data.An autoencoder mechanism is performed to acquire the high-level representation of raw monitoring data at different spatial positions,and the recurrent neural network is applied to understanding the temporal correlation from the time series.Then,the obtained temporal-spatial information is coupled with dynamic loads through a fully connected layer to predict structural performance in next 12 h.As a case study,the proposed model is formulated on the SHM data collected from a representative underwater shield tunnel.The robustness study is carried out to verify the reliability and the prediction capability of the proposed model.Finally,the ATENet model is compared with some typical models,and the results indicate that it has the best performance.ATENet model is of great value to predict the realtime evolution trend of tunnel structure.
基金National High-Tech Research and Development Program of China (863 Program) (No.2007AA01Z309)
文摘With the continual growth of the variety and complexity of network crime means, the traditional packet feature matching cannot detect all kinds of intrusion behaviors completely. It is urgent to reassemble network stream to perform packet processing at a semantic level above the network layer. This paper presents an efficient TCP stream reassembly mechanism for real-time processing of high-speed network traffic. By analyzing the characteristics of network stream in high-speed network and TCP connection establishment process, several polices for designing the reassembly mechanism are built. Then, the reassembly implementation is elaborated in accordance with the policies. Finally, the reassembly mechanism is compared with the traditional reassembly mechanism by the network traffic captured in a typical gigabit gateway. Experiment results illustrate that the reassembly mechanism is efficient and can satisfy the real-time property requirement of traffic analysis system in high-speed network.
文摘Considering the increasing use of information technology with established standards, such as TCP/IP and XML in modem industrial automation, we present a high cost performance solution with FPGA (field programmable gate array) implementation of a novel reliable real-time data transfer system based on EPA (Ethemet for plant automation) protocol and IEEE 1588 standard. This combination can provide more predictable and real-time communication between automation equipments and precise synchronization between devices. The designed EPA system has been verified on Xilinx Spartan3 XC3S1500 and it consumed 75% of the total slices. The experimental results show that the novel industrial control system achieves high synchronization precision and provides a 1.59-ps standard deviation between the master device and the slave ones. Such a real-time data transfer system is an excellent candidate for automation equipments which require precise synchronization based on Ethemet at a comparatively low price.
文摘Big data streams started becoming ubiquitous in recent years,thanks to rapid generation of massive volumes of data by different applications.It is challenging to apply existing data mining tools and techniques directly in these big data streams.At the same time,streaming data from several applications results in two major problems such as class imbalance and concept drift.The current research paper presents a new Multi-Objective Metaheuristic Optimization-based Big Data Analytics with Concept Drift Detection(MOMBD-CDD)method on High-Dimensional Streaming Data.The presented MOMBD-CDD model has different operational stages such as pre-processing,CDD,and classification.MOMBD-CDD model overcomes class imbalance problem by Synthetic Minority Over-sampling Technique(SMOTE).In order to determine the oversampling rates and neighboring point values of SMOTE,Glowworm Swarm Optimization(GSO)algorithm is employed.Besides,Statistical Test of Equal Proportions(STEPD),a CDD technique is also utilized.Finally,Bidirectional Long Short-Term Memory(Bi-LSTM)model is applied for classification.In order to improve classification performance and to compute the optimum parameters for Bi-LSTM model,GSO-based hyperparameter tuning process is carried out.The performance of the presented model was evaluated using high dimensional benchmark streaming datasets namely intrusion detection(NSL KDDCup)dataset and ECUE spam dataset.An extensive experimental validation process confirmed the effective outcome of MOMBD-CDD model.The proposed model attained high accuracy of 97.45%and 94.23%on the applied KDDCup99 Dataset and ECUE Spam datasets respectively.
基金Supported by the National Science and Technology Support Project(No.2012BAH01F02)from Ministry of Science and Technology of Chinathe Director Fund(No.IS201116002)from Institute of Seismology,CEA
文摘This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database(HBase).This platform consists of four layers including ETL(extraction transformation loading) tier,data processing tier,data storage tier and data display tier,achieving long-term store,real-time analysis and inquiry for massive data.Finally,a real dataset cluster is simulated,which are made up of 39 nodes including 2 master nodes and 37 data nodes,and performing function tests of data importing module and real-time query module,and performance tests of HDFS's I/O,the MapReduce cluster,batch-loading and real-time query of massive data.The test results indicate that this platform achieves high performance in terms of response time and linear scalability.
文摘Recently, use of mobile communicational devices in field data collection is increasing such as smart phones and cellular phones due to emergence of embedded Global Position System GPS and Wi-Fi Internet access. Accurate timely and handy field data collection is required for disaster management and emergency quick responses. In this article, we introduce web-based GIS system to collect the field data by personal mobile phone through Post Office Protocol POP3 mail server. The main objective of this work is to demonstrate real-time field data collection method to the students using their mobile phone to collect field data by timely and handy manners, either individual or group survey in local or global scale research.
基金Project(20030533011)supported by the National Research Foundation for the Doctoral Program of Higher Education of China
文摘A DMVOCC-MVDA (distributed multiversion optimistic concurrency control with multiversion dynamic adjustment) protocol was presented to process mobile distributed real-time transaction in mobile broadcast environments. At the mobile hosts, all transactions perform local pre-validation. The local pre-validation process is carried out against the committed transactions at the server in the last broadcast cycle. Transactions that survive in local pre-validation must be submitted to the server for local final validation. The new protocol eliminates conflicts between mobile read-only and mobile update transactions, and resolves data conflicts flexibly by using multiversion dynamic adjustment of serialization order to avoid unnecessary restarts of transactions. Mobile read-only transactions can be committed with no-blocking, and respond time of mobile read-only transactions is greatly shortened. The tolerance of mobile transactions of disconnections from the broadcast channel is increased. In global validation mobile distributed transactions have to do check to ensure distributed serializability in all participants. The simulation results show that the new concurrency control protocol proposed offers better performance than other protocols in terms of miss rate, restart rate, commit rate. Under high work load (think time is ls) the miss rate of DMVOCC-MVDA is only 14.6%, is significantly lower than that of other protocols. The restart rate of DMVOCC-MVDA is only 32.3%, showing that DMVOCC-MVDA can effectively reduce the restart rate of mobile transactions. And the commit rate of DMVOCC-MVDA is up to 61.2%, which is obviously higher than that of other protocols.
基金The National Natural Science Foundation of China (91438203,91638301,91438111,41601476).
文摘This paper focuses on the time efficiency for machine vision and intelligent photogrammetry, especially high accuracy on-board real-time cloud detection method. With the development of technology, the data acquisition ability is growing continuously and the volume of raw data is increasing explosively. Meanwhile, because of the higher requirement of data accuracy, the computation load is also becoming heavier. This situation makes time efficiency extremely important. Moreover, the cloud cover rate of optical satellite imagery is up to approximately 50%, which is seriously restricting the applications of on-board intelligent photogrammetry services. To meet the on-board cloud detection requirements and offer valid input data to subsequent processing, this paper presents a stream-computing of high accuracy on-board real-time cloud detection solution which follows the “bottom-up” understanding strategy of machine vision and uses multiple embedded GPU with significant potential to be applied on-board. Without external memory, the data parallel pipeline system based on multiple processing modules of this solution could afford the “stream-in, processing, stream-out” real-time stream computing. In experiments, images of GF-2 satellite are used to validate the accuracy and performance of this approach, and the experimental results show that this solution could not only bring up cloud detection accuracy, but also match the on-board real-time processing requirements.