Due to the limited scenes that synthetic aperture radar (SAR) satellites can detect, the full-track utilization rate is not high. Because of the computing and storage limitations of a single satellite, it is difficult to process the large amounts of data produced by spaceborne SARs. A new method of networked satellite data processing is proposed to improve the efficiency of data processing. A multi-satellite distributed SAR real-time processing method based on the Chirp Scaling (CS) imaging algorithm is studied in this paper, and a distributed data processing system is built with field programmable gate array (FPGA) chips as the kernel. Unlike traditional CS processing, the system divides data processing into three stages, and the computing tasks of each stage are allocated to different data processing units (i.e., satellites). The method effectively saves the computing and storage resources of satellites, improves the utilization rate of a single satellite, and shortens the data processing time. Gaofen-3 (GF-3) satellite SAR raw data were processed by the system to verify the performance of the method.
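As a sketch of how such a three-stage split might look in software, the skeleton below maps the classical CS stages (azimuth FFT plus chirp-scaling multiply; range-domain compression plus bulk RCMC; azimuth compression) onto three processing units. The phase functions phi1-phi3 depend on the radar geometry and are assumed precomputed; this illustrates the data flow only, not the paper's FPGA implementation.

```python
import numpy as np

def stage1_chirp_scaling(raw, phi1):
    # Stage 1 (processing unit A): azimuth FFT (rows = azimuth samples),
    # then multiply by the chirp-scaling phase function.
    d = np.fft.fft(raw, axis=0)
    return d * phi1

def stage2_range_processing(d, phi2):
    # Stage 2 (processing unit B): range FFT, range compression + bulk
    # range cell migration correction in the 2-D frequency domain, range IFFT.
    d = np.fft.fft(d, axis=1) * phi2
    return np.fft.ifft(d, axis=1)

def stage3_azimuth_compression(d, phi3):
    # Stage 3 (processing unit C): azimuth compression + residual phase
    # correction, then azimuth IFFT back to the image domain.
    return np.fft.ifft(d * phi3, axis=0)

# Toy run with placeholder (all-ones) phase functions on random "raw" data.
raw = np.random.randn(256, 512) + 1j * np.random.randn(256, 512)
ones = np.ones_like(raw)
img = stage3_azimuth_compression(stage2_range_processing(stage1_chirp_scaling(raw, ones), ones), ones)
print(img.shape)  # (256, 512)
```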
Due to the restricted satellite payloads in LEO mega-constellation networks (LMCNs), remote sensing image analysis, online learning, and other big data services urgently need onboard distributed processing (OBDP). In existing technologies, the efficiency of big data applications (BDAs) in distributed systems hinges on stable, low-latency links between worker nodes. However, LMCNs with highly dynamic nodes and long-distance links cannot provide these conditions, which makes the performance of OBDP hard to measure intuitively. To bridge this gap, a multidimensional simulation platform is indispensable: one that can simulate the network environment of LMCNs and place BDAs in it for performance testing. Using STK's APIs and a parallel computing framework, we achieve real-time simulation of thousands of satellite nodes, which are mapped to application nodes through software-defined networking (SDN) and container technologies. We elaborate the architecture and mechanism of the simulation platform, and take Starlink and Hadoop as realistic examples for simulations. The results indicate that LMCNs have dynamic end-to-end latency that fluctuates periodically with the constellation movement. Compared to ground data center networks (GDCNs), LMCNs degrade computing and storage job throughput, which can be alleviated by the use of erasure codes and data-flow scheduling of worker nodes.
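The periodic latency fluctuation follows directly from constellation geometry: propagation delay scales with slant range, which changes continuously as satellites move. A back-of-the-envelope comparison (the distances are illustrative, not Starlink measurements):

```python
C = 299_792.458  # speed of light, km/s

def propagation_ms(distance_km: float) -> float:
    # One-way propagation delay in milliseconds.
    return distance_km / C * 1000

print(propagation_ms(2000))  # ~6.7 ms for a 2000 km inter-satellite link
print(propagation_ms(0.1))   # ~0.0003 ms for a 100 m intra-datacenter hop
```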
A DMVOCC-MVDA (distributed multiversion optimistic concurrency control with multiversion dynamic adjustment) protocol was presented to process mobile distributed real-time transactions in mobile broadcast environments. At the mobile hosts, all transactions perform local pre-validation against the transactions committed at the server in the last broadcast cycle. Transactions that survive local pre-validation must be submitted to the server for final validation. The new protocol eliminates conflicts between mobile read-only and mobile update transactions, and resolves data conflicts flexibly by using multiversion dynamic adjustment of the serialization order to avoid unnecessary transaction restarts. Mobile read-only transactions can be committed without blocking, and their response time is greatly shortened. The tolerance of mobile transactions to disconnections from the broadcast channel is increased. In global validation, mobile distributed transactions must be checked to ensure distributed serializability across all participants. The simulation results show that the proposed concurrency control protocol offers better performance than other protocols in terms of miss rate, restart rate, and commit rate. Under a high workload (think time of 1 s), the miss rate of DMVOCC-MVDA is only 14.6%, significantly lower than that of other protocols. The restart rate of DMVOCC-MVDA is only 32.3%, showing that it can effectively reduce the restart rate of mobile transactions, and its commit rate reaches 61.2%, clearly higher than that of other protocols.
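A minimal sketch of the local pre-validation step (the data structures are assumptions, not the paper's notation): a transaction at the mobile host survives only if its read set does not intersect any write set broadcast as committed in the last cycle.

```python
def local_prevalidate(read_set: set, last_cycle_writesets: list) -> bool:
    # Backward validation against transactions committed at the server during
    # the last broadcast cycle: any overlap with a committed write set means
    # the transaction read stale data and must restart locally, without
    # ever contacting the server.
    return all(read_set.isdisjoint(ws) for ws in last_cycle_writesets)

# Example: T read items {x, y}; the last cycle committed writes {y} and {z}.
print(local_prevalidate({"x", "y"}, [{"y"}, {"z"}]))  # False -> restart T
```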
A distributed processing system (DPS) contains many autonomous nodes, which contribute their own computing power. A DPS is considered a unified logical structure operating in a distributed manner; the processing tasks are divided into fragments and assigned to various nodes for processing. That type of operation requires and involves a great deal of communication. We propose a decentralized approach, based on a distributed hash table, to reduce the communication overhead and remove the server unit, thus avoiding a single point of failure in the system. This paper proposes a mathematical model and algorithms that are implemented in a dedicated experimental system. Using the decentralized approach, this study demonstrates the efficient operation of a decentralized system, which results in reduced energy emission.
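A common way to realize such DHT-based assignment is consistent hashing, where every node can locate the owner of a task fragment locally, without asking a coordinator. The sketch below uses virtual nodes for load balance; the names and parameters are illustrative, not taken from the paper.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: each physical node owns the key-space
    arcs of its virtual nodes, so any peer can map a task fragment to its
    owner with no central server involved."""

    def __init__(self, nodes, vnodes=64):
        self.ring = sorted(
            (int(hashlib.sha1(f"{n}#{v}".encode()).hexdigest(), 16), n)
            for n in nodes for v in range(vnodes)
        )
        self.keys = [k for k, _ in self.ring]

    def node_for(self, task_id) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = int(hashlib.sha1(str(task_id).encode()).hexdigest(), 16)
        i = bisect.bisect(self.keys, h) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
print(ring.node_for("fragment-42"))  # every peer computes the same owner
```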
Glacier disasters occur frequently in alpine regions around the world, but conventional geological disaster measurement technology cannot be directly used for glacier disaster measurement. Hence, in this study, a distributed multi-sensor measurement system for glacier deformation was established by integrating piezoelectric sensing, coded sensing, attitude sensing, and wireless communication technologies. The traditional Modbus protocol was optimized to solve the problem of confused data identification across different acquisition nodes. Through indoor wireless transmission, adaptive performance analysis, error measurement experiments, and a landslide simulation experiment, the performance of the measurement system was analyzed and evaluated. Using unmanned aerial vehicle technology, the reliability and effectiveness of the measurement system were verified on site at the Galongla glacier in southeastern Tibet, China. The results show that the mean absolute percentage errors were only 1.13% and 2.09% for displacement and temperature, respectively. The distributed real-time glacier deformation measurement system provides a new means for assessing the development of glacier disasters and for disaster prevention and mitigation.
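One simple way to remove ambiguity between acquisition nodes is to tag every reading with an explicit node ID and sequence number inside the payload. The field layout below is a hypothetical illustration (the paper does not publish its frame format); the checksum is the standard Modbus-RTU CRC-16.

```python
import struct

def crc16_modbus(data: bytes) -> int:
    # Standard Modbus-RTU CRC-16 (reflected polynomial 0xA001, init 0xFFFF).
    crc = 0xFFFF
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def pack_reading(node_id: int, sensor_type: int, seq: int, value: float) -> bytes:
    # Hypothetical extended payload: node ID + sensor type + sequence number
    # + measurement, so the gateway can attribute every frame to its node.
    payload = struct.pack(">BBHf", node_id, sensor_type, seq, value)
    return payload + struct.pack("<H", crc16_modbus(payload))  # CRC low byte first

frame = pack_reading(node_id=3, sensor_type=1, seq=42, value=12.7)
print(frame.hex())
```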
Low-field nuclear magnetic resonance (NMR) has been widely used in the petroleum industry, for example in well logging and laboratory rock core analysis. However, the signal-to-noise ratio is low due to the low magnetic field strength of NMR tools and the complex petrophysical properties of the detected samples. Suppressing the noise and highlighting the usable NMR signals is very important for subsequent data processing. Most denoising methods are based on fixed mathematical transformations or hand-designed feature selectors to suppress noise characteristics, which may not perform well because they do not adapt to different noisy signals. In this paper, we propose a data processing framework to improve the quality of low-field NMR echo data based on dictionary learning. Dictionary learning is a machine learning method based on redundancy and sparse representation theory; the available information in noisy NMR echo data can be adaptively extracted and reconstructed by it. The advantages and effectiveness of the proposed method were verified with a number of numerical simulations, NMR core data analyses, and NMR logging data processing. The results show that dictionary learning can significantly improve the quality of NMR echo data with high noise levels and effectively improve the accuracy and reliability of inversion results.
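The core idea, learning an over-complete dictionary from the noisy echoes themselves and then keeping only a sparse combination of atoms, can be sketched with scikit-learn. The bi-exponential decay, window length, and dictionary size below are illustrative stand-ins for real echo trains, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Synthetic noisy echo train: a bi-exponential decay plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1.0, 1000)
clean = 0.6 * np.exp(-t / 0.05) + 0.4 * np.exp(-t / 0.5)
noisy = clean + 0.05 * rng.standard_normal(t.size)

# Overlapping windows of the signal serve as training patches.
win, step = 50, 5
patches = np.lib.stride_tricks.sliding_window_view(noisy, win)[::step]

dl = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                 transform_algorithm="omp",
                                 transform_n_nonzero_coefs=3, random_state=0)
codes = dl.fit_transform(patches)       # sparse codes per patch
recon = codes @ dl.components_          # denoised patches

# Average the overlapping reconstructed windows back into one signal.
denoised = np.zeros_like(noisy)
counts = np.zeros_like(noisy)
for i, start in enumerate(range(0, noisy.size - win + 1, step)):
    denoised[start:start + win] += recon[i]
    counts[start:start + win] += 1
denoised /= np.maximum(counts, 1)
print(float(np.mean((denoised - clean) ** 2)))  # lower MSE than the raw noise
```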
Due to the increasing number of cloud applications, the amount of data in the cloud is growing faster than ever before. The nature of cloud computing requires cloud data processing systems that can handle huge volumes of data with high performance. However, most current cloud storage systems adopt a hash-like approach to retrieving data that only supports simple keyword-based queries and lacks richer forms of information search. Therefore, a scalable and efficient indexing scheme is clearly required. In this paper, we present a skip list-based cloud index, called SLC-index, a novel, scalable indexing scheme for cloud data processing. The SLC-index offers a two-layered architecture for extending the indexing scope and facilitating better throughput. Dynamic load balancing for the SLC-index is achieved by online migration of index nodes between servers. Furthermore, the system is flexible due to its dynamic addition and removal of servers. The SLC-index is efficient for both point and range queries. Experimental results show the efficiency of the SLC-index and its usefulness as an alternative approach for cloud-suitable data structures.
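For readers unfamiliar with the underlying structure, a minimal in-memory skip list with the point and range operations an index node would expose is sketched below; the two-layer distribution across servers is omitted, and this is not the paper's code.

```python
import random

class SkipListNode:
    def __init__(self, key, value, level):
        self.key, self.value = key, value
        self.forward = [None] * level  # next pointers, one per level

class SkipList:
    """Minimal single-server skip list supporting insert and range queries."""
    MAX_LEVEL, P = 16, 0.5

    def __init__(self):
        self.head = SkipListNode(None, None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        lvl = 1
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key, value):
        update, x = [None] * self.MAX_LEVEL, self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] and x.forward[i].key < key:
                x = x.forward[i]
            update[i] = x
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level, lvl):
                update[i] = self.head
            self.level = lvl
        node = SkipListNode(key, value, lvl)
        for i in range(lvl):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def range_query(self, lo, hi):
        # Descend to the first key >= lo, then walk level 0 up to hi.
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] and x.forward[i].key < lo:
                x = x.forward[i]
        x, out = x.forward[0], []
        while x and x.key <= hi:
            out.append((x.key, x.value))
            x = x.forward[0]
        return out

sl = SkipList()
for k in [7, 3, 9, 1, 5]:
    sl.insert(k, f"v{k}")
print(sl.range_query(3, 8))  # [(3, 'v3'), (5, 'v5'), (7, 'v7')]
```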
This paper designs and develops a framework on a distributed computing platform for massive multi-source spatial data using a column-oriented database (HBase). The platform consists of four layers, an ETL (extraction, transformation, loading) tier, a data processing tier, a data storage tier, and a data display tier, achieving long-term storage, real-time analysis, and querying of massive data. Finally, a real dataset cluster is simulated, made up of 39 nodes including 2 master nodes and 37 data nodes, and function tests of the data importing and real-time query modules are performed, along with performance tests of HDFS I/O, the MapReduce cluster, batch loading, and real-time querying of massive data. The test results indicate that the platform achieves high performance in terms of response time and linear scalability.
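In HBase-backed spatial stores, much of the query performance comes down to row-key design. The layout below is a hypothetical sketch of one common pattern (spatial prefix for locality, reversed timestamp for newest-first versions); the paper does not publish its key schema.

```python
import struct
import time

def spatial_rowkey(geohash: str, ts_ms: int, source_id: str) -> bytes:
    # Hypothetical HBase row key for multi-source spatial records:
    # - geohash prefix keeps spatially close features in adjacent regions;
    # - reversed timestamp sorts the newest observation first;
    # - source ID disambiguates records from different data sources.
    reversed_ts = 2**63 - 1 - ts_ms
    return (geohash.encode() + b"|" +
            struct.pack(">q", reversed_ts) + b"|" +
            source_id.encode())

key = spatial_rowkey("wx4g0e", int(time.time() * 1000), "sensor-17")
print(key.hex())
```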
When using healthcare data, it is crucial to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications, and a medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristics (e.g., greedy search) and metaheuristic techniques (e.g., evolutionary algorithms) have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that a large number of uninteresting rules are formed during negative association rule mining, making this a difficult problem to tackle. In this research, we offer an adaptive method for privacy-preserving negative association rule mining in vertically partitioned healthcare datasets. The approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining based on the Tabu-genetic optimization paradigm; Tabu search is advantageous because it removes a huge number of unnecessary rules and itemsets. Experiments using benchmark healthcare datasets show that the proposed scheme outperforms state-of-the-art solutions in decreasing side effects and data distortions, as measured by the hiding failure indicator.
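For concreteness, the support and confidence of a negative rule A → ¬B can be computed as below. This is a toy illustration with invented itemset names; the mining and Tabu-genetic hiding optimization themselves are not shown.

```python
def neg_rule_metrics(transactions, antecedent, consequent):
    # For the negative rule A -> not-B:
    #   supp(A -> ¬B) = P(A and not B)
    #   conf(A -> ¬B) = P(A and not B) / P(A)
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)
    a_not_b = sum(1 for t in transactions
                  if antecedent <= t and not (consequent <= t))
    return a_not_b / n, (a_not_b / a if a else 0.0)

txns = [{"aspirin", "hypertension"}, {"aspirin"}, {"statin", "diabetes"}]
supp, conf = neg_rule_metrics(txns, {"aspirin"}, {"diabetes"})
print(supp, conf)  # 0.666..., 1.0
```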
In the wastewater treatment process (WWTP), accurate and real-time monitoring of key variables is crucial for operational strategies. However, most existing methods have difficulty obtaining real-time values of some key process variables. To handle this issue, a data-driven intelligent monitoring system, using soft sensor techniques and a data distribution service, is developed to monitor the concentrations of effluent total phosphorus (TP) and ammonia nitrogen (NH_4-N). In this intelligent monitoring system, a fuzzy neural network (FNN) is applied to design the soft sensor model, and principal component analysis (PCA) is used to select the input variables of the soft sensor model. Moreover, data transfer software is used to integrate the soft sensor technique into the supervisory control and data acquisition (SCADA) system. Finally, the proposed intelligent monitoring system is tested in several real plants to demonstrate the reliability and effectiveness of its monitoring performance.
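A minimal soft-sensor pipeline in this spirit, PCA for input compression feeding a neural regressor, is sketched below on synthetic data. scikit-learn's MLPRegressor stands in for the paper's fuzzy neural network, and all data and dimensions are invented.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: rows = samples of easily measured process variables
# (flow, ORP, DO, pH, ...); target = hard-to-measure effluent TP.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = X[:, :3] @ np.array([0.5, -0.2, 0.1]) + 0.05 * rng.normal(size=500)

# PCA keeps the components explaining 95% of the variance; the regressor
# maps them to the effluent quality variable the soft sensor must estimate.
soft_sensor = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=1),
)
soft_sensor.fit(X[:400], y[:400])
print("predicted effluent TP:", soft_sensor.predict(X[400:405]))
```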
A new method of establishing a rolling load distribution model was developed using online intelligent information-processing technology for plate rolling. The model combines a knowledge model and a mathematical model, taking knowledge discovery in databases (KDD) and data mining (DM) as the starting point. Online maintenance and optimization of the load model are realized. The effectiveness of this new method was verified by offline simulation and online application.
A rapidly deployable dense seismic monitoring system capable of transmitting acquired data in real time and analyzing data automatically is crucial for seismic hazard mitigation after a major earthquake. However, it is rather difficult for current seismic nodal stations to transmit data in real time over an extended period, and it usually takes a great amount of time to process the acquired data manually. To monitor earthquakes in real time flexibly, we developed a mobile integrated seismic monitoring system consisting of newly developed nodal units with 4G telemetry and a real-time AI-assisted automatic data processing workflow. The integrated system is convenient to deploy and has been successfully applied in monitoring the aftershocks of the Yangbi M_S 6.4 earthquake that occurred on May 21, 2021 in Yangbi County, Dali, Yunnan, southwest China. The acquired seismic data are transmitted almost in real time through the 4G cellular network, and then processed automatically for event detection, positioning, magnitude calculation, and source mechanism inversion. Within tens of seconds to a couple of minutes at most, the final seismic attributes can be presented remotely to end users through the integrated system. From May 27 to June 17, the real-time system detected and located 7905 aftershocks in the Yangbi area before its internal batteries were exhausted, far more than the catalog provided by the China Earthquake Networks Center using the regional permanent stations. The initial application of this integrated real-time monitoring system is promising, and we anticipate the advent of a new era of Real-time Intelligent Array Seismology (RIAS) for better monitoring and understanding of the subsurface dynamic processes caused by Earth's internal forces as well as anthropogenic activities.
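The event-detection stage of such a workflow can be illustrated with the classic STA/LTA trigger, shown below purely as a stand-in: the paper's system uses an AI-assisted picker rather than this traditional detector, and all parameters here are illustrative.

```python
import numpy as np

def sta_lta(trace, fs, sta_win=1.0, lta_win=10.0):
    # Ratio of short-term to long-term average signal energy: a spike in
    # the ratio flags a candidate seismic event on a single trace.
    sta_n, lta_n = int(sta_win * fs), int(lta_win * fs)
    csum = np.cumsum(trace.astype(float) ** 2)
    sta = (csum[lta_n:] - csum[lta_n - sta_n:-sta_n]) / sta_n
    lta = (csum[lta_n:] - csum[:-lta_n]) / lta_n
    return sta / np.maximum(lta, 1e-12)

fs = 100
rng = np.random.default_rng(0)
trace = rng.normal(size=30 * fs)
trace[1500:1600] += 8 * rng.normal(size=100)   # inject a synthetic "event"
ratio = sta_lta(trace, fs)
print("triggered:", bool((ratio > 5).any()))   # True
```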
This work applies non-stationary random processes to the resilience of power distribution under severe weather. Power distribution, the edge of the energy infrastructure, is susceptible to external hazards from severe weather. Large-scale power failures often occur, leaving millions of people without electricity for days. However, the problem of large-scale power failure, recovery, and resilience has not been formulated rigorously nor studied systematically. This work studies the resilience of power distribution from three aspects. First, we derive non-stationary random processes to model large-scale failures and recoveries; the transient Little's Law then provides a simple network-level approximation of the entire life cycle of failure and recovery through a queue. Second, we define time-varying resilience based on the non-stationary model. The resilience metric characterizes the ability of power distribution to remain operational and recover rapidly upon failure. Third, we apply the non-stationary model and the resilience metric to the large-scale power failures caused by Hurricane Ike. We use real data from the electric grid to learn the time-varying model parameters and the resilience metric. Our results show the non-stationary evolution of failure rates and recovery times, and how network resilience deviates from that of normal operation during the hurricane.
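The transient Little's-law view can be made concrete: with a time-varying failure arrival rate λ(t) and random repair durations D, the expected backlog of unrepaired failures is E[Q(t)] = ∫₀ᵗ λ(s) P(D > t−s) ds. The sketch below evaluates this on a grid; the storm-shaped rate and exponential repair distribution are assumptions for illustration, not the paper's fitted model.

```python
import numpy as np

dt = 0.1
t = np.arange(0, 72, dt)                        # hours after landfall
lam = 5.0 * np.exp(-((t - 12.0) / 6.0) ** 2)    # failure-rate surge (assumed)
surv = lambda u: np.exp(-u / 8.0)               # P(D > u): mean repair 8 h (assumed)

# Left Riemann sum of the transient Little's-law integral at each grid time.
Q = np.array([np.sum(lam[:i] * surv(ti - t[:i])) * dt
              for i, ti in enumerate(t)])
print("peak expected backlog of failed components:", Q.max())
```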
In this paper, we introduce a system architecture for a patient-centered mobile health monitoring (PCMHM) system that deploys different sensors to determine patients' activities, medical conditions, and the cause of an emergency event. The system combines and analyzes sensor data to produce detailed patient health information in real time. A central computational node with data analysis capability is used for sensor data integration and analysis. In addition to medical sensors, surrounding environmental sensors are also utilized to enhance the interpretation of the data and to improve medical diagnosis. The PCMHM system provides on-demand patient health information via the Internet and tracks real-time daily activities and health conditions; it also includes the capability to assess patient posture and detect falls.
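As a small illustration of the fall-detection idea (the paper does not disclose its algorithm, so the heuristic and thresholds here are invented): a fall typically appears in accelerometer data as an impact spike followed by a period of stillness near 1 g.

```python
def detect_fall(samples, fs, impact_g=2.5, still_band=0.15):
    # samples: list of (x, y, z) accelerations in g; fs: sampling rate in Hz.
    # Flag a fall when an impact spike is followed by ~1 s of near-stillness.
    mags = [(x * x + y * y + z * z) ** 0.5 for x, y, z in samples]
    for i, m in enumerate(mags):
        if m > impact_g:
            after = mags[i + int(0.5 * fs): i + int(1.5 * fs)]
            if after and all(abs(a - 1.0) < still_band for a in after):
                return True
    return False

# 2 s of rest, an impact, then lying still: detected as a fall.
fs = 50
stream = [(0, 0, 1.0)] * (2 * fs) + [(2.0, 1.5, 1.2)] + [(0, 0, 1.02)] * (2 * fs)
print(detect_fall(stream, fs))  # True
```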
The scale and complexity of big data are growing continuously, posing severe challenges to traditional data processing methods, especially in the field of clustering analysis. To address this issue, this paper introduces a new method named Big Data Tensor Multi-Cluster Distributed Incremental Update (BDTMCDIncreUpdate), which combines distributed computing, storage technology, and incremental update techniques to provide an efficient and effective means of clustering analysis. First, the original dataset is divided into multiple sub-blocks, and distributed computing resources are utilized to process the sub-blocks in parallel, enhancing efficiency. Then, initial clustering is performed on each sub-block using tensor-based multi-clustering techniques to obtain preliminary results. When new data arrive, incremental update technology is employed to update the core tensor and factor matrices, ensuring that the clustering model can adapt to changes in the data. Finally, by combining the updated core tensor and factor matrices with historical computational results, refined clustering results are obtained, achieving real-time adaptation to dynamic data. In experimental simulations on the Aminer dataset, the BDTMCDIncreUpdate method demonstrated outstanding performance in terms of accuracy (ACC) and normalized mutual information (NMI), achieving an accuracy of 90% and an NMI score of 0.85, outperforming existing methods such as TClusInitUpdate and TKLClusUpdate in most scenarios. The BDTMCDIncreUpdate method therefore offers an innovative solution for big data analysis, integrating distributed computing, incremental updates, and tensor-based multi-clustering. It improves efficiency and scalability in processing large-scale, high-dimensional datasets, its effectiveness and accuracy have been validated experimentally, and it shows great potential in applications where dynamic data growth is common.
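The incremental step can be sketched for a Tucker-style model: when new slices arrive along one mode, the corresponding new factor rows solve a least-squares problem with the core tensor and the other factors held fixed. The dimensions and this simplified update rule are assumptions for illustration, not the BDTMCDIncreUpdate algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, R1, R2, R3 = 40, 12, 8, 4, 3, 2

# Assumed existing Tucker model: X[i,j,k] = sum_pqr G[p,q,r] A[i,p] B[j,q] C[k,r].
G = rng.normal(size=(R1, R2, R3))
B, C = rng.normal(size=(J, R2)), rng.normal(size=(K, R3))

def incremental_mode0_update(X_new, G, B, C):
    # New data slices arrive along mode 0. With the core and the other
    # factors fixed, the new rows of A solve least squares against
    # X_new(0) = A_new @ G(0) @ kron(B, C).T  (mode-0 unfolding, C order).
    G0 = G.reshape(G.shape[0], -1)
    M = G0 @ np.kron(B, C).T                   # (R1, J*K) design matrix
    X0 = X_new.reshape(X_new.shape[0], -1)     # (n_new, J*K)
    A_new, *_ = np.linalg.lstsq(M.T, X0.T, rcond=None)
    return A_new.T

# Simulate 5 new slices generated by the same model and recover their rows.
A_true = rng.normal(size=(5, R1))
X_new = np.einsum("pqr,ip,jq,kr->ijk", G, A_true, B, C)
A_hat = incremental_mode0_update(X_new, G, B, C)
print(np.allclose(A_hat, A_true, atol=1e-6))   # True
```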
In this study, we examine efficient big data engineering and extract, transform, load (ETL) processes in the healthcare sector, building on the robust foundation provided by the MIMIC-III Clinical Database. Our investigation explores various methodologies for improving the efficiency of ETL processes, with a primary emphasis on optimizing time and resource utilization. Through experimentation on a representative dataset, we demonstrate the advantages of incorporating PySpark and Docker containerized applications. Our research shows significant gains in time efficiency, process streamlining, and resource optimization from using PySpark for distributed computing within big data engineering workflows. Additionally, we underscore the strategic integration of Docker containers and their pivotal role in improving scalability and reproducibility within the ETL pipeline. This paper captures the key insights from our experiments, emphasizing the practical implications and benefits of adopting PySpark and Docker. By streamlining big data engineering and ETL processes in the context of clinical big data, our study contributes to the ongoing discourse on optimizing data processing efficiency in healthcare applications. The source code is available on request.
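A minimal PySpark ETL fragment in this style is shown below. The path, columns, and aggregation are illustrative of MIMIC-III-like lab data (LABEVENTS does carry SUBJECT_ID, ITEMID, CHARTTIME, and VALUENUM), not the study's actual pipeline, whose source is available on request.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("mimic-etl-sketch")
         .getOrCreate())

# Extract: read raw lab events (path is illustrative).
labs = spark.read.csv("/data/mimic/LABEVENTS.csv", header=True, inferSchema=True)

# Transform: drop non-numeric results, parse timestamps, summarize per
# patient and lab item -- all executed in parallel across the cluster.
clean = (labs
         .filter(F.col("VALUENUM").isNotNull())
         .withColumn("CHARTTIME", F.to_timestamp("CHARTTIME"))
         .groupBy("SUBJECT_ID", "ITEMID")
         .agg(F.avg("VALUENUM").alias("mean_value"),
              F.count("*").alias("n_measurements")))

# Load: write the curated table in a columnar format.
clean.write.mode("overwrite").parquet("/data/mimic/curated/lab_summary")
```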
High-resolution vehicular emission inventories are important for managing vehicular pollution and improving urban air quality. This study developed a vehicular emission inventory with high spatio-temporal resolution for the main urban area of Chongqing, based on real-time traffic data from 820 RFID detectors covering 454 roads, and analysed the differences in spatio-temporal emission characteristics between the inner and outer districts. The results show that the daily vehicular emission intensities of CO, hydrocarbons, PM2.5, PM10, and NO_x in the study area during 2018 were 30.24, 3.83, 0.18, 0.20, and 8.65 kg/km per day, respectively. Pollutant emission intensities in the inner district were higher than those in the outer district. Light passenger cars (LPCs) were the main contributors to all-day CO emissions in both the inner and outer districts, whereas the contributors to NO_x emissions differed: diesel and natural gas buses were the major contributors to daytime NO_x emissions in the inner district, accounting for 40.40%, while buses and heavy-duty trucks (HDTs) were the major contributors in the outer district. At night, owing to the lifting of truck restrictions and the suspension of buses, HDTs become the main NO_x contributor in both the inner and outer districts, with three NO_x emission peak hours that differ from the peak hours of total NO_x emissions from all vehicles. Unlike in most other cities, bridges and connecting channels are persistent emission hotspots due to long-lasting traffic congestion. This knowledge helps in fully understanding vehicular emission characteristics and is useful for policymakers designing precise prevention and control measures.
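The link-level bookkeeping behind such an inventory reduces to traffic volume × emission factor × link length per vehicle class. The counts and factors below are invented for illustration, not values from the study.

```python
# (road, length in km, hourly traffic counts by vehicle class)
links = [
    ("bridge-1",   1.2, {"LPC": 1800, "bus": 60, "HDT": 40}),
    ("arterial-7", 3.5, {"LPC": 900,  "bus": 25, "HDT": 10}),
]
EF_NOX = {"LPC": 0.05, "bus": 6.0, "HDT": 9.0}  # g/km per vehicle (assumed)

for road, length_km, counts in links:
    grams = sum(n * EF_NOX[cls] * length_km for cls, n in counts.items())
    print(f"{road}: {grams / 1000:.2f} kg NOx/h total, "
          f"{grams / 1000 / length_km:.2f} kg/km/h intensity")
```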
Nowadays, numerous images exist on the Internet, and with the development of cloud computing and big data applications, many of these images need to be processed for different kinds of applications using specific image processing algorithms. Meanwhile, many kinds of image processing algorithms and their variations already exist, while new algorithms are still emerging. Consequently, an ongoing problem is how to improve the efficiency of massive image processing and support the integration of existing implementations of image processing algorithms into such systems. This paper proposes a distributed image processing system named SEIP, which is built on Hadoop and employs an extensible in-node architecture to support various kinds of image processing algorithms on distributed platforms with GPU accelerators. The system also uses a pipeline-based framework to accelerate massive image file processing. A demonstration application for image feature extraction is designed, and the system is evaluated on a small-scale Hadoop cluster with GPU accelerators; the experimental results show the usability and efficiency of SEIP.
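The pipeline idea, overlapping I/O, compute, and output stages across many small files, can be sketched with thread pools; this is a toy analogue of the concept, not SEIP's in-node framework, and the stage bodies are stubs.

```python
from concurrent.futures import ThreadPoolExecutor

def load(path):                      # I/O-bound stage (stub)
    return "bytes-of " + path

def process(img):                    # compute-bound stage (stub)
    return "features-of " + img

def store(feat):                     # output stage (stub)
    print("stored", feat)

paths = [f"img_{i}.jpg" for i in range(8)]

# Separate pools let loading of later files overlap with processing of
# earlier ones, instead of running each file's stages strictly in sequence.
with ThreadPoolExecutor(max_workers=2) as io_pool, \
     ThreadPoolExecutor(max_workers=4) as cpu_pool:
    loaded = io_pool.map(load, paths)
    for feat in cpu_pool.map(process, loaded):
        store(feat)
```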
Data processing is a basic and crucial step in seismic exploration, directly influencing the effect of subsequent processing. Thus, selecting an appropriate data processing method is one of the most important tasks throughout the work. Through simulation, the authors analyze and compare the Fractional Fourier Transform (FRFT) and the Wigner-Ville distribution (WVD), and summarize the similarities, advantages, and disadvantages of the two methods. The results reveal that FRFT is more effective and suitable for application in seismic exploration than WVD.
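For reference, a discrete pseudo Wigner-Ville distribution takes only a few lines of NumPy/SciPy; on a linear chirp, the energy concentrates along the instantaneous frequency, which is what makes such time-frequency tools useful here. This is a generic textbook-style sketch, not the authors' code (FRFT is omitted, as its discrete implementation is more involved).

```python
import numpy as np
from scipy.signal import hilbert

def wigner_ville(x):
    """Discrete pseudo Wigner-Ville distribution of a real signal:
    W[n, k] = FFT over lag m of z[n+m] * conj(z[n-m]), with z the
    analytic signal, so W is real-valued."""
    z = hilbert(x)
    N = len(z)
    W = np.zeros((N, N))
    for n in range(N):
        m_max = min(n, N - 1 - n)            # largest symmetric lag at time n
        m = np.arange(-m_max, m_max + 1)
        r = np.zeros(N, dtype=complex)
        r[m % N] = z[n + m] * np.conj(z[n - m])
        W[n] = np.real(np.fft.fft(r))
    return W

t = np.linspace(0, 1, 256, endpoint=False)
sig = np.cos(2 * np.pi * (20 * t + 40 * t ** 2))   # linear chirp
W = wigner_ville(sig)
print(W.shape)  # (256, 256): time x frequency energy map
```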