A novel method for noise removal from the rotating accelerometer gravity gradiometer (MAGG) is presented. It introduces a head-to-tail data expansion technique based on the zero-phase filtering principle. A scheme for determining band-pass filter parameters based on signal-to-noise ratio gain, smoothness index, and cross-correlation coefficient is designed using the Chebyshev optimal consistent approximation theory. Additionally, a wavelet denoising evaluation function is constructed, with the dmey wavelet basis function identified as the most effective for processing gravity gradient data. Hardware-in-the-loop simulation and prototype experiments show that, compared with other commonly used methods, the proposed processing method improves the measurement variance of gravity gradient signals by 14% and achieves a measurement accuracy within 4 E. This verifies that the proposed method effectively removes noise from the gradient signals and improves gravity gradiometry accuracy, and it offers technical insights for high-precision airborne gravity gradiometry.
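To make the zero-phase idea concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not specify the head-to-tail expansion or the band-pass design, so this example assumes a simple odd reflection of the signal ends (`extend_ends`) and a first-order low-pass filter (`one_pole_lowpass`) as stand-ins. All function names and parameters are hypothetical.

```python
def one_pole_lowpass(x, alpha):
    """Causal first-order IIR low-pass: y[n] = alpha*x[n] + (1-alpha)*y[n-1]."""
    y = []
    prev = x[0]  # start from the first sample to avoid a start-up transient
    for v in x:
        prev = alpha * v + (1.0 - alpha) * prev
        y.append(prev)
    return y


def extend_ends(x, pad):
    """Assumed stand-in for data expansion: odd reflection about both end samples."""
    head = [2 * x[0] - x[i] for i in range(pad, 0, -1)]
    tail = [2 * x[-1] - x[-1 - i] for i in range(1, pad + 1)]
    return head + list(x) + tail


def zero_phase_filter(x, alpha=0.2, pad=20):
    """Filter forward, then backward, so the phase delays cancel.

    Requires len(x) > pad. The padded samples absorb the edge transients
    and are discarded before returning.
    """
    ext = extend_ends(x, pad)
    forward = one_pole_lowpass(ext, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)[::-1]
    return backward[pad:pad + len(x)]


if __name__ == "__main__":
    pulse = [50 - abs(n - 50) for n in range(101)]  # triangle peaking at n = 50
    causal = one_pole_lowpass(pulse, 0.2)
    smooth = zero_phase_filter(pulse, 0.2, 20)
    print("causal peak index:    ", max(range(101), key=lambda i: causal[i]))
    print("zero-phase peak index:", max(range(101), key=lambda i: smooth[i]))
```

The single forward pass shifts the peak of the pulse to a later sample, while the forward-backward pass leaves it where it was. That is the property that makes zero-phase filtering attractive when signal timing matters.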
The convergence of the Internet of Things (IoT), 5G, and cloud collaboration offers tailored solutions to the rigorous demands of multi-flow integrated energy aggregation dispatch data processing. While generative adversarial networks (GANs) are instrumental in resource scheduling, their application in this domain is impeded by slow convergence, inferior optimality-searching capability, and the inability to learn from feedback on failed decisions. Therefore, a cloud-edge collaborative federated GAN-based communication and computing resource scheduling algorithm that is sensitive to long-term constraint violations is proposed to address these challenges. The proposed algorithm facilitates real-time, energy-efficient data processing by optimizing transmission power control, data migration, and computing resource allocation. It employs federated learning for global parameter aggregation to enhance GAN parameter updating, and it dynamically adjusts GAN learning rates and global aggregation weights based on energy consumption constraint violations. Simulation results indicate that the proposed algorithm effectively reduces data processing latency, energy consumption, and convergence time.
Although a large number of studies have focused on various aspects of politeness, very little is known about how politeness intention is activated cognitively during verbal communication. The present study aims to explore the cognitive mechanism of politeness intention processing and how it is related to pragmatic failure during cross-cultural communication. Thirty Chinese EFL university students completed a probe word judgment task with 96 virtual scenarios. The results indicate that, within both mono- and cross-cultural contexts, the response time in the experimental scenarios was significantly slower than that in the filler scenarios. This suggests that politeness intention was activated while understanding the surface meaning of the conversation; however, the EFL learners could not completely avoid the negative transfer of their native politeness conventions when they were comprehending the conversational intention of the target language. Furthermore, no significant differences in response time were found between the groups with high and low English pragmatic competence, illustrating that transferring pragmatic rules and principles into cross-cultural communication skills was more cognitively demanding. Overall, this study adds to the literature on politeness research and provides some implications for foreign language pragmatic instruction.
In this study, we delve into the realm of efficient Big Data Engineering and Extract, Transform, Load (ETL) processes within the healthcare sector, leveraging the robust foundation provided by the MIMIC-III Clinical Database. Our investigation entails a comprehensive exploration of various methodologies aimed at enhancing the efficiency of ETL processes, with a primary emphasis on optimizing time and resource utilization. Through meticulous experimentation utilizing a representative dataset, we shed light on the advantages associated with the incorporation of PySpark and Docker containerized applications. Our research illuminates significant advancements in time efficiency, process streamlining, and resource optimization attained through the utilization of PySpark for distributed computing within Big Data Engineering workflows. Additionally, we underscore the strategic integration of Docker containers, delineating their pivotal role in augmenting scalability and reproducibility within the ETL pipeline. This paper encapsulates the pivotal insights gleaned from our experimental journey, accentuating the practical implications and benefits entailed in the adoption of PySpark and Docker. By streamlining Big Data Engineering and ETL processes in the context of clinical big data, our study contributes to the ongoing discourse on optimizing data processing efficiency in healthcare applications. The source code is available on request.
Chaotic optical communication has shown large potential as a hardware encryption method in the physical layer. As an important figure of merit, the bit rate–distance product of chaotic optical communication has been continually improved to 30 Gb/s × 340 km, but it is still far from the requirement for a deployed optical fiber communication system, which is beyond 100 Gb/s × 1000 km. A chaotic carrier can be considered an analog signal and suffers from fiber channel impairments, which limits the transmission distance of high-speed chaotic optical communications. To break this limit, we propose and experimentally demonstrate a pilot-based digital signal processing scheme for coherent chaotic optical communication combined with deep-learning-based chaotic synchronization. Both transmission impairment recovery and chaotic synchronization are realized in the digital domain. The frequency offset of the lasers is accurately estimated and compensated by determining the location of the pilot tone in the frequency domain, and equalization and phase noise compensation are jointly performed by the least mean square algorithm using the time-domain pilot symbols. Using the proposed method, transmission of a 100 Gb/s chaotically encrypted quadrature phase-shift keying (QPSK) signal over 800 km of single-mode fiber (SMF) is experimentally demonstrated. To enhance security, real-time transmission of a 40 Gb/s chaotically encrypted QPSK signal over 800 km of SMF is realized by inserting the pilot symbols and tone in a field-programmable gate array. This method provides a feasible approach to promote the practical application of chaotic optical communications and guarantees the high security of chaotic encryption.
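The pilot-tone step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the pilot tone nominally sits at 0 Hz, so the location of the spectral peak directly gives the laser frequency offset. It uses a naive DFT so that it needs only the standard library. The function names, sample rate, and tone frequency are hypothetical.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(N^2) DFT magnitude spectrum; adequate for a short illustration."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n)
    ]


def estimate_offset_hz(samples, sample_rate_hz):
    """Locate the strongest spectral line and convert its bin to a signed frequency."""
    n = len(samples)
    mags = dft_magnitudes(samples)
    k = max(range(n), key=lambda i: mags[i])
    if k > n // 2:  # upper half of the spectrum represents negative frequencies
        k -= n
    return k * sample_rate_hz / n


def compensate(samples, offset_hz, sample_rate_hz):
    """Remove the estimated offset by counter-rotating every sample."""
    return [
        s * cmath.exp(-2j * math.pi * offset_hz * t / sample_rate_hz)
        for t, s in enumerate(samples)
    ]


if __name__ == "__main__":
    fs = 64.0
    received = [cmath.exp(2j * math.pi * (-5.0) * t / fs) for t in range(64)]
    offset = estimate_offset_hz(received, fs)
    print("estimated offset (Hz):", offset)
    print("first corrected samples:", compensate(received, offset, fs)[:3])
```

The resolution of this estimate is one DFT bin (`sample_rate_hz / n`). A practical receiver would typically refine the peak location and use an FFT rather than the naive transform shown here.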
With the continued development of multiple Global Navigation Satellite Systems (GNSS) and the emergence of various frequencies, UnDifferenced and UnCombined (UDUC) data processing has become an increasingly attractive option. In this contribution, we provide an overview of the current status of UDUC GNSS data processing activities in China. These activities encompass the formulation of Precise Point Positioning (PPP) models and PPP-Real-Time Kinematic (PPP-RTK) models for processing single-station and multi-station GNSS data, respectively. Regarding single-station data processing, we discuss the advancements in PPP models, particularly the extension from a single system to multiple systems and from dual frequencies to single and multiple frequencies. Additionally, we introduce the modified PPP model, which accounts for the time variation of receiver code biases, a departure from the conventional PPP model, which typically assumes these biases to be constant in time. In the realm of multi-station PPP-RTK data processing, we introduce the ionosphere-weighted PPP-RTK model, which enhances the model strength by considering the spatial correlation of ionospheric delays. We also review the phase-only PPP-RTK model, designed to mitigate the impact of unmodelled code-related errors. Furthermore, we explore GLONASS PPP-RTK, achieved through the application of the integer-estimable model. For large-scale network data processing, we introduce the all-in-view PPP-RTK model, which relaxes the strict common-view requirement at all receivers. Moreover, we present the decentralized PPP-RTK data processing strategy, designed to improve computational efficiency. Overall, this work highlights the various advancements in UDUC GNSS data processing, providing insights into the state-of-the-art techniques employed in China to achieve precise GNSS applications.
Data processing of small samples is an important and valuable research problem in electronic equipment testing. Because it is difficult and complex to determine the probability distribution of small samples, it is difficult to use traditional probability theory to process the samples and assess the degree of uncertainty. Using grey relational theory and norm theory, this article proposes the grey distance information approach, which is based on the grey distance information quantity of a sample and the average grey distance information quantity of the samples. The definitions of these two quantities, together with their characteristics and algorithms, are introduced. Related problems, including the algorithm for the estimated value, the standard deviation, and the acceptance and rejection criteria for the samples and estimated results, are also addressed. Moreover, the information whitening ratio is introduced to select the weight algorithm and to compare different samples. Several examples are given to demonstrate the application of the proposed approach. The examples show that the proposed approach, which does not require knowledge of the probability distribution of small samples, is feasible and effective.
The High Precision Magnetometer (HPM) on board the China Seismo-Electromagnetic Satellite (CSES) allows highly accurate measurement of the geomagnetic field; it includes FGM (Fluxgate Magnetometer) and CDSM (Coupled Dark State Magnetometer) probes. This article introduces the main processing method, algorithm, and processing procedure for the HPM data. First, the FGM and CDSM probes are calibrated according to ground sensor data. Then the FGM linear parameters are corrected in orbit by applying the absolute vector magnetic field correction algorithm to CDSM data. At the same time, the magnetic interference of the satellite is eliminated according to ground-satellite magnetic test results. Finally, according to the characteristics of the magnetic field direction in the low-latitude region, the transformation matrix between the FGM probe and the star sensor is calibrated in orbit to determine the correct direction of the magnetic field. A comparison of the magnetic field data of the CSES and SWARM satellites over five consecutive geomagnetically quiet days shows that the difference in measurements of the vector magnetic field is about 10 nT, which is within the uncertainty interval of geomagnetic disturbance.
The data processing mode is vital to the performance of an entire coalmine gas early-warning system, especially its real-time performance. Our objective was to present the structural features of coalmine gas data so that the data could be processed at different priority levels in the C language. Two different data processing models, one with priority and the other without, were built based on queuing theory. Their theoretical formulas were derived from an M/M/1 model in order to calculate the average occupation time of each measuring point in an early-warning program. We validated the model with the gas early-warning system of the Huaibei Coalmine Group Corp. The results indicate that the average occupation time for gas data processing with the priority queuing model is nearly 1/30 of that of the model without priority.
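The contrast between the two queuing models can be illustrated with standard single-server results. The sketch below is an assumption-laden illustration, not the paper's formulas: it assumes an M/M/1 queue in which every class shares the same exponential service rate `mu`, and it uses the textbook non-preemptive priority formula for the mean waiting time. The arrival rates are hypothetical and do not reproduce the 1/30 figure reported above.

```python
def mm1_fifo_wait(arrival_rates, mu):
    """Mean time spent waiting in queue for a first-come-first-served M/M/1 queue."""
    lam = sum(arrival_rates)
    if lam >= mu:
        raise ValueError("queue is unstable: total arrival rate must be below mu")
    return (lam / mu) / (mu - lam)


def mm1_priority_waits(arrival_rates, mu):
    """Mean queueing delay per class under non-preemptive priority.

    arrival_rates[0] belongs to the highest-priority class. All classes are
    assumed to share the same exponential service rate mu.
    """
    rho_total = sum(arrival_rates) / mu
    if rho_total >= 1.0:
        raise ValueError("queue is unstable: total utilisation must be below 1")
    # Mean residual service time seen by an arrival: sum(lam_i * E[S^2]) / 2,
    # with E[S^2] = 2 / mu^2 for exponential service.
    residual = rho_total / mu
    waits = []
    sigma_prev = 0.0  # cumulative utilisation of strictly higher-priority classes
    for lam in arrival_rates:
        sigma = sigma_prev + lam / mu
        waits.append(residual / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits


if __name__ == "__main__":
    rates = [1.0, 7.0]  # hypothetical: urgent alarm data, then routine readings
    mu = 10.0
    print("no priority   :", mm1_fifo_wait(rates, mu))
    print("with priority :", mm1_priority_waits(rates, mu))
```

With these example rates, the urgent class waits far less than it would without priority, while the routine class waits slightly longer. The arrival-rate-weighted average delay is unchanged, which is the usual conservation property of such schemes.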
A novel technique for automatic seismic data processing that uses both integral and local features of seismograms is presented in this paper. Here, the term integral feature refers to a feature that depicts the shape of the whole seismogram. Unlike some previous efforts that completely abandon the DIAL approach (signal detection, phase identification, association, and event localization) and seek to use envelope cross-correlation to detect seismic events directly, our technique keeps following the DIAL approach. In addition to detecting signals corresponding to individual seismic phases, it also detects continuous wave-trains and explores their features for phase-type identification and signal association. More concrete ideas about how to define wave-trains and combine them with various detections, as well as how to measure and utilize their features in seismic data processing, are elaborated in the paper. We have applied this approach to routine data processing for years, and test results for a 16-day period using data from the Xinjiang seismic station network are presented. The automatic processing results show low false-event and missed-event rates simultaneously, indicating that the new technique has good prospects for improving automatic seismic data processing.
How to design a high-performance multicast key management system is currently a hot issue. This paper applies the idea of hierarchical data processing to construct a common analytic model based on a directed logical key tree and provides two important metrics for this problem: re-keying cost and key storage cost. The paper gives the basic theory of hierarchical data processing and the analytic model for multicast key management based on the logical key tree. It is proved that the 4-ary tree has the best performance under these metrics. The key management problem is also investigated based on a user probability model, which yields two evaluation parameters for re-keying cost and key storage cost.
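A rough way to see why a degree around 4 can come out ahead is to count key-server encryptions in a balanced logical key tree. The sketch below uses a simplified cost model that is an assumption of this example and may differ from the paper's metrics: a join costs about 2h encryptions, a leave about d·h − 1, and each user stores h + 1 keys, where h is the tree height for degree d. The function names and the group size are hypothetical.

```python
def tree_height(n_users, degree):
    """Smallest h such that degree**h >= n_users (levels of keys above the leaves)."""
    height = 0
    capacity = 1
    while capacity < n_users:
        capacity *= degree
        height += 1
    return height


def rekey_cost(n_users, degree):
    """Return (join, leave, average) encryption counts under the assumed model."""
    h = tree_height(n_users, degree)
    join = 2 * h            # each path key sent to old members and to the newcomer
    leave = degree * h - 1  # each path key re-encrypted for its remaining children
    return join, leave, (join + leave) / 2


def user_storage(n_users, degree):
    """Keys held by one user: its own key plus every key on the path to the root."""
    return tree_height(n_users, degree) + 1


if __name__ == "__main__":
    n = 4096
    for d in (2, 3, 4, 8, 16):
        join, leave, avg = rekey_cost(n, d)
        print(f"degree {d:2d}: join={join:2d} leave={leave:2d} "
              f"avg={avg:4.1f} keys/user={user_storage(n, d)}")
```

Under this model, a larger degree shortens the tree, so joins and per-user storage get cheaper, but each leave has to re-encrypt keys for more children. Averaging joins and leaves for a group of 4096 users gives the lowest cost at degree 4, which is consistent with the result stated in the abstract.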
Low-field nuclear magnetic resonance (NMR) has been widely used in the petroleum industry, for example in well logging and laboratory rock core analysis. However, the signal-to-noise ratio is low due to the low magnetic field strength of NMR tools and the complex petrophysical properties of the detected samples. Suppressing the noise and highlighting the available NMR signals is very important for subsequent data processing. Most denoising methods are based on fixed mathematical transformations or hand-designed feature selectors to suppress noise characteristics, and they may not perform well because they do not adapt to different noisy signals. In this paper, we propose a data processing framework based on dictionary learning to improve the quality of low-field NMR echo data. Dictionary learning is a machine learning method based on redundancy and sparse representation theory. Available information in noisy NMR echo data can be adaptively extracted and reconstructed by dictionary learning. The advantages and effectiveness of the proposed method were verified with a number of numerical simulations, NMR core data analyses, and NMR logging data processing. The results show that dictionary learning can significantly improve the quality of NMR echo data with a high noise level and effectively improve the accuracy and reliability of inversion results.
In network-supported collaborative design, data processing plays a vital role. Much effort has been spent in this area, and many kinds of approaches have been proposed. Based on the related literature, this paper presents an extensible markup language (XML) based strategy for several important data processing problems in network-supported collaborative design, such as representing the standard for the exchange of product model data (STEP) with XML in product information expression and managing XML documents with a relational database. The paper gives a detailed exposition of how to define the mapping between the XML structure and the relational database structure and how XML-QL queries can be translated into structured query language (SQL) queries. Finally, the structure of a data processing system based on XML is presented.
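One common way to store XML in a relational database is to "shred" the document into a table of nodes with parent links, so that a path query becomes a self-join. The sketch below illustrates that idea with Python's standard `xml.etree.ElementTree` and `sqlite3` modules. The schema, the sample product document, and the function names are assumptions of this example, not the mapping defined in the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<product id="P1">
  <part id="A"><name>bolt</name></part>
  <part id="B"><name>nut</name></part>
</product>
"""


def shred(xml_text, conn):
    """Store every element as a row that points to its parent element."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
                 "tag TEXT, text TEXT)")
    conn.execute("CREATE TABLE attr (node_id INTEGER, name TEXT, value TEXT)")
    next_id = 0

    def walk(elem, parent_id):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                     (node_id, parent_id, elem.tag, (elem.text or "").strip()))
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO attr VALUES (?, ?, ?)", (node_id, name, value))
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)


def part_names(conn):
    """SQL counterpart of the path query part/name: one join per path step."""
    rows = conn.execute(
        "SELECT name.text FROM node AS part "
        "JOIN node AS name ON name.parent_id = part.id "
        "WHERE part.tag = 'part' AND name.tag = 'name' "
        "ORDER BY name.id"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    shred(SAMPLE_XML, connection)
    print(part_names(connection))
```

A generic node table keeps the mapping independent of any particular document type, at the price of one join per path step. A schema-specific mapping with one table per element type trades that flexibility for simpler queries.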
The Consultative Committee for Space Data Systems (CCSDS) File Delivery Protocol (CFDP) recommendation for reliable transmission gives no detailed transmission procedure or delay calculation for the prompted negative acknowledgment and asynchronous negative acknowledgment models. CFDP is designed to provide data and storage management, store and forward, custody transfer, and reliable end-to-end delivery over deep-space links characterized by huge latency, intermittent connectivity, asymmetric bandwidth, and a high bit error rate (BER). Four reliable transmission models are analyzed, and the expected file-delivery time is calculated for different transmission rates, numbers and sizes of packet data units, BERs, frequencies of external events, and so on. By comparing the four CFDP models, the BER requirement for typical deep-space missions is obtained, and rules for choosing CFDP models under different uplink state information are given, which provides a reference for protocol model selection, utilization, and modification.
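The dependence of delivery time on BER and packet size starts from the probability that a single packet data unit (PDU) arrives corrupted. The sketch below shows only that first step. It is not the paper's delay model: it assumes independent bit errors and that any bit error ruins a PDU, and it ignores propagation delay and lost acknowledgments. The function names and example values are hypothetical.

```python
def pdu_error_probability(ber, pdu_bytes):
    """Probability that at least one of the 8 * pdu_bytes bits is received in error."""
    return 1.0 - (1.0 - ber) ** (8 * pdu_bytes)


def expected_transmissions(ber, pdu_bytes):
    """Mean number of times one PDU is sent until it arrives intact (geometric law)."""
    return 1.0 / (1.0 - pdu_error_probability(ber, pdu_bytes))


def expected_pdus_sent(file_bytes, pdu_bytes, ber):
    """Mean total number of PDU transmissions needed to deliver a whole file."""
    n_pdus = -(-file_bytes // pdu_bytes)  # ceiling division
    return n_pdus * expected_transmissions(ber, pdu_bytes)


if __name__ == "__main__":
    for ber in (1e-7, 1e-6, 1e-5, 1e-4):
        p = pdu_error_probability(ber, 1024)
        total = expected_pdus_sent(1_000_000, 1024, ber)
        print(f"BER={ber:.0e}: PDU error prob={p:.4f}, expected PDUs sent={total:.1f}")
```

Because the error probability grows with PDU size, larger PDUs reduce header overhead but raise the retransmission count. This is one reason why PDU size and BER appear together in delivery-time analyses.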
With the rapid development of information technology, 5G communication technology has gradually entered everyday life, and the application of edge computing is particularly significant in the field of information and communication systems (ICS). This paper focuses on the use of edge computing based on 5G communication in information and communication systems. First, the study analyzes the importance of combining edge computing technology with 5G communication technology, along with its advantages, such as high efficiency and low latency in processing large amounts of data. The study then explores multiple application scenarios of edge computing in information and communication systems, such as integrated use in the Internet of Things, intelligent transportation, telemedicine, and Industry 4.0. The research method is mainly based on theoretical analysis and experimental verification: the edge computing model is optimized according to the characteristics of the 5G network, and the performance of edge computing in different scenarios is tested through experimental simulation. The results show that edge computing significantly improves the data processing capacity and response speed of ICS in a 5G environment. However, there is also a series of challenges in practical application, including data security and privacy protection, the complexity of resource management and allocation, and the guarantee of quality of service (QoS). Through case analysis and problem analysis, the paper puts forward corresponding solution strategies, such as strengthening the data security protocol, introducing an intelligent resource scheduling system, and establishing a multi-dimensional service quality monitoring mechanism. Finally, this study points out that the deep integration of edge computing and 5G communication will continue to promote the innovative development of information and communication systems, which has a far-reaching impact and important practical significance for promoting transformation and upgrading in the field of information technology.
Because synthetic aperture radar (SAR) satellites can detect only a limited number of scenes, their full-track utilization rate is not high. Because of the computing and storage limitations of a single satellite, it is difficult to process the large amounts of data produced by spaceborne synthetic aperture radars. A new method of networked satellite data processing is proposed to improve the efficiency of data processing. A multi-satellite distributed SAR real-time processing method based on the Chirp Scaling (CS) imaging algorithm is studied in this paper, and a distributed data processing system is built with field-programmable gate array (FPGA) chips as the kernel. Unlike traditional CS algorithm processing, the system divides data processing into three stages. In each stage, the computing tasks are allocated to different data processing units (i.e., satellites). The method effectively saves the computing and storage resources of the satellites, improves the utilization rate of a single satellite, and shortens the data processing time. Gaofen-3 (GF-3) satellite SAR raw data are processed by the system, which verifies the performance of the method.
One of the most important project missions of neutral beam injectors is the implementation of 100 s neutral beam injection (NBI) with high power energy to the plasma of the EAST superconducting tokamak. Correspondingly, it is necessary to construct a high-speed and reliable computer data processing system for handling experimental data, covering data acquisition, data compression and storage, data decompression and query, and data analysis. The implementation of the computer data processing application software (CDPS) for EAST NBI is presented in this paper in terms of its functional structure and system realization. The software is programmed in the C language and runs on the Linux operating system, based on the TCP network protocol and multi-threading technology. The hardware mainly includes an industrial control computer (IPC), a data server, PXI DAQ cards, and so on. The software has now been applied to the EAST NBI system, and experimental results show that the CDPS serves EAST NBI very well.
In this paper, we propose a novel fuzzy matching data sharing scheme named FADS for cloud-edge communications. FADS allows users to specify their access policies and enables receivers to obtain the data transmitted by senders if and only if both sides satisfy the defined policies simultaneously. Specifically, we first formalize the definition and security models of fuzzy matching data sharing in cloud-edge environments. Then, we construct a concrete instantiation using a pairing-based cryptosystem and privacy-preserving set intersection on the attribute sets of both sides to perform concurrent matching over the policies. If the matching succeeds, the data can be decrypted; otherwise, nothing is revealed. In addition, FADS allows users to specify the policy dynamically each time, which is an urgent demand in practice. A thorough security analysis demonstrates that FADS is provably secure in terms of indistinguishability under chosen-ciphertext attack (IND-CCA) in the random oracle model against probabilistic polynomial-time (PPT) adversaries, and that the desirable security properties of privacy and authenticity are achieved. Extensive experiments provide evidence that FADS achieves acceptable efficiency.
Due to the increasing number of cloud applications, the amount of data in the cloud is growing faster than ever before. The nature of cloud computing requires cloud data processing systems that can handle huge volumes of data with high performance. However, most cloud storage systems currently adopt a hash-like approach to retrieving data, which supports only simple keyword-based queries and lacks other forms of information search. Therefore, a scalable and efficient indexing scheme is clearly required. In this paper, we present a skip list-based cloud index, called the SLC-index, which is a novel, scalable skip list-based indexing scheme for cloud data processing. The SLC-index offers a two-layered architecture for extending the indexing scope and facilitating better throughput. Dynamic load balancing for the SLC-index is achieved by online migration of index nodes between servers. Furthermore, it is a flexible system because servers can be added and removed dynamically. The SLC-index is efficient for both point and range queries. Experimental results show the efficiency of the SLC-index and its usefulness as an alternative approach for cloud-suitable data structures.
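The reason a skip list suits both point and range queries can be seen in a small single-machine version. The sketch below is a generic textbook skip list, not the SLC-index itself: the two-layer architecture, distribution across servers, and node migration are not modeled. The class name, parameters, and the fixed random seed are assumptions of this example.

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 16

    def __init__(self, p=0.5, seed=0):
        self._rng = random.Random(seed)  # seeded so runs are reproducible
        self._p = p
        self._head = _Node(None, None, self.MAX_LEVEL)
        self._level = 1

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self._p:
            level += 1
        return level

    def _predecessors(self, key):
        """For each level, the last node whose key is strictly less than key."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key, value):
        update = self._predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite an existing key
            return
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, value, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key, default=None):
        """Point query: descend the levels, then check the bottom-level successor."""
        node = self._predecessors(key)[0].forward[0]
        return node.value if node is not None and node.key == key else default

    def range(self, low, high):
        """Range query: locate low once, then walk the sorted bottom level."""
        node = self._predecessors(low)[0].forward[0]
        result = []
        while node is not None and node.key <= high:
            result.append((node.key, node.value))
            node = node.forward[0]
        return result


if __name__ == "__main__":
    index = SkipList()
    for k in (30, 10, 50, 20, 40):
        index.insert(k, f"record-{k}")
    print(index.get(20))
    print(index.range(15, 45))
```

Because the bottom level is a sorted linked list, a range query costs one search plus a linear walk over the matching keys. A hash-based index cannot offer this, since hashing destroys key order.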
In comparison with the ITRF2000 model, the ITRF2005 model represents a significant improvement in solution generation, datum definition, and realization. However, these improvements cause a frame difference between the ITRF2000 and ITRF2005 models, which may affect GNSS data processing. To quantify this impact, the differences between the GNSS results obtained using the two models, including station coordinates, baseline length, and horizontal velocity field, were analyzed. After transformation, the differences in position were at the millimeter level, and the differences in baseline length were less than 1 mm. The differences in the horizontal velocity fields decreased as the study area was reduced. For a large region, the differences in velocity magnitude were less than 1 mm/a, with a systematic difference of approximately 2 degrees in direction, while for a medium-sized region, the differences in magnitude and direction were not significant.
Funding (cloud-edge federated GAN resource scheduling study): supported by the China Southern Power Grid Technology Project under Grant 03600KK52220019 (GDKJXM20220253).
文摘The convergence of Internet of Things(IoT),5G,and cloud collaboration offers tailored solutions to the rigorous demands of multi-flow integrated energy aggregation dispatch data processing.While generative adversarial networks(GANs)are instrumental in resource scheduling,their application in this domain is impeded by challenges such as convergence speed,inferior optimality searching capability,and the inability to learn from failed decision making feedbacks.Therefore,a cloud-edge collaborative federated GAN-based communication and computing resource scheduling algorithm with long-term constraint violation sensitiveness is proposed to address these challenges.The proposed algorithm facilitates real-time,energy-efficient data processing by optimizing transmission power control,data migration,and computing resource allocation.It employs federated learning for global parameter aggregation to enhance GAN parameter updating and dynamically adjusts GAN learning rates and global aggregation weights based on energy consumption constraint violations.Simulation results indicate that the proposed algorithm effectively reduces data processing latency,energy consumption,and convergence time.
文摘Although a large number of studies have focused on various aspects of politeness,very little is known about how politeness intention is activated cognitively during verbal communication.The present study aims to explore the cognitive mechanism of politeness intention processing,and how it is related to pragmatic failure during cross-cultural communication.Using 30 Chinese EFL university students who were instructed to finish a probe word judgment task with 96 virtual scenarios,the results indicate that within both mono-and cross-cultural contexts,the response time in the experimental scenarios was significantly slower than that of the filler scenarios.This suggests that politeness intention was activated while understanding the surface meaning of the conversation;however,the EFL learners could not completely avoid the negative transfer of their native politeness conventions when they were comprehending the conversational intention of the target language.Furthermore,no significant differences in response time were found between the groups with high and low English pragmatic competence,illustrating that transferring the pragmatic rules and principles into cross-cultural communication skills was more cognitively demanding.Overall,this study adds to the literature on politeness research and provides some implications for foreign language pragmatic instructions.
Abstract: In this study, we examine efficient Big Data Engineering and Extract, Transform, Load (ETL) processes within the healthcare sector, building on the MIMIC-III Clinical Database. Our investigation explores various methodologies aimed at enhancing the efficiency of ETL processes, with a primary emphasis on optimizing time and resource utilization. Through experimentation on a representative dataset, we show the advantages of incorporating PySpark and Docker containerized applications. Our research demonstrates significant gains in time efficiency, process streamlining, and resource optimization attained by using PySpark for distributed computing within Big Data Engineering workflows. Additionally, we underscore the strategic integration of Docker containers, describing their role in improving scalability and reproducibility within the ETL pipeline. This paper summarizes the main insights from our experiments, emphasizing the practical implications and benefits of adopting PySpark and Docker. By streamlining Big Data Engineering and ETL processes in the context of clinical big data, our study contributes to the ongoing discourse on optimizing data processing efficiency in healthcare applications. The source code is available on request.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62025503).
Abstract: Chaotic optical communication has shown large potential as a hardware encryption method in the physical layer. As an important figure of merit, the bit rate–distance product of chaotic optical communication has been continually improved to 30 Gb/s × 340 km, but it is still far from the requirement for a deployed optical fiber communication system, which is beyond 100 Gb/s × 1000 km. A chaotic carrier can be considered as an analog signal and suffers from fiber channel impairments, limiting the transmission distance of high-speed chaotic optical communications. To break the limit, we propose and experimentally demonstrate a pilot-based digital signal processing scheme for coherent chaotic optical communication combined with deep-learning-based chaotic synchronization. Both transmission impairment recovery and chaotic synchronization are realized in the digital domain. The frequency offset of the lasers is accurately estimated and compensated by determining the location of the pilot tone in the frequency domain, and the equalization and phase noise compensation are jointly performed by the least mean square algorithm through the time domain pilot symbols. Using the proposed method, a 100 Gb/s chaotically encrypted quadrature phase-shift keying (QPSK) signal over 800 km single-mode fiber (SMF) transmission is experimentally demonstrated. In order to enhance security, a 40 Gb/s real-time chaotically encrypted QPSK signal over 800 km SMF transmission is realized by inserting pilot symbols and a pilot tone in a field-programmable gate array. This method provides a feasible approach to promote the practical application of chaotic optical communications and guarantees the high security of chaotic encryption.
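The pilot-tone step described in the abstract above can be illustrated with a toy example. This is only a sketch under strong assumptions that are not taken from the paper: the pilot is nominally at DC, the received block contains nothing but the pilot (no data, noise, or chromatic dispersion), and the offset falls exactly on a DFT bin. The function names `estimate_offset` and `compensate` are hypothetical.

```python
import cmath
import math


def estimate_offset(samples, sample_rate):
    """Return the frequency of the strongest DFT bin, mapped to a signed value."""
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    for k in range(n):
        # Naive O(n^2) DFT; adequate for a short illustrative block.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    if best_bin >= n / 2:
        best_bin -= n  # upper half of the spectrum holds negative frequencies
    return best_bin * sample_rate / n


def compensate(samples, offset, sample_rate):
    """Remove a frequency offset by counter-rotating every sample."""
    return [x * cmath.exp(-2j * math.pi * offset * i / sample_rate)
            for i, x in enumerate(samples)]


fs = 64.0            # assumed sample rate (arbitrary units)
true_offset = -5.0   # assumed laser frequency offset, same units
rx = [cmath.exp(2j * math.pi * true_offset * i / fs) for i in range(64)]

est = estimate_offset(rx, fs)
fixed = compensate(rx, est, fs)
print(est)
```

After compensation the pilot sits back at DC, so every sample of `fixed` is (numerically) the constant 1. A real receiver would additionally need finer-than-bin resolution and robustness to noise, which this sketch does not attempt.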
Funding: National Natural Science Foundation of China (No. 42022025).
Abstract: With the continued development of multiple Global Navigation Satellite Systems (GNSS) and the emergence of various frequencies, UnDifferenced and UnCombined (UDUC) data processing has become an increasingly attractive option. In this contribution, we provide an overview of the current status of UDUC GNSS data processing activities in China. These activities encompass the formulation of Precise Point Positioning (PPP) models and PPP-Real-Time Kinematic (PPP-RTK) models for processing single-station and multi-station GNSS data, respectively. Regarding single-station data processing, we discuss the advancements in PPP models, particularly the extension from a single system to multiple systems, and from dual frequencies to single and multiple frequencies. Additionally, we introduce the modified PPP model, which accounts for the time variation of receiver code biases, a departure from the conventional PPP model that typically assumes these biases to be time-constant. In the realm of multi-station PPP-RTK data processing, we introduce the ionosphere-weighted PPP-RTK model, which enhances the model strength by considering the spatial correlation of ionospheric delays. We also review the phase-only PPP-RTK model, designed to mitigate the impact of unmodelled code-related errors. Furthermore, we explore GLONASS PPP-RTK, achieved through the application of the integer-estimable model. For large-scale network data processing, we introduce the all-in-view PPP-RTK model, which alleviates the strict common-view requirement at all receivers. Moreover, we present the decentralized PPP-RTK data processing strategy, designed to improve computational efficiency. Overall, this work highlights the various advancements in UDUC GNSS data processing, providing insights into the state-of-the-art techniques employed in China to achieve precise GNSS applications.
Abstract: Data processing of small samples is an important and valuable research problem in electronic equipment testing. Because it is difficult and complex to determine the probability distribution of small samples, it is difficult to use traditional probability theory to process the samples and assess the degree of uncertainty. Using the grey relational theory and the norm theory, the grey distance information approach, which is based on the grey distance information quantity of a sample and the average grey distance information quantity of the samples, is proposed in this article. The definitions of the grey distance information quantity of a sample and the average grey distance information quantity of the samples, with their characteristics and algorithms, are introduced. The related problems, including the algorithm for the estimated value, the standard deviation, and the acceptance and rejection criteria of the samples and estimated results, are also addressed. Moreover, the information whitening ratio is introduced to select the weight algorithm and to compare different samples. Several examples are given to demonstrate the application of the proposed approach. The examples show that the proposed approach, which places no requirement on the probability distribution of small samples, is feasible and effective.
Funding: Supported by the National Key Research and Development Program of China from MOST (2016YFB0501503).
Abstract: The High Precision Magnetometer (HPM) on board the China Seismo-Electromagnetic Satellite (CSES) allows highly accurate measurement of the geomagnetic field; it includes FGM (Fluxgate Magnetometer) and CDSM (Coupled Dark State Magnetometer) probes. This article introduces the main processing method, algorithm, and processing procedure of the HPM data. First, the FGM and CDSM probes are calibrated according to ground sensor data. Then the FGM linear parameters can be corrected in orbit by applying the absolute vector magnetic field correction algorithm to CDSM data. At the same time, the magnetic interference of the satellite is eliminated according to ground-satellite magnetic test results. Finally, according to the characteristics of the magnetic field direction in the low latitude region, the transformation matrix between the FGM probe and the star sensor is calibrated in orbit to determine the correct direction of the magnetic field. Comparing the magnetic field data of the CSES and SWARM satellites over five continuous geomagnetically quiet days, the difference in measurements of the vector magnetic field is about 10 nT, which is within the uncertainty interval of geomagnetic disturbance.
Funding: Project 70533050 supported by the National Natural Science Foundation of China.
Abstract: The data processing mode is vital to the performance of an entire coalmine gas early-warning system, especially its real-time performance. Our objective was to present the structural features of coalmine gas data, so that the data could be processed at different priority levels in the C language. Two different data processing models, one with priority and the other without priority, were built based on queuing theory. Their theoretical formulas were determined via an M/M/1 model in order to calculate the average occupation time of each measuring point in an early-warning program. We validated the model with the gas early-warning system of the Huaibei Coalmine Group Corp. The results indicate that the average occupation time for gas data processing using the queuing system model with priority is nearly 1/30 of that of the model without priority.
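The abstract above does not reproduce the paper's formulas, so the following is only a sketch based on standard textbook results for a single-server queue with Poisson arrivals and exponential service. It assumes two classes with a shared service rate and non-preemptive priority; the function name `mm1_waits` and the example rates are illustrative and do not come from the paper.

```python
def mm1_waits(lam_high, lam_low, mu):
    """Mean queueing delays for one server with exponential service times.

    Returns (fcfs, high, low): the mean wait without priorities, and the
    mean waits of the high- and low-priority classes under non-preemptive
    priority. Both classes share the service rate mu (an assumption).
    """
    rho_high = lam_high / mu
    rho = (lam_high + lam_low) / mu
    if rho >= 1:
        raise ValueError("queue is unstable: total load must be below 1")
    residual = rho / mu  # mean residual service time seen by an arrival
    fcfs = residual / (1 - rho)
    high = residual / (1 - rho_high)
    low = residual / ((1 - rho_high) * (1 - rho))
    return fcfs, high, low


fcfs, high, low = mm1_waits(lam_high=1.0, lam_low=7.0, mu=10.0)
print(f"no priority: {fcfs:.3f}  high: {high:.3f}  low: {low:.3f}")
```

In this model, priority shortens the wait of urgent readings at the expense of routine ones, while the load-weighted average wait stays the same as without priority. The roughly 30-fold gain reported in the abstract concerns the paper's own occupation-time measure and system, which this sketch does not attempt to reproduce.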
Abstract: A novel technique for automatic seismic data processing using both integral and local features of seismograms is presented in this paper. Here, the term integral feature of seismograms refers to a feature that depicts the shape of the whole seismogram. However, unlike some previous efforts that completely abandon the DIAL approach (signal detection, phase identification, association, and event localization) and seek to use envelope cross-correlation to detect seismic events directly, our technique keeps following the DIAL approach; in addition to detecting signals corresponding to individual seismic phases, it also detects continuous wave-trains and explores their features for phase-type identification and signal association. More concrete ideas about how to define wave-trains and combine them with various detections, as well as how to measure and utilize their features in seismic data processing, are elaborated in the paper. This approach has been applied in our routine data processing for years, and test results for a 16-day period using data from the Xinjiang seismic station network are presented. The automatic processing results have fairly low false and missed event rates simultaneously, showing that the new technique has good application prospects for improving automatic seismic data processing.
Funding: Supported by the National High-Technology Research and Development Program of China (2001AA115300), the National Natural Science Foundation of China (69874038), and the Natural Science Foundation of Liaoning Province (20031018).
Abstract: How to design a high-performance multicast key management system is currently a hot issue. This paper applies the idea of hierarchical data processing to construct a common analytic model based on a directed logical key tree and supplies two important metrics for this problem: re-keying cost and key storage cost. The paper gives the basic theory of hierarchical data processing and the analytic model for multicast key management based on the logical key tree. It is proved that the 4-ary tree has the best performance under these metrics. The key management problem is also investigated based on a user probability model, which yields two evaluation parameters for re-keying cost and key storage cost.
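The claim that a tree of degree 4 performs best can be made plausible with a small numerical check. The cost model below is an assumption commonly used in analyses of logical key hierarchies, not the model from this paper: for a balanced tree of degree d with height h = log_d(n), a join is taken to cost about 2h encrypted keys and a leave about d·h, and joins and leaves are assumed equally frequent. The function name `avg_rekey_cost` is hypothetical.

```python
import math


def avg_rekey_cost(n_users, degree):
    """Average re-keying cost under the assumed join/leave model."""
    height = math.log(n_users) / math.log(degree)  # log_d(n), tree height
    join_cost = 2 * height        # assumed: two encryptions per level
    leave_cost = degree * height  # assumed: one encryption per sibling per level
    return (join_cost + leave_cost) / 2


costs = {d: avg_rekey_cost(4096, d) for d in range(2, 9)}
best_degree = min(costs, key=costs.get)
for d, c in costs.items():
    print(d, round(c, 2))
print("best degree:", best_degree)
```

Under these assumptions the average cost is proportional to (d + 2) / ln d, whose minimum over integer degrees falls at d = 4. Key storage per user grows with the tree height, so it favours larger degrees; the trade-off the paper analyses may weigh the two metrics differently.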
Funding: Supported by the Science Foundation of China University of Petroleum, Beijing (Grant Number ZX20210024), the Chinese Postdoctoral Science Foundation (Grant Number 2021M700172), the Strategic Cooperation Technology Projects of CNPC and CUP (Grant Number ZLZX2020-03), and the National Natural Science Foundation of China (Grant Number 42004105).
Abstract: Low-field nuclear magnetic resonance (NMR) has been widely used in the petroleum industry, for example in well logging and laboratory rock core analysis. However, the signal-to-noise ratio is low due to the low magnetic field strength of NMR tools and the complex petrophysical properties of the detected samples. Suppressing the noise and highlighting the available NMR signals is very important for subsequent data processing. Most denoising methods are based on fixed mathematical transformations or hand-designed feature selectors to suppress noise characteristics, which may not perform well because they do not adapt to different noisy signals. In this paper, we propose a data processing framework based on dictionary learning to improve the quality of low-field NMR echo data. Dictionary learning is a machine learning method based on redundancy and sparse representation theory. Available information in noisy NMR echo data can be adaptively extracted and reconstructed by dictionary learning. The advantages and application effectiveness of the proposed method were verified with a number of numerical simulations, NMR core data analyses, and NMR logging data processing. The results show that dictionary learning can significantly improve the quality of NMR echo data with a high noise level and effectively improve the accuracy and reliability of inversion results.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. AA420060).
Abstract: In network-supported collaborative design, data processing plays a vital role. Much effort has been spent in this area, and many kinds of approaches have been proposed. Drawing on the related literature, this paper presents an extensible markup language (XML) based strategy for several important problems of data processing in network-supported collaborative design, such as the representation of the standard for the exchange of product model data (STEP) with XML in product information expression and the management of XML documents using a relational database. The paper gives a detailed exposition of how to clarify the mapping between the XML structure and the relational database structure and how XML-QL queries can be translated into structured query language (SQL) queries. Finally, the structure of a data processing system based on XML is presented.
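One generic way to store XML in a relational database is to shred each element into a row of a node table and each attribute into a row of an attribute table, after which path-style queries become SQL joins. The sketch below illustrates that idea with Python's standard library; the table layout, the sample document, and the function name `shred` are assumptions made for illustration and are not the mapping or the XML-QL translation defined in the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical product document, used only for illustration.
DOC = """<product id="p1">
  <part name="shaft"><material>steel</material></part>
  <part name="housing"><material>aluminium</material></part>
</product>"""


def shred(root, conn):
    """Store every element and attribute of an XML tree in two tables."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, "
                 "parent INTEGER, tag TEXT, text TEXT)")
    conn.execute("CREATE TABLE attr (node INTEGER, name TEXT, value TEXT)")
    next_id = 0
    stack = [(root, None)]  # (element, id of its parent row)
    while stack:
        elem, parent = stack.pop()
        next_id += 1
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                     (next_id, parent, elem.tag, (elem.text or "").strip()))
        for name, value in elem.attrib.items():
            conn.execute("INSERT INTO attr VALUES (?, ?, ?)",
                         (next_id, name, value))
        for child in elem:
            stack.append((child, next_id))


conn = sqlite3.connect(":memory:")
shred(ET.fromstring(DOC), conn)

# "For each part, return its name and material", expressed as SQL joins.
rows = conn.execute("""
    SELECT a.value, m.text
    FROM node p
    JOIN attr a ON a.node = p.id AND a.name = 'name'
    JOIN node m ON m.parent = p.id AND m.tag = 'material'
    WHERE p.tag = 'part'
    ORDER BY a.value
""").fetchall()
print(rows)
```

The query shows why a translation step is needed: a single parent/child step in an XML query turns into one self-join on the node table, so longer paths produce longer join chains.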
Funding: Supported by the National Natural Science Foundation of China (60672089, 60772075).
Abstract: In the Consultative Committee for Space Data Systems (CCSDS) File Delivery Protocol (CFDP) recommendation for reliable transmission, there is no detailed transmission procedure or delay calculation for the prompted negative acknowledgment and asynchronous negative acknowledgment models. CFDP is designed to provide data and storage management, store and forward, custody transfer, and reliable end-to-end delivery over deep space links characterized by huge latency, intermittent links, asymmetric bandwidth, and high bit error rate (BER). Four reliable transmission models are analyzed, and the expected file-delivery time is calculated for different transmission rates, numbers and sizes of packet data units, BERs, frequencies of external events, etc. By comparing the four CFDP models, the BER requirement for typical deep space missions is obtained, and rules for choosing CFDP models under different uplink state information are given, which provides a reference for protocol model selection, utilization, and modification.
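To see how BER and data unit size feed into delivery time, a simplified calculation can help. The sketch below is not the paper's analysis of the four CFDP models: it assumes independent bit errors, treats a data unit as lost if any bit is corrupted, ignores the loss of acknowledgment traffic, and counts how many send-and-repair rounds are needed on average until every data unit has arrived. The function names are hypothetical.

```python
import math


def pdu_error_prob(ber, pdu_bytes):
    """Probability that a data unit contains at least one bit error."""
    # 1 - (1 - ber)^(8 * bytes), computed stably for very small ber.
    return -math.expm1(8 * pdu_bytes * math.log1p(-ber))


def expected_rounds(n_pdus, p, tol=1e-12):
    """Expected number of rounds until all data units are received.

    Each round resends only the missing units; a unit is lost with
    probability p, independently. Uses E[R] = sum over k >= 0 of
    (1 - (1 - p^k)^n).
    """
    if not 0 <= p < 1:
        raise ValueError("loss probability must be in [0, 1)")
    total, k = 0.0, 0
    while True:
        term = 1.0 - (1.0 - p ** k) ** n_pdus
        total += term
        k += 1
        if term < tol:
            return total


p = pdu_error_prob(ber=1e-6, pdu_bytes=1024)
print(p, expected_rounds(n_pdus=1000, p=p))
```

Since each extra round costs at least one round-trip time, which is very large in deep space, even a small increase in the expected number of rounds has a large effect on the file-delivery time. That is one reason the choice of acknowledgment model depends on BER.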
Abstract: With the rapid development of information technology, 5G communication technology has gradually entered real life, and the application of edge computing is particularly significant in the field of information and communication systems. This paper focuses on using edge computing based on 5G communication in information and communication systems. First, the study analyzes the importance of combining edge computing technology with 5G communication technology and its advantages, such as high efficiency and low latency in processing large amounts of data. The study then explores multiple application scenarios of edge computing in information and communication systems, such as integrated use in the Internet of Things, intelligent transportation, telemedicine, and Industry 4.0. The research method is mainly based on theoretical analysis and experimental verification, combined with the characteristics of the 5G network, to optimize the edge computing model and test the performance of edge computing in different scenarios through experimental simulation. The results show that edge computing significantly improves the data processing capacity and response speed of information and communication systems (ICS) in a 5G environment. However, there are also a series of challenges in practical application, including data security and privacy protection, the complexity of resource management and allocation, and the guarantee of quality of service (QoS). Through case analysis and problem analysis, the paper puts forward corresponding solution strategies, such as strengthening the data security protocol, introducing an intelligent resource scheduling system, and establishing a multi-dimensional service quality monitoring mechanism. Finally, this study points out that the deep integration of edge computing and 5G communication will continue to promote the innovative development of information and communication systems, which has a far-reaching impact and important practical significance for promoting the transformation and upgrading in the field of information technology.
Funding: Project (2017YFC1405600) supported by the National Key R&D Program of China; Project (18JK05032) supported by the Scientific Research Project of Education Department of Shaanxi Province, China.
Abstract: Because synthetic aperture radar (SAR) satellites can detect only a limited number of scenes, the full-track utilization rate is not high. Because of the computing and storage limitations of a single satellite, it is difficult to process the large amounts of data produced by spaceborne SAR. A new method of networked satellite data processing is proposed to improve the efficiency of data processing. A multi-satellite distributed SAR real-time processing method based on the Chirp Scaling (CS) imaging algorithm is studied in this paper, and a distributed data processing system is built with field programmable gate array (FPGA) chips as the kernel. Unlike traditional CS algorithm processing, the system divides data processing into three stages. The computing tasks are allocated to different data processing units (i.e., satellites) in each stage. The method effectively saves computing and storage resources of satellites, improves the utilization rate of a single satellite, and shortens the data processing time. Gaofen-3 (GF-3) satellite SAR raw data is processed by the system to verify the performance of the method.
Funding: Supported by the National Natural Science Foundation of China (No. 11075183).
Abstract: One of the most important project missions of neutral beam injectors is the implementation of 100 s neutral beam injection (NBI) with high power energy to the plasma of the EAST superconducting tokamak. Correspondingly, it is necessary to construct a high-speed and reliable computer data processing system for handling experimental data, covering data acquisition, data compression and storage, data decompression and query, and data analysis. The implementation of the computer data processing application software (CDPS) for EAST NBI is presented in this paper in terms of its functional structure and system realization. The software is programmed in the C language and runs on the Linux operating system, based on the TCP network protocol and multi-threading technology. The hardware mainly includes an industrial control computer (IPC), a data server, PXI DAQ cards, and so on. The software has now been applied to the EAST NBI system, and experimental results show that the CDPS can serve EAST NBI very well.
Funding: Supported by the China Postdoctoral Science Foundation (Grant Nos. 2021TQ0042, 2021M700435, 2021TQ0041), the National Natural Science Foundation of China (Grant No. 62102027), and the Shandong Provincial Key Research and Development Program (2021CXGC010106).
Abstract: In this paper, we propose a novel fuzzy matching data sharing scheme named FADS for cloud-edge communications. FADS allows users to specify their access policies and enables receivers to obtain the data transmitted by the senders if and only if both sides simultaneously satisfy each other's defined policies. Specifically, we first formalize the definition and security models of fuzzy matching data sharing in cloud-edge environments. Then, we construct a concrete instantiation using a pairing-based cryptosystem and privacy-preserving set intersection on the attribute sets of both sides to achieve concurrent matching over the policies. If the matching succeeds, the data can be decrypted. Otherwise, nothing will be revealed. In addition, FADS allows users to dynamically specify the policy each time, which is an urgent demand in practice. A thorough security analysis demonstrates that FADS is provably secure under indistinguishability against chosen ciphertext attack (IND-CCA) in the random oracle model against probabilistic polynomial-time (PPT) adversaries, and that the desirable security properties of privacy and authenticity are achieved. Extensive experiments provide evidence that FADS has acceptable efficiency.
Funding: Projects (61363021, 61540061, 61663047) supported by the National Natural Science Foundation of China; Project (2017SE206) supported by the Open Foundation of Key Laboratory in Software Engineering of Yunnan Province, China.
Abstract: Due to the increasing number of cloud applications, the amount of data in the cloud shows signs of growing faster than ever before. The nature of cloud computing requires cloud data processing systems that can handle huge volumes of data and have high performance. However, most cloud storage systems currently adopt a hash-like approach to retrieving data that supports only simple keyword-based queries and lacks other forms of information search. Therefore, a scalable and efficient indexing scheme is clearly required. In this paper, we present a skip list-based cloud index, called SLC-index, which is a novel, scalable skip list-based indexing scheme for cloud data processing. The SLC-index offers a two-layered architecture for extending indexing scope and facilitating better throughput. Dynamic load-balancing for the SLC-index is achieved by online migration of index nodes between servers. Furthermore, it is a flexible system due to its dynamic addition and removal of servers. The SLC-index is efficient for both point and range queries. Experimental results show the efficiency of the SLC-index and its usefulness as an alternative approach for cloud-suitable data structures.
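A skip list supports both point and range queries because its bottom level is a sorted linked list, while the upper levels act as express lanes for searching. The following is a minimal single-machine sketch of that data structure only; it does not reproduce the two-layered SLC-index architecture, node migration, or load balancing described in the abstract, and the class and method names are illustrative.

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key, self.value = key, value
        self.forward = [None] * level  # next node at each level


class SkipList:
    MAX_LEVEL = 8

    def __init__(self, seed=0):
        self._rng = random.Random(seed)  # seeded for reproducible layouts
        self._head = _Node(None, None, self.MAX_LEVEL)

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """For each level, find the last node whose key is below `key`."""
        preds = [self._head] * self.MAX_LEVEL
        node = self._head
        for lvl in reversed(range(self.MAX_LEVEL)):
            while node.forward[lvl] is not None and node.forward[lvl].key < key:
                node = node.forward[lvl]
            preds[lvl] = node
        return preds

    def insert(self, key, value):
        preds = self._predecessors(key)
        nxt = preds[0].forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value  # overwrite an existing key
            return
        node = _Node(key, value, self._random_level())
        for lvl in range(len(node.forward)):
            node.forward[lvl] = preds[lvl].forward[lvl]
            preds[lvl].forward[lvl] = node

    def get(self, key):
        """Point query: return the value for `key`, or None."""
        nxt = self._predecessors(key)[0].forward[0]
        return nxt.value if nxt is not None and nxt.key == key else None

    def range(self, lo, hi):
        """Range query: all (key, value) pairs with lo <= key <= hi."""
        node = self._predecessors(lo)[0].forward[0]
        out = []
        while node is not None and node.key <= hi:
            out.append((node.key, node.value))
            node = node.forward[0]
        return out


index = SkipList()
for k in [50, 10, 40, 20, 30]:
    index.insert(k, f"block-{k}")
print(index.get(30), index.range(15, 40))
```

A hash-based store can answer `get` but has no efficient equivalent of `range`, which is the limitation of hash-like retrieval that the abstract points to.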
Funding: Supported by the Special Earthquake Research Project Granted by the China Earthquake Administration (201308009).
Abstract: In comparison with the ITRF2000 model, the ITRF2005 model represents a significant improvement in solution generation, datum definition, and realization. However, these improvements cause a frame difference between the ITRF2000 and ITRF2005 models, which may impact GNSS data processing. To quantify this impact, the differences between the GNSS results obtained using the two models, including station coordinates, baseline length, and horizontal velocity field, were analyzed. After transformation, the differences in position were at the millimeter level, and the differences in baseline length were less than 1 mm. The differences in the horizontal velocity fields decreased as the study area was reduced. For a large region, the differences in value were less than 1 mm/a, with a systematic difference of approximately 2 degrees in direction, while for a medium-sized region, the differences in value and direction were not significant.