Abstract: Data processing of small samples is an important and valuable research problem in electronic equipment testing. Because it is difficult and complex to determine the probability distribution of small samples, traditional probability theory is hard to apply when processing such samples and assessing their degree of uncertainty. Using grey relational theory and norm theory, this article proposes the grey distance information approach, which is based on the grey distance information quantity of a sample and the average grey distance information quantity of the samples. The definitions of these two quantities, together with their characteristics and algorithms, are introduced. Related problems, including the algorithm for the estimated value, the standard deviation, and the acceptance and rejection criteria for the samples and estimated results, are also addressed. Moreover, the information whitening ratio is introduced to select the weight algorithm and to compare different samples. Several examples demonstrate the application of the proposed approach and show that it is feasible and effective, with no requirement on the probability distribution of the small samples.
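The paper-specific "grey distance information quantity" is not fully defined in the abstract, so it cannot be reproduced here. As a related illustration of the grey relational theory the approach builds on, the following is a minimal sketch of the standard Deng grey relational coefficient and grade; all function names and the distinguishing coefficient value are our own choices, not the paper's.

```python
# Illustrative sketch only: shows the classic Deng grey relational
# coefficient from grey relational theory, which the proposed approach
# builds on. The paper's own quantities may be defined differently.

def grey_relational_coefficients(reference, sample, rho=0.5):
    """Coefficient of each point of `sample` relative to `reference`.

    rho is the distinguishing coefficient, conventionally 0.5.
    """
    deltas = [abs(r - s) for r, s in zip(reference, sample)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:  # identical sequences: perfect relation everywhere
        return [1.0] * len(deltas)
    return [(d_min + rho * d_max) / (d + rho * d_max) for d in deltas]

def grey_relational_grade(reference, sample, rho=0.5):
    """Average of the coefficients: overall relational grade in (0, 1]."""
    coeffs = grey_relational_coefficients(reference, sample, rho)
    return sum(coeffs) / len(coeffs)
```

A sample identical to the reference yields a grade of 1.0; grades fall toward 0 as the sequences diverge.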
Funding: The High Technology Research Plan of Jiangsu Province (No. BG2004034) and the Foundation of Graduate Creative Program of Jiangsu Province (No. xm04-36).
Abstract: A novel data-stream partitioning method is proposed to resolve problems of range-aggregation continuous queries over parallel streams in the power industry. The first step of the method is to sample the data in parallel, implemented as an extended reservoir-sampling algorithm. A skip factor based on the change ratio of the data values is introduced to describe the distribution characteristics of the data values adaptively. The second step is to partition the flux of the data streams evenly, implemented with two alternative equal-depth histogram generation algorithms that fit different cases: one performs incremental maintenance based on heuristics, and the other performs periodical updates to generate an approximate partition vector. Experimental results on actual data show that the method is efficient, practical, and suitable for processing time-varying data streams.
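The abstract does not specify the extended algorithm or the change-ratio skip factor, so the following is only a minimal sketch of the two base ingredients it names: classic reservoir sampling (Algorithm R) and equal-depth (equi-height) histogram boundaries; all names are our own.

```python
import random

# Minimal sketches of the two base building blocks named in the abstract:
# reservoir sampling and equal-depth histogram boundary computation.
# The paper's parallel extension and skip factor are not shown.

def reservoir_sample(stream, k, rng=random):
    """Return a uniform random sample of up to k items from an iterable."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace an existing element with probability k/(i+1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

def equal_depth_boundaries(values, parts):
    """Boundaries splitting `values` into `parts` buckets of equal count."""
    srt = sorted(values)
    n = len(srt)
    return [srt[(i * n) // parts] for i in range(1, parts)]
```

Over a stream of 1000 items, `reservoir_sample(stream, 10)` keeps every item with equal probability 10/1000; the boundaries from the sample then approximate an even partition of the stream's flux.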
Abstract: China's continental deposition basins are characterized by complex geological structures and varied reservoir lithologies, so high-precision exploration methods are needed. High-density spatial sampling is a new technology for increasing the accuracy of seismic exploration. We briefly discuss point-source and point-receiver technology, analyze the in-situ method of high-density spatial sampling, introduce the symmetric sampling principles presented by Gijs J. O. Vermeer, and discuss high-density spatial sampling from the point of view of wave-field continuity. We emphasize the characteristics of high-density spatial sampling, including the advantages of high-density first breaks for investigating near-surface structure and improving static-correction precision, and the use of dense receiver spacing at short offsets to increase the effective coverage at shallow depth and the accuracy of reflection imaging. Coherent noise is not aliased, so the precision of noise analysis and suppression increases. High-density spatial sampling also enhances wave-field continuity and the accuracy of various mathematical transforms, which benefits wave-field separation. Finally, we point out that the difficult part of high-density spatial sampling technology is the data processing: more research is needed on methods for analyzing and processing the huge amounts of seismic data.
Funding: Supported by the National Natural Science Foundation of China (grants 41874048, 41790464, and 41790462).
Abstract: A rapidly deployable dense seismic monitoring system that can transmit acquired data in real time and analyze the data automatically is crucial for seismic hazard mitigation after a major earthquake. However, it is difficult for current seismic nodal stations to transmit data in real time over an extended period, and processing the acquired data manually usually takes a great amount of time. To monitor earthquakes in real time flexibly, we developed a mobile integrated seismic monitoring system consisting of newly developed nodal units with 4G telemetry and a real-time, AI-assisted automatic data-processing workflow. The integrated system is convenient to deploy and has been successfully applied in monitoring the aftershocks of the Yangbi MS 6.4 earthquake that occurred on May 21, 2021 in Yangbi County, Dali, Yunnan, southwest China. The acquired seismic data are transmitted almost in real time through the 4G cellular network and then processed automatically for event detection, location, magnitude calculation, and source-mechanism inversion. Within tens of seconds to at most a couple of minutes, the final seismic attributes can be presented remotely to end users through the integrated system. From May 27 to June 17, the real-time system detected and located 7905 aftershocks in the Yangbi area before its internal batteries were exhausted, far more than the catalog provided by the China Earthquake Networks Center using the regional permanent stations. The initial application of this integrated real-time monitoring system is promising, and we anticipate the advent of a new era of Real-time Intelligent Array Seismology (RIAS) for better monitoring and understanding of the subsurface dynamic processes caused by Earth's internal forces as well as anthropogenic activities.
Abstract: In this paper, we introduce a system architecture for a patient-centered mobile health monitoring (PCMHM) system that deploys different sensors to determine patients' activities, medical conditions, and the cause of an emergency event. The system combines and analyzes sensor data to produce detailed patient health information in real time. A central computational node with data-analysis capability is used for sensor-data integration and analysis. In addition to medical sensors, surrounding environmental sensors are utilized to enhance the interpretation of the data and to improve medical diagnosis. The PCMHM system can provide on-demand patient health information via the Internet and track daily activities and patients' health condition in real time; it also includes capabilities for assessing patients' posture and detecting falls.
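The abstract does not describe the fall-detection algorithm itself, so the following is a hedged sketch of a common baseline for such systems: flag a fall when the magnitude of the 3-axis accelerometer signal spikes above an impact threshold and then settles back near rest. The thresholds and names here are our own assumptions, not the paper's.

```python
import math

# Hedged baseline sketch of accelerometer-based fall detection:
# an impact spike followed by near-stillness. Thresholds are illustrative.

G = 9.81  # gravity, m/s^2

def magnitude(sample):
    """Euclidean magnitude of one (ax, ay, az) accelerometer reading."""
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)

def detect_fall(samples, impact_thresh=2.5 * G, still_thresh=1.2 * G):
    """Return True if an impact spike is later followed by near-stillness."""
    impact_seen = False
    for s in samples:
        m = magnitude(s)
        if m > impact_thresh:
            impact_seen = True
        elif impact_seen and m < still_thresh:
            return True
    return False
```

A real system would combine this with the posture assessment the paper mentions to cut false alarms from, e.g., sitting down hard.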
Funding: Supported partially by the National Funds of Natural Sciences, Grant 7860013.
Abstract: In July 1987, the Sampling Survey of Children's Situation was conducted in 9 provinces/autonomous regions of China. A stratified two-stage cluster sampling plan was designed for the survey. The paper presents the methods of stratification, of selecting n=2 PSUs (cities/counties) with unequal probabilities without replacement in each stratum, and of selecting resident/village committees in each sampled city/county. All formulae for estimating population characteristics (especially population totals and ratios of two totals), and for estimating the variances of those estimators, are given. Finally, we give a preliminary analysis of the precision of the survey from the results of the data processing.
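The survey's full two-stage unequal-probability estimators (e.g. Horvitz-Thompson with joint inclusion probabilities) are not given in the abstract. As a simplified illustration of the quantities it names, the sketch below estimates a population total stratum by stratum as N_h times the sample mean, and a ratio of two totals as the quotient of two such estimates; all names are our own.

```python
# Simplified sketch of stratified estimation of a population total and of
# a ratio of two totals. The survey's actual unequal-probability,
# without-replacement PSU selection requires more elaborate weighting.

def stratified_total(strata):
    """strata: list of (N_h, sample_values) pairs, one per stratum.

    Each stratum total is estimated as N_h * ybar_h.
    """
    total = 0.0
    for n_pop, sample in strata:
        ybar = sum(sample) / len(sample)
        total += n_pop * ybar
    return total

def ratio_of_totals(strata_y, strata_x):
    """Estimate the ratio of two population totals, R = Y / X."""
    return stratified_total(strata_y) / stratified_total(strata_x)
```

For two strata of sizes 100 and 50 with sample means 3 and 10, the estimated total is 100*3 + 50*10 = 800.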
Abstract: Existing network security situation assessment methods do not take into account the special security requirements of industrial control systems (ICS) and therefore cannot produce accurate assessments. In addition, an ICS transmits large amounts of heterogeneous data and is vulnerable to network attacks, and existing classification methods cannot effectively handle multi-class imbalanced data. To address these problems, this paper first analyzes the characteristics of industrial control systems and proposes a quantitative ICS security situation assessment method based on the analytic hierarchy process (AHP), which reflects the ICS network security state more accurately. Then, for the problem of imbalanced data across multiple attack types, an average under/over-sampling method is proposed that balances the data without making the data set excessively large. Finally, an ICS network situation assessment classifier is built on the extreme gradient boosting (XGBoost) algorithm. Experiments show that the proposed classification model achieves better accuracy than the traditional classification algorithms support vector machine (SVM), K-nearest neighbors (KNN), and random forest.
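The abstract names the idea of "average under/over-sampling" without giving the procedure, so the following is a hedged sketch of one natural reading: resample every class to the mean class size, undersampling majority classes and oversampling (with replacement) minority ones; the paper's exact method may differ, and all names are our own.

```python
import random

# Hedged sketch of average under/over-sampling: bring every class to the
# mean class size, so the result is balanced without growing the data set.

def average_resample(data_by_class, rng=random):
    """data_by_class: dict mapping label -> list of samples."""
    target = round(sum(len(v) for v in data_by_class.values())
                   / len(data_by_class))
    balanced = {}
    for label, samples in data_by_class.items():
        if len(samples) >= target:
            # Undersample majority classes without replacement.
            balanced[label] = rng.sample(samples, target)
        else:
            # Oversample minority classes with replacement.
            balanced[label] = [rng.choice(samples) for _ in range(target)]
    return balanced
```

The balanced dictionary can then be flattened into (features, label) pairs and fed to any classifier, such as the XGBoost model the paper uses.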
Abstract: Sentence classification tasks often face insufficient training data; moreover, because text is discrete, data augmentation that preserves semantics is difficult, and semantic consistency is hard to balance against diversity. This paper proposes a data augmentation method based on a penalized generative pre-trained language model (punishing generative pre-trained transformer for data augmentation, PunishGPT-DA). A penalty term and a hyperparameter α are designed to act together with the negative log-likelihood loss in fine-tuning GPT-2 (generative pre-training 2.0), encouraging the model to attend to outputs whose predicted probability is small but that are still reasonable, and a filter based on BERT (bidirectional encoder representations from transformers) removes generated samples with large semantic deviation. The method expands the training set 16-fold and, compared with GPT-2, improves accuracy on intent recognition, question classification, and sentiment analysis by 1.1%, 4.9%, and 8.7%, respectively. Experimental results show that the proposed method can effectively control both consistency and diversity and improve the training performance of downstream task models.
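The abstract states only that a penalty term weighted by α is added to the negative log-likelihood; the exact form is not given. The toy sketch below shows one plausible shape of such a loss, using the entropy of the predicted distribution as the diversity-encouraging term; this is our own illustrative assumption, not the paper's actual penalty.

```python
import math

# Toy sketch of a penalized NLL: standard negative log-likelihood of the
# target token, minus an alpha-weighted entropy bonus that rewards less
# peaked predictions (more diverse generations). The paper's actual
# penalty term may be defined differently.

def penalized_nll(probs, target_index, alpha=0.1):
    """probs: predicted distribution over the vocabulary (sums to 1)."""
    nll = -math.log(probs[target_index])
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Subtracting entropy lowers the loss for flatter distributions,
    # nudging the model toward low-probability-but-plausible outputs.
    return nll - alpha * entropy
```

With alpha = 0 this reduces to ordinary NLL; raising alpha trades likelihood for diversity, which is the consistency-versus-diversity balance the paper targets.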
Abstract: Manifold data consist of arc-shaped or ring-shaped clusters, characterized by large variation in the distances between samples within the same cluster. The density peaks clustering (DPC) algorithm cannot effectively identify the centers of manifold clusters, and its allocation of the remaining samples is prone to chains of mis-assignments. To address this, we propose a density peaks clustering algorithm based on shared nearest neighbors for manifold data sets (DPC-SNN). A sample-similarity definition based on shared nearest neighbors is proposed so that the similarity between samples in the same manifold cluster is as high as possible; local density is defined from this similarity without ignoring the density contribution of samples far from the cluster center, so the centers of manifold clusters are better distinguished from other samples; and the remaining samples are allocated according to sample similarity, avoiding chains of mis-assignments. Comparative experiments against the DPC, FKNNDPC, FNDPC, DPCSA, and IDPC-FA algorithms show that DPC-SNN can effectively find the cluster centers of manifold data and cluster them accurately, and it also performs well on real-world and face data sets.
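The abstract does not give the exact similarity formula, so the sketch below shows the basic shared-nearest-neighbor (SNN) construction such definitions build on: the similarity of two points is the size of the intersection of their k-nearest-neighbor sets. The paper's definition may weight this further; names are our own.

```python
import math

# Minimal sketch of shared-nearest-neighbor similarity: two points are
# similar when their k-nearest-neighbor sets overlap, which stays high
# along a manifold cluster even when raw distances vary widely.

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors."""
    sets = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        sets.append({j for _, j in dists[:k]})
    return sets

def snn_similarity(points, k):
    """Pairwise SNN similarity matrix: |kNN(i) & kNN(j)|."""
    sets = knn_sets(points, k)
    n = len(points)
    return [[len(sets[i] & sets[j]) for j in range(n)] for i in range(n)]
```

Points in the same tight group share neighbors and get positive similarity, while points in well-separated groups share none, which is what lets an SNN-based local density separate manifold cluster centers from the rest.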