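Abstract: With the emergence of large-scale pre-trained language models, text generation has made breakthrough progress. In open-ended text generation, however, the generated content lacks human-like emotional characteristics, which makes it hard for readers to resonate with or feel emotionally connected to the text. Controllable text generation is therefore important for addressing this shortcoming of current text generation techniques. First, the ChnSentiCorp dataset is extended with topic and sentiment attributes. Then, to build a multi-attribute controllable text generation model that produces fluent and emotionally rich text, a diffusion-sequence-based controllable text generation model, DiffuSeq-PT, is proposed. The model uses a diffusion model as its backbone and runs the diffusion process on sequences conditioned on topic and sentiment attributes together with the text data, under classifier-free guidance. It uses the encoding and decoding capabilities of the pre-trained model ERNIE 3.0 (Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation) to fit the noising and denoising process of the diffusion model, and finally generates target text that matches the given topic and multiple sentiment granularities. Compared with the baseline model DiffuSeq, the proposed model improves BERTScore by 0.13 and 0.01 on two public real-world datasets (ChnSentiCorp and a debate dataset), respectively, and lowers perplexity by 14.318 and 9.46, respectively.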
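The abstract above does not say how DiffuSeq-PT applies classifier-free guidance at sampling time. As a minimal sketch of the commonly used form, assuming the denoiser is evaluated once with the topic and sentiment condition and once without it, the two outputs can be blended as follows. The function name, argument names, and the flat-list representation are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def classifier_free_guidance(
    cond: Sequence[float],
    uncond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend conditional and unconditional denoiser outputs.

    Uses the standard form  uncond + s * (cond - uncond):
      s = 0  ignores the condition,
      s = 1  returns the purely conditional output,
      s > 1  pushes the output further toward the condition.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Hypothetical denoiser outputs for a two-dimensional embedding.
guided = classifier_free_guidance([1.0, 2.0], [0.0, 0.0], guidance_scale=2.0)
```

With a scale of 1 the condition is used as-is. Larger scales trade diversity for stronger adherence to the requested attributes, so the value is normally tuned on held-out data.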
Funding: Sponsored by the National Natural Science Foundation of China (Nos. 42174149, 41774144) and the National Major Projects (No. 2016ZX05014-001).
Abstract: D-T2 two-dimensional nuclear magnetic resonance (2D NMR) logging technology can distinguish pore fluid types intuitively and is widely used in oil and gas exploration. Many 2D NMR inversion methods (e.g., truncated singular value decomposition (TSVD), Butler-Reeds-Dawson (BRD), LM-norm smoothing, and TIST-L1 regularization) have been proposed, but most have been limited to numerical simulations. This study focused on the applicability of different inversion methods to NMR logging data from various acquisition sequences, and the optimal inversion method was selected through comparative analysis. First, the principle of 2D NMR logging was studied. Then, the inversion methods were studied in detail, and the inversion results and calculation times of TSVD, BRD, LM-norm smoothing, and TIST-L1 regularization were compared through numerical simulations on CPMG and diffusion editing (DE) sequences obtained from oil-water and gas-water models. The inversion method was optimized to process SP-mode logging data from the MR Scanner instrument. The results showed that the TIST-L1 regularization and LM-norm smoothing methods were more accurate for the CPMG and DE sequence echo trains of the oil-water and gas-water models. However, the LM-norm smoothing method was less time-consuming, making it more suitable for logging data processing. A case study in well A25 showed that the results of the LM-norm smoothing method were consistent with those of the GEOLOG software. This demonstrates that the LM-norm smoothing method is applicable to practical NMR logging data processing.
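All of the inversion methods listed above start from the same forward model, which the abstract does not spell out. As a rough sketch, assuming the textbook CPMG response in a constant field gradient, each (D, T2) component decays through both transverse relaxation and diffusion. The function names and the numeric values below are illustrative assumptions, not parameters from the study, and the diffusion editing (DE) sequence uses a different kernel that is not covered here.

```python
import math

GAMMA = 2.675e8  # proton gyromagnetic ratio in rad s^-1 T^-1


def cpmg_echo(t, te, gradient, components):
    """Echo amplitude at time t (s) for echo spacing te (s) and gradient (T/m).

    components is an iterable of (amplitude, D in m^2/s, T2 in s) tuples,
    i.e. a discrete D-T2 distribution.
    """
    total = 0.0
    for amplitude, d, t2 in components:
        relaxation = math.exp(-t / t2)
        diffusion = math.exp(-((GAMMA * gradient * te) ** 2) * d * t / 12.0)
        total += amplitude * relaxation * diffusion
    return total


def echo_train(n_echoes, te, gradient, components):
    """Amplitudes of the first n_echoes echoes, sampled at t = te, 2*te, ..."""
    return [cpmg_echo((k + 1) * te, te, gradient, components)
            for k in range(n_echoes)]


# Hypothetical single-component fluid: D = 2.5e-9 m^2/s, T2 = 0.1 s.
fluid = [(1.0, 2.5e-9, 0.1)]
train = echo_train(5, te=0.001, gradient=0.2, components=fluid)
```

Inversion then amounts to recovering the component amplitudes from echo trains measured at several echo spacings, which is an ill-posed problem and is why regularized methods such as those compared in the abstract are needed.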
Funding: Supported by the National Natural Science Foundation of China (No. 60772032).
Abstract: Compared with the histogram of Discrete Cosine Transform (DCT) coefficients before Direct Sequence Spread Spectrum (DSSS) embedding, the histogram after embedding has a lower peak and spreads toward its borders. Based on this property, an audio steganalysis method for DSSS based on statistical moments of the histogram is proposed. The statistical moments of the histogram in the DCT domain and in its frequency domain, together with the statistical moments in the frequency domain of the histogram of the wavelet coefficients at every level, are calculated as the classification features. A Support Vector Machine (SVM) is used as the classifier. Experimental results show that the proposed technique is effective against DSSS embedding in the DCT domain with different embedding lengths, and the average detection rate is 91.75%.
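The histogram property described above can be illustrated with a small toy example. This is only a sketch under stated assumptions: the "coefficients" are synthetic integers peaked at zero, the embedding adds a hypothetical alternating +1/-1 chip sequence, and only plain central moments of the histogram are computed. The frequency-domain moments, the wavelet decomposition, and the SVM classifier mentioned in the abstract are omitted.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], n_bins: int, lo: float, hi: float) -> List[float]:
    """Normalized histogram over [lo, hi]; out-of-range values go to the edge bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def central_moment(hist: Sequence[float], centers: Sequence[float], order: int) -> float:
    """Central moment of the distribution described by a normalized histogram."""
    mean = sum(p * c for p, c in zip(hist, centers))
    return sum(p * (c - mean) ** order for p, c in zip(hist, centers))


# Synthetic cover coefficients: value k appears 2 ** (6 - |k|) times.
cover = [float(k) for k in range(-4, 5) for _ in range(2 ** (6 - abs(k)))]
# Toy spread-spectrum embedding: add an alternating +1/-1 chip sequence.
stego = [c + (1.0 if i % 2 == 0 else -1.0) for i, c in enumerate(cover)]

lo, hi, n_bins = -6.5, 6.5, 13
centers = [lo + (i + 0.5) * (hi - lo) / n_bins for i in range(n_bins)]
h_cover = histogram(cover, n_bins, lo, hi)
h_stego = histogram(stego, n_bins, lo, hi)

peak_drop = max(h_cover) - max(h_stego)
variance_gain = central_moment(h_stego, centers, 2) - central_moment(h_cover, centers, 2)
```

In this toy setting the peak of the histogram falls and its second central moment grows after embedding, which is the kind of change that moment-based features are meant to capture.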