For protecting the copyright of a text and recovering its original content harmlessly,this paper proposes a novel reversible natural language watermarking method that combines arithmetic coding and synonym substitutio...For protecting the copyright of a text and recovering its original content harmlessly,this paper proposes a novel reversible natural language watermarking method that combines arithmetic coding and synonym substitution operations.By analyzing relative frequencies of synonymous words,synonyms employed for carrying payload are quantized into an unbalanced and redundant binary sequence.The quantized binary sequence is compressed by adaptive binary arithmetic coding losslessly to provide a spare for accommodating additional data.Then,the compressed data appended with the watermark are embedded into the cover text via synonym substitutions in an invertible manner.On the receiver side,the watermark and compressed data can be extracted by decoding the values of synonyms in the watermarked text,as a result of which the original context can be perfectly recovered by decompressing the extracted compressed data and substituting the replaced synonyms with their original synonyms.Experimental results demonstrate that the proposed method can extract the watermark successfully and achieve a lossless recovery of the original text.Additionally,it achieves a high embedding capacity.展开更多
Methods for estimating synonymous and nonsynonymous substitution rates among protein-coding sequences adopt different mutation (substitution) models with subtle yet significant differences, which lead to different est...Methods for estimating synonymous and nonsynonymous substitution rates among protein-coding sequences adopt different mutation (substitution) models with subtle yet significant differences, which lead to different estimates of evolutionary information. Little attention has been devoted to the comparison of methods for obtaining reliable estimates since the amount of sequence variations within targeted datasets is always unpredictable. To our knowledge, there is little information available in literature about evaluation of these different methods. In this study, we compared six widely used methods and provided with evaluation results using simulated sequences. The results indicate that incorporating sequence features (such as transition/transversion bias and nucleotide/codon frequency bias) into methods could yield better performance. We recommend that conclusions related to or derived from Ka and Ks analyses should not be readily drawn only according to results from one method.展开更多
Cereal genes are classified into two distinct classes according to the guanine-cytosine (GC) content at the third codon sites (GC3). Natural selection and mutation bias have been proposed to affect the GC content....Cereal genes are classified into two distinct classes according to the guanine-cytosine (GC) content at the third codon sites (GC3). Natural selection and mutation bias have been proposed to affect the GC content. However, there has been controversy about the cause of GC variation. Here, we characterized the GC content of 1 092 paralogs and other single-copy genes in the duplicated chromosomal regions of the rice genome (ssp. indica) and classified the paralogs into GC3-rich and GC3-poor groups. By referring to out-group sequences from Arabidopsis and maize, we confirmed that the average synonymous substitution rate of the GC3-rich genes is significantly lower than that of the GC3-poor genes. Furthermore, we explored the other possible factors corresponding to the GC variation including the length of coding sequences, the number of exons in each gene, the number of genes in each family, the location of genes on chromosomes and the protein functions. Consequently, we propose that natural selection rather than mutation bias was the primary cause of the GC variation.展开更多
基金This project is supported by National Natural Science Foundation of China(No.61202439)partly supported by Scientific Research Foundation of Hunan Provincial Education Department of China(No.16A008)partly supported by Hunan Key Laboratory of Smart Roadway and Cooperative Vehicle-Infrastructure Systems(No.2017TP1016).
文摘For protecting the copyright of a text and recovering its original content harmlessly,this paper proposes a novel reversible natural language watermarking method that combines arithmetic coding and synonym substitution operations.By analyzing relative frequencies of synonymous words,synonyms employed for carrying payload are quantized into an unbalanced and redundant binary sequence.The quantized binary sequence is compressed by adaptive binary arithmetic coding losslessly to provide a spare for accommodating additional data.Then,the compressed data appended with the watermark are embedded into the cover text via synonym substitutions in an invertible manner.On the receiver side,the watermark and compressed data can be extracted by decoding the values of synonyms in the watermarked text,as a result of which the original context can be perfectly recovered by decompressing the extracted compressed data and substituting the replaced synonyms with their original synonyms.Experimental results demonstrate that the proposed method can extract the watermark successfully and achieve a lossless recovery of the original text.Additionally,it achieves a high embedding capacity.
基金This work was supported by the Ministry of Science and Technology of China(Grant No.2001AA231061)the National Natural Science Foundation of China(Grant No.30270748)awarded to JY.We thank Mr.Jun Li for valuable discussion.
文摘Methods for estimating synonymous and nonsynonymous substitution rates among protein-coding sequences adopt different mutation (substitution) models with subtle yet significant differences, which lead to different estimates of evolutionary information. Little attention has been devoted to the comparison of methods for obtaining reliable estimates since the amount of sequence variations within targeted datasets is always unpredictable. To our knowledge, there is little information available in literature about evaluation of these different methods. In this study, we compared six widely used methods and provided with evaluation results using simulated sequences. The results indicate that incorporating sequence features (such as transition/transversion bias and nucleotide/codon frequency bias) into methods could yield better performance. We recommend that conclusions related to or derived from Ka and Ks analyses should not be readily drawn only according to results from one method.
基金the State Key Basic Research and Development Plan of China(2003CB715900)the National Natural Science Foundation of China(90408015,30121003 and 30430030).
文摘Cereal genes are classified into two distinct classes according to the guanine-cytosine (GC) content at the third codon sites (GC3). Natural selection and mutation bias have been proposed to affect the GC content. However, there has been controversy about the cause of GC variation. Here, we characterized the GC content of 1 092 paralogs and other single-copy genes in the duplicated chromosomal regions of the rice genome (ssp. indica) and classified the paralogs into GC3-rich and GC3-poor groups. By referring to out-group sequences from Arabidopsis and maize, we confirmed that the average synonymous substitution rate of the GC3-rich genes is significantly lower than that of the GC3-poor genes. Furthermore, we explored the other possible factors corresponding to the GC variation including the length of coding sequences, the number of exons in each gene, the number of genes in each family, the location of genes on chromosomes and the protein functions. Consequently, we propose that natural selection rather than mutation bias was the primary cause of the GC variation.