Abstract
A genome sequence compression algorithm based on the Hilbert fractal is presented. To make full use of the correlations between bases, the algorithm first maps the genome sequence from one dimension to two along a Hilbert fractal curve, producing a mapped image. The mapped image is then compressed using context-weighted modeling with entropy coding. In the context weighting, the weight given to each context model depends on that model's description length. After receiving the compressed image, the receiver decodes it and applies the quasi-Hilbert inverse matrix to convert the mapped image back to one dimension, recovering the genome sequence. Experimental results show that, although two-dimensional genome context modeling based on Hilbert space filling introduces an invalid coding region, the final compression results are slightly better than those of other algorithms that perform context modeling directly.
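The two core steps of the abstract (laying the sequence along a Hilbert curve, then mixing context models by description length) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `hilbert_d2xy` is the well-known iterative Hilbert index-to-coordinate conversion, the grid is assumed to be the smallest 2^k x 2^k square that holds the sequence, unused cells are marked `None` to stand in for the invalid coding region, and the quasi-Hilbert inverse matrix is approximated by replaying the same curve. The abstract says only that the weights are related to each model's description length, so the rule w_i proportional to 2^(-L_i) used here is an assumed, standard Bayesian-mixture choice. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[Optional[str]]]


def _rot(s: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant of side s so that consecutive sub-curves join up."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map Hilbert index d to (x, y) on an n x n grid; n must be a power of 2."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_image(seq: str) -> Image:
    """Place bases along a Hilbert curve in the smallest 2^k x 2^k grid.

    Cells past len(seq) stay None (assumed stand-in for the invalid coding region).
    """
    n = 1
    while n * n < len(seq):
        n *= 2
    image: Image = [[None] * n for _ in range(n)]
    for d, base in enumerate(seq):
        x, y = hilbert_d2xy(n, d)
        image[y][x] = base
    return image


def image_to_sequence(image: Image, length: int) -> str:
    """Inverse mapping: read the first `length` cells back in Hilbert order."""
    n = len(image)
    coords = (hilbert_d2xy(n, d) for d in range(length))
    return "".join(image[y][x] for x, y in coords)


def description_length_weights(code_lengths: List[float]) -> List[float]:
    """Assumed weighting rule: w_i proportional to 2**(-L_i), L_i in bits."""
    best = min(code_lengths)  # shift exponents so the largest raw weight is 1
    raw = [2.0 ** (best - length) for length in code_lengths]
    total = sum(raw)
    return [r / total for r in raw]


def mixed_probability(probs: List[float], weights: List[float]) -> float:
    """Weighted mixture of each context model's probability for the next symbol."""
    return sum(w * p for w, p in zip(weights, probs))
```

For example, a 10-base sequence lands in a 4 x 4 grid, leaving 6 padded cells, and reading the grid back in curve order returns the original string. Two models that have spent 10 and 11 bits so far receive weights of about 2/3 and 1/3, so the shorter description dominates the mixture.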
Source
Journal of Kunming University (《昆明学院学报》)
2014, No. 6, pp. 42-46, 65 (6 pages)
Funding
Youth Fund Project of the Yunnan Provincial Natural Science Foundation (2013FD042)
Key Graduate Research Fund Project of Yunnan University (ynuy201383)