Multiplexed sequencing relies on specific sample labels, the barcodes, to tag DNA fragments belonging to different samples and to separate the output of the sequencers. However, the barcodes are often corrupted by insertion, deletion and substitution errors introduced during sequencing, which may lead to sample misassignment. In this paper, we propose a barcode construction method that combines a block error-correction code with a predetermined pseudorandom sequence to generate a base sequence for labeling different samples. Furthermore, to identify the corrupted barcodes and assign reads to their respective samples, we present a soft-decision identification method that consists of inner decoding and outer decoding. The inner decoder establishes a hidden Markov model (HMM) for base insertion/deletion estimation with the pseudorandom sequence, and adapts the forward-backward (FB) algorithm to output the soft information of each bit in the block code. The outer decoder performs soft-decision decoding using the soft information to effectively correct multiple errors in the barcodes. Simulation results show that the proposed methods are highly robust to high error rates of insertions, deletions and substitutions in the barcodes. In addition, compared with the inner decoding algorithm of the barcodes based on watermarks, the proposed inner decoding algorithm can greatly reduce the decoding complexity.
Funding: Supported in part by the National Natural Science Foundation of China (61671324) and the Seed Foundation of Tianjin University (2019XZY-0038, 2019XYF-0005).
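The barcode construction described in the abstract, a block error-correction code combined with a predetermined pseudorandom sequence, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' construction: the abstract does not name the block code, the way the two bit streams are combined, or the bit-to-base mapping. Here a Hamming(7,4) code stands in for the block code, each base is assumed to carry one code bit and one pseudorandom bit, and the names `hamming74_encode`, `pseudorandom_bits`, `build_barcode` and the seed value are all hypothetical.

```python
import random

# Assumed mapping from a 2-bit index to a base (not taken from the paper).
BASES = "ACGT"


def hamming74_encode(data_bits):
    """Encode 4 data bits as a Hamming(7,4) codeword [p1, p2, d1, p3, d2, d3, d4].

    Hamming(7,4) is a stand-in for the unspecified block code.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def pseudorandom_bits(n, seed=2019):
    """Return n bits of a predetermined sequence shared by encoder and decoder.

    The seed is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n)]


def build_barcode(sample_id, seed=2019):
    """Build a 7-base barcode for a sample index in the range 0..15.

    Assumption: each base carries one code bit (high bit of the index)
    and one pseudorandom bit (low bit of the index).
    """
    data = [(sample_id >> shift) & 1 for shift in (3, 2, 1, 0)]
    code = hamming74_encode(data)
    prs = pseudorandom_bits(len(code), seed)
    return "".join(BASES[2 * c + p] for c, p in zip(code, prs))


if __name__ == "__main__":
    for sample_id in range(4):
        print(sample_id, build_barcode(sample_id))
```

Under these assumptions, the low bit of every base reproduces the known pseudorandom sequence, which is the kind of reference an inner decoder could use to estimate insertions and deletions, while the high bits carry the codeword passed to the outer decoder. Any two barcodes differ in at least three bases, because Hamming(7,4) has minimum distance 3.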