Abstract
Next-generation high-throughput RNA-Seq has become the standard technique for transcriptome analysis. Identifying differentially expressed genes is one of the most basic tasks in RNA-Seq data analysis, and a large number of statistical methods have been proposed for it. However, the differentially expressed genes detected by these methods are often inconsistent, and systematic evaluations have shown that no single method maintains its advantage across all RNA-Seq datasets. We therefore propose RobustDEA, a fast and robust method for finding differentially expressed genes in RNA-Seq data. RobustDEA combines multiple differential expression methods through automatic weighting, and its weights can be learned quickly from the dataset. Because these weights reflect the characteristics of each dataset, RobustDEA obtains stable results across different RNA-Seq datasets. The method is evaluated on a human brain dataset with qRT-PCR validation and on several mouse and rat RNA-Seq datasets. Compared with each individual method and with other combination methods, RobustDEA obtains the most accurate predictions and shows good robustness. In addition, RobustDEA substantially improves computational efficiency compared with the PANDOR combination method.
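The abstract does not say how RobustDEA learns or applies its weights, so the following is only a minimal sketch of the general idea of combining several differential expression methods by weighting. It assumes each method reports one p-value per gene and that the per-method weights are already known; the function names, the method labels `method_a` and `method_b`, and the weight values are hypothetical. The weighted average it returns is a ranking score, not a calibrated p-value.

```python
from typing import Dict, List


def combine_pvalues(pvals_by_method: Dict[str, List[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Return one weighted-average score per gene across all methods.

    Assumption: weights are given. How RobustDEA learns them is not
    described in the abstract and is not reproduced here.
    """
    methods = list(pvals_by_method)
    if not methods:
        raise ValueError("at least one method is required")
    n_genes = len(pvals_by_method[methods[0]])
    if any(len(pvals_by_method[m]) != n_genes for m in methods):
        raise ValueError("all methods must report the same number of genes")
    total = sum(weights[m] for m in methods)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the weight total so the score stays within [0, 1].
    return [
        sum(weights[m] * pvals_by_method[m][g] for m in methods) / total
        for g in range(n_genes)
    ]


def rank_genes(scores: List[float]) -> List[int]:
    """Return gene indices ordered from strongest to weakest evidence."""
    return sorted(range(len(scores)), key=scores.__getitem__)


# Hypothetical p-values for three genes from two unnamed methods.
pvals = {
    "method_a": [0.01, 0.5, 0.2],
    "method_b": [0.03, 0.1, 0.8],
}
weights = {"method_a": 3.0, "method_b": 1.0}  # illustrative values only

scores = combine_pvalues(pvals, weights)
ranking = rank_genes(scores)
```

With these illustrative inputs the more heavily weighted method dominates the ranking, which is the behaviour a dataset-specific weighting scheme relies on: a method that suits the dataset better contributes more to the final gene list.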
Authors
张礼
王嘉瑞
吴东洋
ZHANG Li; WANG Jiarui; WU Dongyang (College of Computer Science and Technology, Nanjing Forestry University, Nanjing 210016, China)
Source
《江苏科技大学学报(自然科学版)》
CAS
Peking University Core Journal (北大核心)
2021, No. 6, pp. 51-58 (8 pages)
Journal of Jiangsu University of Science and Technology (Natural Science Edition)
Funding
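National Natural Science Foundation of China, Young Scientists Fund (61802193)
Natural Science Foundation of Jiangsu Province (BK20170934)
Nanjing Forestry University Youth Science and Technology Innovation Fund (CX2017031)
Nanjing Forestry University Undergraduate Innovation Training Program (2018NFUSPITP452)
Shanwei City Provincial Science and Technology Innovation Strategy Special Fund (2018D2002)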