Abstract
To address the difficulty of extracting useful information from the many different types of long non-coding RNA (lncRNA) data, this paper proposes a visualization system for multi-source lncRNA data based on the Generic Genome Browser (GBrowse). The system has two main parts: a web server and lncRNA data storage. The web server is built from an HTTP service and the GBrowse web components, and it supports several storage back ends, including flat files, MySQL, and SQLite. Building the system involves GBrowse installation and configuration, collection of multi-source lncRNA data, data preprocessing, data storage, and configuration of data access and visualization. The prototype system collects six types of human lncRNA data: human gene annotation, genome sequence, histone modification H3K4me3 signals and the predicted H3K4me3 loci, and transcription factor CTCF binding-site signals and the predicted CTCF binding loci. After preprocessing, the data are stored in MySQL, SQLite, and other databases, and their access methods and visualization parameters are configured. The experimental results show that multi-source lncRNA data can be integrated and visualized within the GBrowse framework and displayed simultaneously in genomic coordinate space. This lets researchers observe the data more intuitively and thereby helps them formulate new scientific hypotheses.
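The abstract names data preprocessing as a step but does not say what it involves. As a hedged illustration only, the sketch below assumes the predicted H3K4me3 or CTCF loci arrive as BED files and converts them to GFF3, a format that GBrowse's Bio::DB::SeqFeature::Store back end can load into MySQL or SQLite. The function name `bed_to_gff3`, the default feature type, and the ID scheme are hypothetical choices, not the authors' pipeline.

```python
from typing import Iterable, Iterator
from urllib.parse import quote


def bed_to_gff3(lines: Iterable[str], source: str = "peaks",
                feature_type: str = "region") -> Iterator[str]:
    """Convert BED records into GFF3 lines (hypothetical preprocessing step).

    BED uses 0-based, half-open coordinates; GFF3 uses 1-based, closed
    coordinates, so start is shifted by +1 and end is kept unchanged.
    """
    yield "##gff-version 3"
    count = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Empty or reversed intervals cannot be expressed as a GFF3 feature.
        if end <= start:
            raise ValueError(f"invalid interval: {line!r}")
        count += 1
        feature_id = f"{source}_{count}"  # hypothetical unique-ID scheme
        # Optional BED columns: name (4), score (5), strand (6).
        name = fields[3] if len(fields) > 3 and fields[3] else feature_id
        score = fields[4] if len(fields) > 4 and fields[4] else "."
        strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "."
        # Percent-encode the name so reserved GFF3 characters (; = & ,) are safe.
        attributes = f"ID={feature_id};Name={quote(name, safe=':|')}"
        yield "\t".join([chrom, source, feature_type, str(start + 1),
                         str(end), score, strand, ".", attributes])


if __name__ == "__main__":
    # Example: two CTCF peaks, typed with the Sequence Ontology term TF_binding_site.
    bed = ["track name=ctcf", "chr1\t100\t200\tpeak1\t850\t+", "chr2\t0\t50"]
    for row in bed_to_gff3(bed, source="CTCF", feature_type="TF_binding_site"):
        print(row)
```

Keeping the coordinate shift in one function matters because an off-by-one error would misplace every locus by one base relative to the gene annotation and sequence tracks shown alongside it.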
Authors
SUN Lei (孙磊)
CHEN Xuan (陈璇)
TANG Hong (唐红)
WEI Li-Ting (魏李婷)
JI Lan-Yang (姬岚洋)
SHI Sheng-Fei (施胜飞)
YANG Xiao-Hua (杨晓华)
School of Information Engineering, Yangzhou University, Yangzhou 225127, China
Source
Computer Systems & Applications (《计算机系统应用》)
2017, No. 3, pp. 81-85 (5 pages)
Funding
National Natural Science Foundation of China (61301220)
Yangzhou University Undergraduate Academic Science and Technology Innovation Fund (x2015423, x2015444)