Abstract
To enable effective storage and fast retrieval of short tandem repeat (STR) data, this paper presents a novel data storage structure and retrieval method. A separate field is defined for each STR locus, and an inverted index is built within each field to store the STR data for that locus. On this basis, a retrieval algorithm ranks candidates by the similarity of their STR data to the input STR data. The proposed method addresses several problems of traditional approaches that store STR data in relational databases, such as low retrieval efficiency, difficulty in handling genetic mutations, and poor scalability. Using the proposed storage structure and retrieval algorithm greatly improves the system's retrieval performance and scalability.
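The structure described in the abstract can be illustrated with a minimal Python sketch. It assumes one inverted index per locus (field) that maps each allele value to the set of profiles carrying it, and it ranks candidates by the number of loci at which they share at least one allele with the query. The class name StrIndex, the scoring rule, and the sample profiles are hypothetical illustrations; the abstract does not specify the similarity measure the paper actually uses.

```python
from collections import defaultdict


class StrIndex:
    """Hypothetical per-locus inverted index: locus -> allele -> profile ids."""

    def __init__(self):
        # One field per STR locus; each field holds its own posting lists.
        self.fields = defaultdict(lambda: defaultdict(set))

    def add(self, profile_id, profile):
        # profile maps a locus name to a tuple of allele repeat counts.
        for locus, alleles in profile.items():
            for allele in alleles:
                self.fields[locus][allele].add(profile_id)

    def search(self, query, top_k=10):
        # Assumed similarity: number of loci sharing at least one allele.
        scores = defaultdict(int)
        for locus, alleles in query.items():
            postings = self.fields.get(locus, {})
            matched = set()
            for allele in set(alleles):
                matched |= postings.get(allele, set())
            for profile_id in matched:
                scores[profile_id] += 1
        # Highest score first; ties are broken by profile id.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:top_k]


# Illustrative data only; locus names are common STR loci, values are made up.
index = StrIndex()
index.add("P1", {"D3S1358": (15, 16), "vWA": (17, 18), "FGA": (21, 24)})
index.add("P2", {"D3S1358": (14, 15), "vWA": (19, 20), "FGA": (22, 23)})
index.add("P3", {"D3S1358": (17, 18), "vWA": (14, 16), "FGA": (20, 25)})

query = {"D3S1358": (15, 17), "vWA": (17, 19), "FGA": (24, 26)}
results = index.search(query)
print(results)
```

Because candidates are ranked rather than filtered by exact match, a profile that differs from the query at a single locus (for example, because of a mutation) still appears in the results with a slightly lower score. Adding a new locus only requires adding a new field, which is one way to read the scalability claim.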
Authors
刘健
宁玉文
孙茂
许浩
李宝娟
LIU Jian; NING Yu-wen; SUN Mao; XU Hao; LI Bao-juan (Network Center, The Fourth Military Medical University, PLA, Xi'an 710032, China; School of Medicine, The Fourth Military Medical University, PLA, Xi'an 710032, China; Department of Military Biomedical Engineering, The Fourth Military Medical University, PLA, Xi'an 710032, China)
Source
Information Technology (《信息技术》)
2018, Issue 10, pp. 73-76 (4 pages)
Funding
Shaanxi Province Industrial Science and Technology Research Project (2016GY-094)
Keywords
STR
paternity testing
inverted index
retrieval