
Review on the Research Progress of Mining of OMIM Data

Cited by: 7
Abstract: Online Mendelian Inheritance in Man (OMIM) is a knowledge base describing human genetic diseases and their related genes. Each OMIM entry includes the clinical features of a disease, gene linkage analysis, chromosomal localization, and animal models, which makes OMIM an important basis for studying the relationship between diseases and genes. Because similarity between disease phenotypes may reflect interactions at the molecular level, comparing phenotypes can help predict disease candidate genes and analyze relationships between molecules. However, OMIM describes disease phenotypes in free text, which is not suited to computer analysis. Standardizing OMIM data is therefore important for large-scale comparison and analysis of disease phenotype data and for establishing correspondences between phenotypes and genes. Recently, researchers have introduced standard medical language systems, the term frequency-inverse document frequency (TF-IDF) technique from text mining, and the cosine similarity measure used for document classification. Combined with Gene Ontology and its comparison methods, these approaches have driven rapid progress in OMIM data mining. This article summarizes the main recent results in OMIM data standardization, phenotype similarity measurement, and data mining, and predicts trends for research in this direction.
Source: Journal of Biomedical Engineering (《生物医学工程学杂志》), 2014, No. 6, pp. 1400-1404 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (81072899, 61071213, 81473446)
Keywords: phenotype-genotype correlation; text mining; similarity comparison; candidate gene; molecular pathway
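
The abstract names TF-IDF weighting and cosine comparison as the tools for measuring phenotype similarity. The following is a minimal sketch of that general idea, not the method of any specific study covered by the review. The disease records and phenotype terms are invented for illustration, the function names `tfidf_vectors` and `cosine` are hypothetical, and the particular weighting (relative term frequency multiplied by log(N/df)) is one common TF-IDF variant chosen as an assumption.

```python
import math
from collections import Counter

# Hypothetical, already-standardized phenotype terms per disease record.
# Real OMIM entries are free text and would first need mapping to a
# controlled vocabulary; these toy records are assumptions.
records = {
    "disease_A": ["seizures", "ataxia", "hearing_loss"],
    "disease_B": ["seizures", "ataxia", "cataract"],
    "disease_C": ["short_stature", "cataract", "hearing_loss"],
}

def tfidf_vectors(docs):
    """Return {record name: {term: tf-idf weight}} for a dict of term lists."""
    n_docs = len(docs)
    # Document frequency: number of records in which each term occurs.
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    vectors = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        vectors[name] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

vectors = tfidf_vectors(records)
sim_ab = cosine(vectors["disease_A"], vectors["disease_B"])
sim_ac = cosine(vectors["disease_A"], vectors["disease_C"])
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

In this toy data, records A and B share two of three terms and score higher than A and C, which share one. A rare term such as `short_stature` receives a larger weight than terms that occur in several records, which is the effect TF-IDF is meant to provide.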