
Identifying Gene Names Using Reductive Features and New Features
Abstract: Based on the characteristics of gene names in biomedical text, a set of new features is proposed for gene name recognition. A reduced feature set is used as the baseline, and the proposed new features, expressed in the form of gene lexicons, are fused into it. The method was validated on the BioCreative II shared dataset using a global linear model and the perceptron learning algorithm. The results show that a small number of highly discriminative features is still enough for the system to achieve high performance, and that fusing in the new features further improves precision and recall to some extent.
Source: Journal of Qingdao University (Natural Science Edition), CAS, 2014, No. 2, pp. 61-64, 89 (5 pages in total)
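Funding: Supported by the Cultivation Project of the Major Research Plan of the National Natural Science Foundation of China (Grant No. 91130035), the Key Project of the Shandong Provincial Natural Science Foundation (Grant No. ZR2012FZ003), and the Youth Fund of the Shandong Provincial Natural Science Foundation (Grant No. ZR2012FQ017).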
Keywords: gene name recognition; reductive feature set; weight vector; learning algorithm
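
The abstract names a global linear model trained with the perceptron learning algorithm. The sketch below illustrates that general technique for B/I/O tagging of gene mentions; it is not the authors' implementation. The feature templates in `local_features`, the tag set, the toy sentences, and all function names are assumptions made for illustration, and the paper's actual reduced feature set and lexicon features are not reproduced here.

```python
from collections import defaultdict

TAGS = ["B", "I", "O"]  # B = starts a gene name, I = inside one, O = outside


def local_features(words, i, prev_tag, tag):
    """Hypothetical small feature set for position i (not the paper's features)."""
    w = words[i]
    return [
        f"word={w.lower()}|{tag}",
        f"suffix3={w[-3:].lower()}|{tag}",
        f"has_digit={any(c.isdigit() for c in w)}|{tag}",
        f"has_upper={any(c.isupper() for c in w)}|{tag}",
        f"prev_tag={prev_tag}|{tag}",
    ]


def global_features(words, tags):
    """Global feature vector: counts of local features summed over the sentence."""
    phi = defaultdict(int)
    prev = "<s>"
    for i, tag in enumerate(tags):
        for feat in local_features(words, i, prev, tag):
            phi[feat] += 1
        prev = tag
    return phi


def viterbi(words, weights):
    """Return the highest-scoring tag sequence under the given weight vector."""
    n = len(words)
    if n == 0:
        return []
    score = [{} for _ in range(n)]  # score[i][t]: best score ending in tag t at i
    back = [{} for _ in range(n)]   # back[i][t]: previous tag on that best path
    for t in TAGS:
        feats = local_features(words, 0, "<s>", t)
        score[0][t] = sum(weights.get(f, 0.0) for f in feats)
        back[0][t] = None
    for i in range(1, n):
        for t in TAGS:
            best_prev, best = None, float("-inf")
            for p in TAGS:
                feats = local_features(words, i, p, t)
                s = score[i - 1][p] + sum(weights.get(f, 0.0) for f in feats)
                if s > best:
                    best_prev, best = p, s
            score[i][t] = best
            back[i][t] = best_prev
    tag = max(TAGS, key=lambda t: score[n - 1][t])
    path = [tag]
    for i in range(n - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    path.reverse()
    return path


def train(data, epochs=10):
    """Perceptron training: on a mistake, add gold features, subtract predicted."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, weights)
            if pred != gold:
                for feat, count in global_features(words, gold).items():
                    weights[feat] += count
                for feat, count in global_features(words, pred).items():
                    weights[feat] -= count
    return weights


# Toy training sentences, invented for illustration only.
toy_data = [
    ("The BRCA1 gene is mutated".split(), ["O", "B", "O", "O", "O"]),
    ("Expression of p53 was reduced".split(), ["O", "O", "B", "O", "O"]),
    ("IL-2 receptor alpha binds ligand".split(), ["B", "I", "I", "O", "O"]),
]
weights = train(toy_data)
print(viterbi("The p53 gene binds ligand".split(), weights))
```

The weight vector is a sparse dictionary, so features that never fire cost nothing. An averaged or voted variant of the perceptron is a common refinement, but whether the paper uses one is not stated in the abstract.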