Triage and Annotation of Biological Gene's Ontology Combined Controlled Glossary

Cited by: 3
Abstract This paper studies the characteristics of gene-related biology literature and proposes a method for automatically annotating and classifying such documents. The method builds on the K-nearest-neighbor algorithm, uses Chi-Square feature selection, and emphasizes the Chi-Square-selected features in the term-weighting step. In addition, documents are divided into logical blocks, and vectors built from an additional biological controlled vocabulary (MeSH) are introduced directly into the classification algorithm to improve classification and annotation. Experiments show that the proposed method outperforms the commonly used TF-IDF (term frequency-inverse document frequency) weighting method. On the TREC (Text REtrieval Conference) data sets, its classification result is 3.14% higher than the best result published by TREC, and its annotation result is 4.12% higher (the English version of the abstract gives 4.13%).
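Source Journal of Xi'an Jiaotong University (《西安交通大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2008, No. 2, pp. 171-174 (4 pages)
Funding Natural Science Foundation of Shaanxi Province (2004F06); "985 Project" Phase II platform construction program
Keywords gene ontology; classification annotation; nearest neighbor algorithm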
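The abstract's combination of Chi-Square feature selection with K-nearest-neighbor classification can be illustrated with a minimal sketch. The abstract does not give the paper's weighting formula, so the version below assumes a simple weight of term frequency multiplied by the term's Chi-Square score. It also leaves out the logical-block splitting and the MeSH vectors. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def chi_square(docs, labels, term, cls):
    """Chi-Square statistic of `term` vs. class `cls` from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cls = term in tokens, label == cls
        if has_term and in_cls:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document not in class
        elif in_cls:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_features(docs, labels, top_k):
    """Keep the top_k terms ranked by their maximum Chi-Square score over classes."""
    classes = sorted(set(labels))
    vocab = {t for tokens in docs for t in tokens}
    scores = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return dict(ranked)

def vectorize(tokens, scores):
    """Assumed weighting: term frequency times Chi-Square score (not from the paper)."""
    tf = Counter(tokens)
    return {t: tf[t] * scores[t] for t in tf if t in scores}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(query, docs, labels, scores, k=3):
    """Similarity-weighted vote among the k most similar training documents."""
    q = vectorize(query, scores)
    sims = sorted(
        ((cosine(q, vectorize(tokens, scores)), label)
         for tokens, label in zip(docs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in sims[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Toy corpus (invented for illustration only).
docs = [
    ["gene", "expression", "mouse"],
    ["gene", "mutation", "protein"],
    ["protein", "gene", "binding"],
    ["market", "stock", "price"],
    ["stock", "trade", "price"],
    ["price", "market", "trade"],
]
labels = ["bio", "bio", "bio", "other", "other", "other"]
scores = select_features(docs, labels, top_k=4)
print(knn_classify(["gene", "protein", "novel"], docs, labels, scores))
```

Multiplying by the Chi-Square score is only one way to make the selected features stand out in the weights; the paper may use a different scheme.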
