Biological Gene Ontology Annotation and Classification Combined with a Controlled Vocabulary
Times cited: 3

Triage and Annotation of Biological Gene's Ontology Combined Controlled Glossary

Abstract: By studying the characteristics of the biological literature on genes, the authors propose a method for selecting (triaging) gene-related documents from the biological literature and then automatically annotating and classifying them. Building on the K-nearest neighbor (KNN) algorithm, the method adopts a Chi-square feature selection scheme and emphasizes the Chi-square selections in the term-weighting calculation. In addition, documents are divided into logical blocks, and vectors formed from an additional biological controlled vocabulary (MeSH) are introduced directly into the classification algorithm to improve classification and annotation. Experiments show that the proposed method outperforms the commonly used TFIDF (term frequency-inverse document frequency) weighting method: on the TREC (Text REtrieval Conference) data sets, its classification and annotation results are 3.14% and 4.12% higher, respectively, than the best results published by TREC.
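The abstract combines three ingredients: Chi-square feature selection, term weights that emphasize the Chi-square scores, and k-nearest-neighbor voting. The sketch below illustrates them on toy data; it is not the authors' implementation. The Chi-square statistic is the standard 2x2 contingency-table form, but the weighting `tf × χ²`, the hypothetical `mesh:` prefix used to append controlled-vocabulary terms as extra dimensions, the similarity-weighted voting, and all function names are assumptions for illustration. The logical-block segmentation of documents is not modeled.

```python
import math
from collections import Counter, defaultdict


def chi_square(docs, labels, term, category):
    """Chi-square of `term` vs `category` from a 2x2 contingency table."""
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        has_term, in_cat = term in words, label == category
        if has_term and in_cat:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document outside category
        elif in_cat:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document outside category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, top_k):
    """Keep the top_k terms ranked by their maximum Chi-square over all categories."""
    vocab = {w for words in docs for w in words}
    cats = set(labels)
    scores = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return {t: scores[t] for t in ranked[:top_k]}


def vectorize(words, chi, mesh_terms=(), mesh_weight=1.0):
    """Assumed weighting: term frequency times the term's Chi-square score."""
    tf = Counter(w for w in words if w in chi)
    vec = {t: n * chi[t] for t, n in tf.items()}
    for m in mesh_terms:  # hypothetical: controlled-vocabulary terms as extra features
        vec["mesh:" + m] = mesh_weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, train_vecs, labels, k):
    """Score each category by the summed similarity of the k nearest neighbors."""
    sims = sorted(((cosine(query, v), y) for v, y in zip(train_vecs, labels)),
                  reverse=True)[:k]
    votes = defaultdict(float)
    for s, y in sims:
        votes[y] += s
    return max(votes, key=votes.get)


# Toy usage: two gene-related documents and two unrelated ones.
train = [
    ["gene", "expression", "protein", "mouse"],
    ["gene", "allele", "mouse", "phenotype"],
    ["stock", "market", "price"],
    ["market", "trade", "price", "growth"],
]
train_labels = ["GO", "GO", "other", "other"]
chi = select_features(train, train_labels, top_k=4)
train_vecs = [vectorize(d, chi) for d in train]
query = vectorize(["gene", "mouse", "protein"], chi)
print(knn_classify(query, train_vecs, train_labels, k=3))  # prints GO
```

Summing neighbor similarities rather than counting raw votes lets closer neighbors dominate when k is larger than the number of truly similar documents; whether the paper votes this way is not stated in the abstract.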

Authors: Cui Shuning (崔舒宁), Zhu Danjun (朱丹军), Feng Boqin (冯博琴), Ang Zhengquan (昂正全)

Affiliation: School of Electronic and Information Engineering, Xi'an Jiaotong University

Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》; indexed in EI, CAS, CSCD, and the Peking University core journal list), 2008, no. 2, pp. 171-174 (4 pages)

Funding: Natural Science Foundation of Shaanxi Province (2004F06); Phase II platform construction project of the "985 Project"

Keywords: gene ontology; classification; annotation; nearest neighbor algorithm

CLC number: TP319 [Automation and Computer Technology: Computer Software and Theory]
