Abstract
Based on the characteristics of gene names in biomedical text, a new set of features is proposed for gene name recognition. Starting from a reduced feature set, the new features, which take the form of gene lexicons, are incorporated into it. The method is evaluated on the BioCreative II shared dataset using a global linear model trained with the perceptron learning algorithm. The results show that a small number of highly discriminative features is still enough for the system to achieve high performance, and that incorporating the new features further improves both precision and recall to some extent.
Source
Journal of Qingdao University (Natural Science Edition) (《青岛大学学报(自然科学版)》)
CAS
2014, Issue 2, pp. 61-64, 89 (5 pages in total)
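Funding
Supported by a cultivation project of the Major Research Plan of the National Natural Science Foundation of China (Grant No. 91130035)
Supported by the Key Program of the Shandong Provincial Natural Science Foundation (Grant No. ZR2012FZ003)
Supported by the Youth Fund of the Shandong Provincial Natural Science Foundation (Grant No. ZR2012FQ017)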
Keywords
gene name recognition
reduced feature set
weight vector
learning algorithm