Abstract
Named entity recognition is a fundamental task in biomedical data mining. This paper presents a support vector machine (SVM)-based method for recognizing named entities in biomedical text. The system combines a rich feature set comprising local features, whole-text features, and external-resource features, and the contributions of individual features and of different feature combinations are evaluated experimentally. To further improve performance, an abbreviation recognition module and a filter module are introduced. The experimental results show that the method achieves good results in recognizing named entities in biomedical text.
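The three feature families named in the abstract can be illustrated with a minimal sketch. The abstract does not list the concrete features, so everything here is an assumption made for illustration: the function names `word_shape` and `token_features`, the specific features (word shape, affixes, neighbouring words, in-document repetition, lexicon membership), and the toy lexicon are hypothetical and are not taken from the paper.

```python
import re


def word_shape(token):
    """Collapse characters into a coarse shape, e.g. 'IL-2' becomes 'AA-0'."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def token_features(tokens, i, lexicon):
    """Build a feature dict for tokens[i] (all feature choices are illustrative).

    tokens  -- the tokens of one document
    lexicon -- a set of lower-cased entity names standing in for an
               external resource (hypothetical)
    """
    token = tokens[i]
    lowered = token.lower()
    return {
        # Local features: the token itself and its immediate context.
        "word": lowered,
        "shape": word_shape(token),
        "prefix3": lowered[:3],
        "suffix3": lowered[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
        # Whole-text feature: does the token occur more than once in the text?
        "repeated_in_text": sum(t.lower() == lowered for t in tokens) > 1,
        # External-resource feature: is the token listed in the lexicon?
        "in_lexicon": lowered in lexicon,
    }


tokens = ["IL-2", "gene", "expression", "requires", "IL-2", "receptor"]
features = token_features(tokens, 0, lexicon={"il-2"})
```

Feature dictionaries of this kind could then be vectorized and passed to any SVM implementation; which features and which classifier settings the authors actually used is not stated in the abstract.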
Source
《哈尔滨工程大学学报》
EI
CAS
CSCD
PKU Core
2006, Issue B07, pp. 570-574 (5 pages)
Journal of Harbin Engineering University
Funding
Supported by the National 863 Program of China (2004AA117010-08).
Keywords
named entity recognition
SVM
feature selection
abbreviation