Abstract
In recent years, term recognition based on machine learning has achieved good results. However, different systems usually adopt different learning methods or feature sets and therefore have distinct strengths: although their performance is close in a statistical sense, their concrete outputs differ. Combining the strengths of these systems to further improve term recognition is therefore worthwhile. To address this problem, and taking the characteristics of these systems into account, this paper proposes a classification-based method for combining term recognition systems. The method treats combination as a binary classification problem, and the design of the combination classifier integrates context information and dependency syntax information more fully and more flexibly. Experiments on Chinese term recognition demonstrate the effectiveness of the method: the combined result is better than that of every single system.
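As a rough illustration of treating combination as binary classification, the sketch below turns each candidate term proposed by any base system into one instance and lets a classifier decide whether to keep it. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not name the classifier or the exact features, so the perceptron, the keys `prev_pos` and `dep_rel`, and all toy data here are hypothetical.

```python
# Hypothetical sketch: system combination as binary classification.
# Each candidate proposed by any base system is one instance; label 1 = true term.

def featurize(candidate, system_outputs):
    """Build a feature vector for one candidate term.

    candidate: dict with assumed keys 'text', 'prev_pos', 'dep_rel'.
    system_outputs: list of sets (one per base system) of proposed term strings.
    """
    votes = [1.0 if candidate["text"] in out else 0.0 for out in system_outputs]
    context = [1.0 if candidate["prev_pos"] == "v" else 0.0]  # illustrative context feature
    syntax = [1.0 if candidate["dep_rel"] == "obj" else 0.0]  # illustrative dependency feature
    return votes + context + syntax + [1.0]  # trailing 1.0 acts as the bias input


def score(weights, x):
    return sum(w * v for w, v in zip(weights, x))


def predict(weights, x):
    return 1 if score(weights, x) > 0 else 0


def train_perceptron(instances, labels, epochs=20):
    """Plain perceptron; a stand-in for whatever classifier the paper uses."""
    weights = [0.0] * len(instances[0])
    for _ in range(epochs):
        for x, y in zip(instances, labels):
            delta = y - predict(weights, x)
            if delta != 0:
                weights = [w + delta * v for w, v in zip(weights, x)]
    return weights


def combine(candidates, system_outputs, weights):
    """Keep only the candidates the combination classifier accepts."""
    return [c["text"] for c in candidates
            if predict(weights, featurize(c, system_outputs)) == 1]


# Toy training data (invented for illustration only).
train_outputs = [
    {"machine learning", "system combination", "method"},      # base system 1
    {"machine learning", "classification strategy", "result"}, # base system 2
]
train_candidates = [
    {"text": "machine learning", "prev_pos": "v", "dep_rel": "obj"},
    {"text": "system combination", "prev_pos": "v", "dep_rel": "obj"},
    {"text": "method", "prev_pos": "n", "dep_rel": "nsubj"},
    {"text": "classification strategy", "prev_pos": "n", "dep_rel": "obj"},
    {"text": "result", "prev_pos": "v", "dep_rel": "nsubj"},
]
y = [1, 1, 0, 1, 0]
X = [featurize(c, train_outputs) for c in train_candidates]
weights = train_perceptron(X, y)

# Unseen candidates from a new sentence (also invented).
new_outputs = [{"term recognition", "paper"}, {"paper", "feature set"}]
new_candidates = [
    {"text": "term recognition", "prev_pos": "v", "dep_rel": "obj"},
    {"text": "paper", "prev_pos": "n", "dep_rel": "nsubj"},
    {"text": "feature set", "prev_pos": "n", "dep_rel": "obj"},
]
accepted = combine(new_candidates, new_outputs, weights)
print(accepted)
```

In this toy setup the classifier can reject a candidate that both base systems proposed and accept one that only a single system proposed. That is the kind of behavior simple voting cannot give, and it is where context and dependency features would matter.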
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2015, Issue 2, pp. 385-390 (6 pages)
Journal of Chinese Computer Systems
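Funding
Supported by the National Key Technology R&D Program of China (2012BAH14F00)
Supported by the General Scientific Research Project of the Education Department of Liaoning Province (L2012056)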
Keywords
term recognition
system combination
classification strategy
machine learning