Abstract
The informatization of markets has made business information extraction and market content management an increasingly active research topic in information science. As one of the key techniques in this area, product named entity recognition (product NER) has begun to draw growing attention in natural language processing. In this paper, product named entities are defined for the purpose of business information extraction, and the characteristics and difficulties of the recognition task are analyzed systematically. A hierarchical hidden Markov model (HHMM) based approach to product NER is proposed, and a prototype system that recognizes and labels product named entities in Chinese free text is implemented. Experiments show that the system performs well in both the digital electronics and mobile phone domains, achieving overall F-measures of 79.7%, 86.9%, and 75.8% for product name entities, product model entities, and product brand entities, respectively. A comparison with a maximum entropy model shows experimentally that the HHMM has stronger representational power for handling multi-scale nested sequences.
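The F-measures above combine precision and recall into a single score. The record does not state the exact definition used, so the sketch below assumes the usual balanced F-measure (F1, beta = 1) computed at the entity level; the helper names `f_measure` and `prf_from_counts` and the counts in the usage line are hypothetical and are not data from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F_beta).

    Assumption: beta = 1 (balanced F1) is taken to be the F-value reported
    in the abstract; the paper itself may define it differently.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision == 0 and recall == 0:
        return 0.0  # define F as 0 when both are 0 to avoid division by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def prf_from_counts(correct: int, predicted: int, gold: int) -> tuple[float, float, float]:
    """Entity-level precision, recall, and F1.

    correct:   entities the system recognized with the right span and type
    predicted: all entities the system output
    gold:      all entities in the annotated reference data
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical counts for illustration only (not results from the paper).
p, r, f = prf_from_counts(correct=60, predicted=75, gold=100)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Because F1 is a harmonic mean, it stays close to the lower of precision and recall, so a high score for one entity type cannot hide poor recall or poor precision on its own.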
Source
Journal of Chinese Information Processing (中文信息学报)
CSCD
Peking University Core Journal (北大核心)
2006, No. 1, pp. 7-13 (7 pages)
Funding
National Natural Science Foundation of China (60372016)
Beijing Natural Science Foundation (4052027)
Keywords
computer application
Chinese information processing
product named entity recognition
business information extraction
hierarchical hidden Markov model (HHMM)