The Study on Consumer Health New Term Identification Based on Hybrid Method (基于混合策略的公众健康领域新词识别方法研究)
Abstract [Purpose/significance] To discover the health terms that the public actually uses from Internet query data, providing a basis for mapping consumer health terms to professional medical terms and thereby improving knowledge organization and management in health knowledge service platforms. [Method/process] A model for identifying new health terms is designed that combines rules with N-Gram. Public query data are collected and experiments are run; over several rounds the filtering corpus is progressively refined and, with manual review, the scheme is optimized and its effectiveness verified. [Result/conclusion] Rules are extracted from questions asked by the public on the Internet and combined with a statistical algorithm to extract the new health terms the public uses. The approach is reasonably general for identifying consumer health terms and can supply the data needed to map consumer terms to medical terms. The experimental results show that rule-based preprocessing of public query logs yields good input text for the later stages of the experiment, and that a term identification method combining N-Gram with several filtering rules identifies new terms in short texts well.
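Source: Library and Information Service (《图书情报工作》), CSSCI, Peking University Core Journal, 2015, No. 23, pp. 115-123 (9 pages)
Funding: One of the research outputs of the National Social Science Fund of China project "Research on Constructing a Knowledge-Service-Oriented Consumer Health Knowledge Organization System" (grant no. 14BTQ032) and the National Key Technology R&D Program of the 12th Five-Year Plan project "Research and Application of Consumer Health Knowledge Integration and Service Technology" (grant no. 2013BAI06B01)
Keywords: Web query data; consumer health term; N-Gram; entity identification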
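
The abstract describes a two-stage approach: rules clean up the raw queries, then N-Gram statistics plus filtering rules propose candidate new terms. A minimal Python sketch of that general idea follows. The stop phrases, the known-term dictionary, the character-level N-grams, the frequency threshold, and the nested-substring filter are all illustrative assumptions; the record does not state the paper's actual rules or parameters.

```python
from collections import Counter

# Hypothetical rule set: question words and filler phrases to cut out of queries.
# Illustrative assumption only, not the rules used in the paper.
STOP_PHRASES = ["怎么办", "怎么治疗", "是什么", "吃什么药", "请问", "的"]

# Hypothetical dictionary of already-known terms; these are not "new" terms.
KNOWN_TERMS = {"感冒", "发烧"}


def preprocess(query):
    """Rule step: split the query at every stop phrase, keep non-empty fragments."""
    fragments = [query]
    for phrase in STOP_PHRASES:
        fragments = [piece for frag in fragments for piece in frag.split(phrase)]
    return [frag for frag in fragments if frag]


def char_ngrams(text, min_n=2, max_n=4):
    """Yield every character N-gram of length min_n..max_n in text."""
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def candidate_new_terms(queries, min_freq=2):
    """Statistical step: count N-grams, then apply simple filtering rules."""
    counts = Counter()
    for query in queries:
        for fragment in preprocess(query):
            counts.update(char_ngrams(fragment))

    # Filter 1: drop N-grams seen fewer than min_freq times.
    frequent = {gram: freq for gram, freq in counts.items() if freq >= min_freq}

    result = {}
    for gram, freq in frequent.items():
        # Filter 2: drop terms already in the known-term dictionary.
        if gram in KNOWN_TERMS:
            continue
        # Filter 3: drop N-grams nested in a longer candidate of equal frequency.
        nested = any(
            gram != other and gram in other and freq == other_freq
            for other, other_freq in frequent.items()
        )
        if not nested:
            result[gram] = freq
    return result
```

With the sample queries `["请问拉肚子怎么办", "拉肚子吃什么药", "感冒怎么办"]`, `candidate_new_terms` returns `{"拉肚子": 2}`: the rule step reduces the queries to "拉肚子", "拉肚子" and "感冒", and the filters remove the rare known term and the nested fragments "拉肚" and "肚子". In practice the surviving candidates would still go to manual review, as the abstract notes.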