Recognizing Intensity of Medical Query Intentions Based on Task Knowledge Fusion and Text Data Enhancement
Cited by: 8
Abstract
[Objective] This paper designs a method for recognizing the intensity of medical information query intentions based on task knowledge fusion and text data enhancement. The goals are to improve recognition accuracy, to address the difficulty of representing query word vectors, and to compensate for the scarcity of labeled data sets.
[Methods] For text data enhancement, we used the SimBERT model to augment a small-sample task data set. For text representation, we incrementally pre-trained the BERT model on a corpus of medical information query texts to obtain MQ-BERT (Medical-Query BERT), a model that incorporates task knowledge. For text classification, we added Bi-LSTM and other models on top of MQ-BERT and compared classification performance before and after text data enhancement.
[Results] With task knowledge fused in, MQ-BERT reached an F-Score of 92.22%, surpassing the best result of MC-BERT, proposed by the Alibaba team, on the same task data set (F-Score = 87.5%). Text data enhancement improved classification performance further. The MQ-BERT + Bi-LSTM model performed best, with an F-Score of 95.34%, which is 7.84 percentage points higher than MC-BERT.
[Limitations] The method for selecting data for the incremental pre-training process could be further optimized in future work.
[Conclusions] Task knowledge fusion and text data enhancement can effectively improve the accuracy of recognizing the intensity of medical query intentions. Query results for intentions of different intensities should be presented in different ways. Doing so would improve the query accuracy of medical information retrieval systems and better meet users' medical information needs.
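The Results compare F-Scores both between models and across the data enhancement step. The small Python sketch below separates the absolute gap in percentage points from the relative improvement, using only the F-Score values reported in the abstract. It assumes that the paper's F-Score is the usual F1 (the harmonic mean of precision and recall), which the abstract does not state. The constant and function names are illustrative and are not taken from the paper.

```python
# F-Scores (in percent) as reported in the abstract.
MC_BERT_F = 87.5          # MC-BERT, best previously reported result on the task data set
MQ_BERT_F = 92.22         # MQ-BERT with task knowledge fusion
MQ_BERT_BILSTM_F = 95.34  # MQ-BERT + Bi-LSTM after text data enhancement


def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (assumed definition of F-Score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def absolute_gain(new: float, old: float) -> float:
    """Difference in percentage points between two scores given in percent."""
    return round(new - old, 2)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement in percent: (new - old) / old * 100."""
    return round((new - old) / old * 100, 2)


if __name__ == "__main__":
    print(absolute_gain(MQ_BERT_BILSTM_F, MC_BERT_F))  # 7.84 (percentage points)
    print(relative_gain(MQ_BERT_BILSTM_F, MC_BERT_F))  # 8.96 (percent, relative)
    print(absolute_gain(MQ_BERT_F, MC_BERT_F))         # 4.72 (percentage points)
```

The gap between 95.34% and 87.5% is 7.84 percentage points in absolute terms. That corresponds to a relative improvement of about 8.96% over MC-BERT, so the two figures should not be used interchangeably.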
Authors: Zhao Yiming, Pan Pei, Mao Jin (Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China; School of Information Management, Wuhan University, Wuhan 430072, China; Big Data Institute, Wuhan University, Wuhan 430072, China; National Demonstration Center for Experimental Library and Information Science Education, Wuhan University, Wuhan 430072, China)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2023, No. 2, pp. 38-47 (10 pages). Indexed in CSSCI, CSCD, and the Peking University core journals list.
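Funding: This work is one of the research outcomes of projects funded by the National Natural Science Foundation of China (Grant Nos. 71874130 and 72274146) and the Humanities and Social Sciences Research Program of the Ministry of Education of China (Grant No. 18YJC870026).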
Keywords: Medical Information Query; Intention Intensity Recognition; Text Data Enhancement; Task Knowledge Fusion; BERT Model