Abstract
[Purpose/Significance] To extract the medical knowledge contained in online medical community Q&A texts effectively, this paper combines several deep learning methods into a knowledge graph construction method tailored to such texts, whose colloquial, noisy, and poorly standardized nature makes knowledge extraction very difficult. [Method/Process] Diabetes-related Q&A texts from xywy.com (寻医问药网) serve as the data source, and entity and relation types suited to community texts are defined based on an analysis of community users' health needs. BERT-wwm is used for word embedding to address polysemy, and a BiLSTM-CRF model performs entity recognition. During relation annotation, an entity mask scheme is designed to handle overlapping relations, after which a CNN-Attention model performs relation extraction. Finally, entities are aligned by combining dictionary matching with entity name similarity, and the resulting diabetes knowledge graph is stored and visualized in the Neo4j graph database. [Result/Conclusion] Experimental results show that these methods substantially improve knowledge extraction from online medical community Q&A texts and effectively convert unstructured community medical Q&A texts into structured data, which can advance community knowledge discovery and online intelligent health services.
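As a rough illustration of two steps named in the abstract, the Python sketch below shows one plausible form of entity masking and of entity alignment by dictionary matching plus name similarity. The abstract does not give the paper's actual masking scheme, similarity measure, or threshold, so everything here is an assumption: the placeholder tokens `[HEAD]` and `[TAIL]`, the helper names `mask_pair` and `align`, the sample dictionary, the use of `difflib.SequenceMatcher`, and the 0.8 cutoff are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical synonym dictionary: surface form -> canonical entity name.
SYNONYMS = {"high blood sugar": "hyperglycemia"}
# Hypothetical list of canonical entity names already in the graph.
CANONICAL = ["metformin", "hyperglycemia", "diabetes", "insulin"]


def mask_pair(text, head, tail):
    """Replace one target entity pair with placeholder tokens.

    head and tail are (start, end) character offsets, end exclusive,
    and must not overlap. Masking one pair per training instance lets a
    sentence with several relations yield several distinct instances
    (an assumed reading of "entity mask", not the paper's definition).
    """
    spans = [(head, "[HEAD]"), (tail, "[TAIL]")]
    # Substitute from right to left so earlier offsets stay valid.
    for (start, end), token in sorted(spans, key=lambda s: s[0][0], reverse=True):
        text = text[:start] + token + text[end:]
    return text


def align(name, threshold=0.8):
    """Map an extracted entity name to a canonical name.

    Step 1: exact dictionary lookup. Step 2: fall back to the most
    similar canonical name if its similarity reaches the threshold.
    Otherwise the name is kept as a new entity.
    """
    if name in SYNONYMS:
        return SYNONYMS[name]
    if name in CANONICAL:
        return name
    best = max(CANONICAL, key=lambda c: SequenceMatcher(None, name, c).ratio())
    score = SequenceMatcher(None, name, best).ratio()
    return best if score >= threshold else name


if __name__ == "__main__":
    print(mask_pair("metformin treats diabetes", (0, 9), (17, 25)))
    print(align("high blood sugar"))
    print(align("metformine"))
```

In this sketch a name that matches neither the dictionary nor any canonical name closely enough is returned unchanged, which corresponds to treating it as a new entity rather than forcing a merge.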
Authors
席运江
李曼
邓雨珊
廖晓
邝云英
Xi Yunjiang; Li Man; Deng Yushan; Liao Xiao; Kuang Yunying (School of Business Administration, South China University of Technology, Guangzhou 510641; School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510521; School of Management, Guangzhou City University of Technology, Guangzhou 510800; School of Information Engineering, Guangzhou Vocational University of Science and Technology, Guangzhou 510550)
Source
《图书情报工作》
CSSCI
PKU Core (Peking University Core Journals)
2024, No. 4, pp. 124-136 (13 pages)
Library and Information Service
Funding
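This work is one of the research outputs of the following projects:
National Natural Science Foundation of China project "Research on Information Credibility Evaluation Models and Intelligent Recommendation for Virtual Health Communities" (Grant No. 72171090)
Guangdong Basic and Applied Basic Research Foundation, Natural Science Foundation project "Research on a Knowledge Value Evaluation Model and Methods for User Innovation Communities Based on Supernetwork Modeling" (Grant No. 2023A1515011551)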