Abstract
Chinese medical Q&A is easily affected by noise from medical-specific terminology, which makes it more challenging than open-domain Q&A. Previous studies on Chinese medical Q&A relied mainly on character-level, fine-grained information and neglected word-level, coarse-grained information, which carries more semantic content. In addition, introducing an external medical knowledge graph can further enrich the fine-grained information in Q&A sentences, yet most existing studies simply form a joint representation of the sentences and the external knowledge. This paper therefore proposes a Chinese medical Q&A matching model that fuses multi-granularity semantic information with a knowledge graph (CMQA-MGSI). The model introduces a Lattice network and, combining Word2Vec and BERT, designs two feature-vector extraction models that select the most relevant character sequences and word sequences in the Q&A sentences to obtain richer multi-granularity semantic information. To better fuse external domain knowledge, a dual-channel attention module is designed to extract multi-angle knowledge representations between the Q&A sentences and the entity embeddings and relation embeddings in the knowledge graph. Experiments on the cMedQA1.0 and cMedQA2.0 datasets show that the proposed model outperforms existing Q&A matching models.
Authors
GUAN Liben (管立本)
LI Shi (李实)
College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journals (北大核心)
2024, Issue 14, pp. 152-161 (10 pages)