Abstract
【Purpose/significance】This paper studies how to construct a knowledge graph from a medical question-and-answer community. Because the community's data are large in volume, poorly standardized, and sparse, the study combines deep learning models, including the bidirectional long short-term memory network (BiLSTM), the conditional random field (CRF), and the bidirectional gated recurrent unit (BiGRU), to investigate entity recognition and relation extraction for community text.
【Method/process】First, the entity types are further subdivided, and a BiLSTM-CRF model performs entity recognition on a BIO-annotated dataset; the experiments show that subdivided entities give better results than non-subdivided ones. Next, a BiGRU-Attention model extracts the relations between entities; the experimental results show that it improves considerably on the BiLSTM-Attention extraction model in accuracy, recall, and F-value. Finally, a visual knowledge graph is built with the Neo4j graph database.
【Result/conclusion】The study converts unstructured community text into structured data, which helps advance intelligent knowledge services, knowledge representation, and personalized knowledge recommendation in medical communities.
【Innovation/limitation】Entities are subdivided during medical entity recognition, and a breast cancer knowledge graph based on question-and-answer text from an online medical community is successfully constructed. However, because some relation types have few samples, the overall relation extraction metrics are affected to some extent.
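The BIO annotation mentioned in the abstract marks each token as the beginning (B) of an entity, inside (I) an entity, or outside (O) any entity. As a minimal sketch of how such tags map back to entity spans, the Python below decodes a tag sequence. The function name `bio_to_spans`, the entity types `DISEASE` and `SYMPTOM`, and the sample sentence are illustrative assumptions, not the paper's actual label set or data.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (entity_type, start, end) spans from BIO tags; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # A span continues only on an I- tag whose type matches the open span.
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if etype is not None:
                spans.append((etype, start, i))  # close the open span
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]  # open a new span
            # Stray I- tags with no matching open span are ignored in this sketch.
    if etype is not None:
        spans.append((etype, start, len(tags)))
    return spans


# Hypothetical example sentence and labels (assumed for illustration only).
tokens = ["breast", "cancer", "causes", "chest", "pain"]
tags = ["B-DISEASE", "I-DISEASE", "O", "B-SYMPTOM", "I-SYMPTOM"]
entities = [(etype, " ".join(tokens[s:e])) for etype, s, e in bio_to_spans(tags)]
```

In a pipeline like the one the abstract describes, entity spans recovered this way would be the inputs to relation extraction, and the resulting entity-relation-entity triples would become nodes and edges in the graph database.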
Authors
LIAO Kai-ji (廖开际)
HUANG Qiong-ying (黄琼影)
XI Yun-jiang (席运江)
School of Business Administration, South China University of Technology, Guangzhou 510641, China
Source
Information Science (《情报科学》)
CSSCI
Peking University Core Journal
2021, No. 3, pp. 51-59, 75 (10 pages in total)
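Funding
National Natural Science Foundation of China project "Research on Knowledge Mining and Integration Methods for Enterprise Microblogs Based on Supernetworks" (71371077).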