Funding: The National Natural Science Foundation of China (No. 61502095).
Abstract: To address relation linking for question answering over knowledge bases, especially multi-relation linking for complex questions, a relation linking approach based on a multi-attention recurrent neural network (RNN) model is proposed that handles both simple and complex questions. First, vector representations of questions are learned by a bidirectional long short-term memory (Bi-LSTM) model at the word and character levels, and named entities in questions are labeled by a conditional random field (CRF) model. Candidate entities are generated from a dictionary and disambiguated with predefined rules, so that the named entities mentioned in questions are linked to entities in the knowledge base. Next, questions are classified as simple or complex by a machine learning method. Starting from the identified entities, one-hop relations are collected from the knowledge base as candidate relations for simple questions, and two-hop relations are collected as candidates for complex questions. Finally, the multi-attention Bi-LSTM model encodes questions and candidate relations, compares their similarity, and returns the candidate relation with the highest similarity as the relation linking result. Notably, a Bi-LSTM model with a single attention is adopted for simple questions, and a Bi-LSTM model with two attentions is adopted for complex questions. The experimental results show that, given an effective entity linking method, the Bi-LSTM model with the attention mechanism improves relation linking for both simple and complex questions and outperforms existing relation linking methods based on graph algorithms or linguistic understanding.
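The candidate-collection and ranking steps described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy triples, the function names, and above all the scorer, which is a simple token-overlap stand-in for the paper's multi-attention Bi-LSTM encoder.

```python
from collections import defaultdict

# Hypothetical toy knowledge base of (subject, relation, object) triples.
TRIPLES = [
    ("Alice", "born_in", "Paris"),
    ("Alice", "works_for", "Acme"),
    ("Paris", "capital_of", "France"),
    ("Acme", "founded_by", "Bob"),
]


def build_index(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    index = defaultdict(list)
    for subj, rel, obj in triples:
        index[subj].append((rel, obj))
    return index


def candidate_relations(index, entity, hops):
    """Collect relation paths of exactly `hops` edges starting at `entity`."""
    paths = [((), entity)]
    for _ in range(hops):
        paths = [
            (path + (rel,), obj)
            for path, node in paths
            for rel, obj in index.get(node, [])
        ]
    return sorted({path for path, _ in paths})


def overlap_score(question, path):
    """Stand-in scorer (assumption): Jaccard overlap between question tokens
    and relation-name tokens. The abstract uses a learned encoder instead."""
    q_tokens = set(question.lower().replace("?", "").split())
    r_tokens = {tok for rel in path for tok in rel.split("_")}
    union = q_tokens | r_tokens
    return len(q_tokens & r_tokens) / len(union) if union else 0.0


def link_relation(index, question, entity, is_complex):
    """One-hop candidates for simple questions, two-hop for complex ones."""
    hops = 2 if is_complex else 1
    candidates = candidate_relations(index, entity, hops)
    return max(candidates, key=lambda p: overlap_score(question, p), default=None)


index = build_index(TRIPLES)
print(link_relation(index, "Where was Alice born?", "Alice", is_complex=False))
# -> ('born_in',)
print(link_relation(index, "Who founded the company Alice works for?", "Alice", is_complex=True))
# -> ('works_for', 'founded_by')
```

The simple/complex flag is passed in directly here; in the abstract it comes from a separate question classifier.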
Abstract: Question answering (QA) over a knowledge base (KB) aims to provide a structured answer from the knowledge base to a natural language question. A key step in this task is representing and understanding the natural language query. In this paper, we propose to use tree-structured neural networks built on the constituency tree to model natural language queries. We make an interesting observation about the constituency tree: different constituents have their own semantic characteristics and may be suited to different subtasks in a QA system. Based on this observation, we incorporate type information as an auxiliary supervision signal to improve QA performance. We call our approach type-aware QA. We jointly characterize both the answer and its answer type in a unified neural network model with an attention mechanism. Instead of simply using the root representation, we represent the query by combining the representations of different constituents using task-specific attention weights. Extensive experiments on public datasets demonstrate the effectiveness of the proposed model. More specifically, the learned attention weights are useful for understanding the query, and the representations produced for intermediate nodes can be used to analyze the effectiveness of components in a QA system.
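The idea of combining constituent representations with task-specific attention weights can be sketched as follows. This is only an illustration under stated assumptions: the vectors are hand-picked rather than produced by a tree-structured network, dot-product scoring is assumed, and `attend`, `answer_task`, and `type_task` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(constituents, task_vector):
    """Return (weights, pooled): attention weights from dot products with the
    task vector, and the weighted sum of the constituent vectors."""
    scores = [sum(c * t for c, t in zip(vec, task_vector)) for vec in constituents]
    weights = softmax(scores)
    dim = len(task_vector)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, constituents))
        for i in range(dim)
    ]
    return weights, pooled


# Two toy constituent vectors and two task vectors (all values are made up).
constituents = [[1.0, 0.0], [0.0, 1.0]]
answer_task = [2.0, 0.0]
type_task = [0.0, 2.0]

answer_weights, answer_query = attend(constituents, answer_task)
type_weights, type_query = attend(constituents, type_task)
```

With these toy inputs the answer task puts more weight on the first constituent and the type task on the second, so the same query yields a different pooled representation per subtask.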
Funding: Supported by the Fundamental Research Funds for the Central Universities (Grant No. 22120220069), the National Natural Science Foundation of China (Grant No. 62176185), and in part by the Shanghai Artificial Intelligence Innovation and Development Fund (Grant No. 2020RGZN-02026).
Abstract: COVID-19 evolves rapidly, and an enormous number of people worldwide want instant access to COVID-19 information such as overviews, clinical knowledge, vaccines, prevention measures, and COVID-19 mutations. Question answering (QA) has become the mainstream way for users to consume ever-growing information by posing natural language questions. It is therefore urgent and necessary to develop a QA system that offers consulting services around the clock to relieve the pressure on health services. In particular, people have increasingly paid attention to complex multi-hop questions rather than simple ones during the prolonged pandemic, but existing COVID-19 QA systems fail to meet these complex information needs. In this paper, we introduce a novel multi-hop QA system called COKG-QA, which reasons over multiple relations in large-scale COVID-19 knowledge graphs to return answers to a given question. In question answering over knowledge graphs, current methods usually represent entities and schemas with knowledge embedding models and represent questions with pre-trained models. While it is convenient to represent different kinds of knowledge (i.e., entities and questions) with dedicated embeddings, an issue arises: these separate representations come from heterogeneous vector spaces. We align question embeddings with knowledge embeddings in a common semantic space through a simple but effective embedding projection mechanism. Furthermore, we propose combining entity embeddings with their corresponding schema embeddings, which serve as important prior knowledge, to help search for the correct answer entity of the specified types. In addition, we derive a large multi-hop Chinese COVID-19 dataset (called COKG-DATA for ease of reference) for COKG-QA from the linked knowledge graph Open KG-COVID-19 launched by Open KG, which includes comprehensive and representative information about COVID-19. COKG-QA achieves competitive performance on the 1-hop and 2-hop data and obtains the best result, with significant improvements, on the 3-hop data. It is also more efficient to use in a QA system. Moreover, the user study shows that the system not only provides accurate and interpretable answers but is also easy to use and offers smart tips and suggestions.
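A minimal sketch of the two ideas named in this abstract, projecting the question embedding into the knowledge embedding space and combining each entity embedding with its schema embedding, might look like the following. The abstract does not give the exact form of either step, so the linear projection, the element-wise sum, the dot-product score, and all names and numbers below are assumptions.

```python
def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_entities(question_vec, projection, entities):
    """Rank candidate entities for a question.

    entities maps a name to (entity_vec, schema_vec). The question vector is
    projected into the knowledge space, and each candidate is scored by the dot
    product with the sum of its entity and schema vectors (assumed combination).
    """
    projected = matvec(projection, question_vec)
    scores = {
        name: dot(projected, [e + s for e, s in zip(entity_vec, schema_vec)])
        for name, (entity_vec, schema_vec) in entities.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-d question space mapped to a 2-d knowledge space.
projection = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
question_vec = [1.0, 0.0, 5.0]
entities = {
    "vaccine_A": ([0.5, 0.1], [0.4, 0.0]),
    "symptom_B": ([0.6, 0.2], [-0.5, 0.3]),
}
print(rank_entities(question_vec, projection, entities))
# -> ['vaccine_A', 'symptom_B']
```

In this toy case the schema vector is what separates the two candidates: the raw entity vectors alone would rank `symptom_B` first.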
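Abstract: Current knowledge base question answering (KBQA) techniques cannot handle complex questions effectively and struggle to understand their complex semantics. Decomposing a complex question and then integrating the parts is an effective way to parse complex semantics. However, question decomposition often misidentifies entities or loses the topic entity, so the resulting sub-questions do not match the original complex question. To address these problems, a question-decomposition semantic parsing method that incorporates fact text is proposed. Complex questions are processed in three stages (decomposition, extraction, and parsing): the complex question is first decomposed into simple sub-questions, key information is then extracted from the questions, and finally a structured query is generated. In addition, a fact text base is constructed by converting triples into sentences described in natural language, and an attention mechanism is used to acquire richer knowledge. Experiments on the ComplexWebQuestions dataset show that the proposed model outperforms the other baseline models.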
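Converting triples into natural-language sentences for a fact text base can be sketched with simple templates. The abstract does not say how the conversion is done, so the template approach, the relation names, and the `verbalize` helper are all hypothetical.

```python
# Hypothetical per-relation templates; {s} is the subject and {o} the object.
TEMPLATES = {
    "directed_by": "{s} was directed by {o}.",
    "born_in": "{s} was born in {o}.",
}


def verbalize(triple, templates=TEMPLATES):
    """Turn a (subject, relation, object) triple into a sentence.

    Relations without a template fall back to the relation name with
    underscores replaced by spaces.
    """
    subj, rel, obj = triple
    template = templates.get(rel)
    if template is None:
        template = "{s} " + rel.replace("_", " ") + " {o}."
    return template.format(s=subj, o=obj)


def build_fact_texts(triples):
    """Verbalize every triple to form a small fact text base."""
    return [verbalize(t) for t in triples]


print(verbalize(("Inception", "directed_by", "Christopher Nolan")))
# -> Inception was directed by Christopher Nolan.
```

The fallback keeps every triple representable as text, at the cost of less fluent sentences for relations that lack a template.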
Funding: The National Natural Science Foundation of China (No. 61303094), the Program of Science and Technology Commission of Shanghai Municipality (Nos. 16511102400 and 16111107801), and the Innovation Program of Shanghai Municipal Education Commission (No. 14YZ024).
Abstract: Question answering systems offer a friendly interface for people to interact with massive online information. Retrieving useful medical information with search engines across a huge number of websites is time-consuming for users. This work builds a Chinese Question Answering System in Medical Domain (CQASMD) to provide users with useful medical information. First, a large medical knowledge base with more than 300 thousand medical terms and their descriptions is constructed to store structured medical knowledge, and its entries are classified with the FastText model. A Word2Vec model is then adopted to capture the semantic meanings of words, and questions and answers are processed with sentence embedding to capture semantic context. A user's question is first classified and converted into a sentence vector, and a matching algorithm finds the most similar stored question. After the constructed medical knowledge base is queried, the answer corresponding to the matched question is returned to the user. The architecture and flowchart of CQASMD are presented; the system is expected to play an important role in self-diagnosis and treatment.
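The matching step, turning a question into a sentence vector and finding the most similar stored question, can be sketched as below. This assumes averaged word vectors and cosine similarity, which the abstract does not confirm; the tiny `WORD_VECTORS` table is made up and stands in for a trained Word2Vec model.

```python
import math

# Hypothetical 3-d word vectors standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "fever": [1.0, 0.0, 0.0],
    "cough": [0.8, 0.2, 0.0],
    "headache": [0.0, 1.0, 0.0],
    "treat": [0.0, 0.0, 1.0],
}


def sentence_vector(text, vectors=WORD_VECTORS):
    """Average the vectors of known words; return None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def most_similar_question(query, stored_questions):
    """Return the stored question closest to the query, or None if unmatched."""
    query_vec = sentence_vector(query)
    if query_vec is None:
        return None
    best, best_score = None, -1.0
    for question in stored_questions:
        vec = sentence_vector(question)
        if vec is None:
            continue
        score = cosine(query_vec, vec)
        if score > best_score:
            best, best_score = question, score
    return best


stored = ["how to treat fever", "how to treat headache"]
print(most_similar_question("how to treat cough", stored))
# -> how to treat fever
```

Once the closest stored question is found, its answer would be looked up in the knowledge base and returned to the user.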