Abstract: Over the last couple of decades, community question-answering sites (CQAs) have been a topic of much academic interest. Scholars have often leveraged traditional machine learning (ML) and deep learning (DL) to explore the ever-growing volume of content that CQAs engender. To clarify the current state of the CQA literature that has used ML and DL, this paper reports a systematic literature review. The goal is to summarise and synthesise the major themes of CQA research related to (i) questions, (ii) answers and (iii) users. The final review included 133 articles. Dominant research themes include question quality, answer quality, and expert identification. In terms of datasets, some of the most widely studied platforms include Yahoo! Answers, Stack Exchange and Stack Overflow. The scope of most articles was confined to just one platform, with few cross-platform investigations. Articles with ML outnumber those with DL. Nonetheless, the use of DL in CQA research is on an upward trajectory. A number of research directions are proposed.
Abstract: Expert Recommendation (ER) aims to identify domain experts with high expertise and willingness to provide answers to questions in Community Question Answering (CQA) web services. How to model questions and users in the heterogeneous content network is critical to this task. Most traditional methods model questions and users based on the textual content left in the community, ignoring the structural properties of heterogeneous CQA networks and suffering from textual data sparsity. Recent approaches take advantage of structural proximities between nodes and attempt to fuse the textual content of nodes for modeling. However, they often fail to distinguish the nodes' personalized preferences and consider the textual content of only some of the nodes in network embedding learning, while ignoring the semantic relevance of nodes. In this paper, we propose a novel framework that jointly considers structural proximity relations and textual semantic relevance to model users and questions more comprehensively. Specifically, we learn topology-based embeddings through a hierarchical attentive network learning strategy, in which the proximity information and the personalized preferences of nodes are encoded and preserved. Meanwhile, we use each node's textual content and the text correlation between adjacent nodes to build the content-based embedding through a meta-context-aware skip-gram model. In addition, the user's relative answer quality is incorporated to improve ranking performance. Experimental results show that, by combining deep semantic understanding with structural feature learning, the proposed framework consistently and significantly outperforms state-of-the-art baselines on three real-world datasets. Performance is analyzed in terms of MRR, P@K, and MAP and is shown to exceed that of existing methodologies.
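For readers unfamiliar with the ranking metrics named in this abstract, the following is a minimal sketch of MRR, P@K, and MAP over binary relevance lists. It is not taken from the paper: the function names and the toy rankings are illustrative assumptions, and average precision is normalized by the number of relevant items that appear in the ranked list, which assumes every relevant expert is present in that list.

```python
from typing import Callable, Sequence


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rel in relevance[:k] if rel) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision values taken at each rank holding a relevant item.

    Assumption: all relevant items appear somewhere in the ranked list,
    so the denominator is the number of relevant items found.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_over_queries(
    metric: Callable[[Sequence[int]], float],
    rankings: Sequence[Sequence[int]],
) -> float:
    """Average a per-query metric over all queries (gives MRR or MAP)."""
    if not rankings:
        return 0.0
    return sum(metric(r) for r in rankings) / len(rankings)


# Hypothetical toy data: one ranked candidate list per question,
# 1 = relevant expert, 0 = not relevant.
toy_rankings = [[0, 1, 0, 1], [1, 0, 0, 0]]
mrr = mean_over_queries(reciprocal_rank, toy_rankings)
map_score = mean_over_queries(average_precision, toy_rankings)
p_at_2 = mean_over_queries(lambda r: precision_at_k(r, 2), toy_rankings)
```

Each question contributes one ranked list, so MRR and MAP are simply the per-question reciprocal rank and average precision averaged over all questions.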
Abstract: As a new type of knowledge-sharing platform, the community question answering website enables the acquisition and sharing of knowledge and is popular with a large number of users. For questions with multiple answers, however, assessing answer quality becomes a challenge. Answer selection in Community Question Answering (CQA) was proposed as a challenge task in the SemEval competition, which provided a data set and defined two subtasks. Task A: given a question (including a short title and an extended description) and its answers, classify each answer as definitely relevant (good), potentially relevant (potential), or bad or irrelevant (bad, dialog, non-English, other). Task B: given a YES/NO question (including a short title and an extended description) and some answers, judge, based on the answers classified as good, whether the answer to the whole question should be yes, no, or uncertain. This paper first preprocesses the data set, then uses natural language processing techniques to perform word segmentation, part-of-speech tagging, and named entity recognition, and then extracts features from the preprocessed data. Finally, SVM and random forest classifiers are trained on the extracted features, and the classification results are analyzed and compared. The experiments show that the SVM and random forest methods perform well on the data set and exceed the previously proposed multi-classifier ensemble learning method and hierarchical classification method.
Funding: Zhejiang Provincial Natural Science Foundation of China, Grant No. LGF18F020011.
Abstract: Given the limitations of existing community question answering (CQA) answer quality prediction methods in measuring the semantic information of the answer text, this paper proposes an answer quality prediction model based on question-answer joint learning (ACLSTM). An attention mechanism is used to capture the dependency between the question and the answer in each question-and-answer (Q&A) pair. A Convolutional Neural Network (CNN) and a Long Short-Term Memory network (LSTM) are used to extract semantic features of Q&A pairs and calculate their matching degree. In addition, the semantic representation of the answer is combined with other effective extended features as the input to the fully connected layer. Compared with other quality prediction models, the ACLSTM model effectively improves the prediction of answer quality. In particular, it improves the prediction of medium-quality answers, and its prediction performance improves further after effective extended features are added. Experiments show that, after training, the ACLSTM model better measures the semantic match within Q&A pairs, reflecting the model's strong performance in processing the semantic information of the answer text.
Funding: Jointly supported by the Wuhan International Science and Technology Cooperation Fund (Grant No. 201070934337), the 3rd Special Award of the China Postdoctoral Science Foundation (Grant No. 201003497), and the National Science Foundation of the USA (Grant No. NSF/IIS-1052773).
Abstract: This paper compares 12 representative Chinese and English online question-answering communities (Q&A communities) based on their basic functions, interactive modes, and customized services. A comparative empirical experiment was also conducted on them using 12 questions representing four types of questions, assigned evenly to three different subject fields, to examine the task performance of the 12 selected online Q&A communities. Our goal was to evaluate these online Q&A communities in terms of their quality and efficiency in answering the questions posed to them. It was hoped that our empirical research would yield greater understanding of, and insight into, the intricate workings of these online Q&A communities and hence support their further improvement.
Abstract: Community question answering (CQA) represents the type of Web application where people can exchange knowledge by asking and answering questions. One significant challenge for most real-world CQA systems is the lack of effective matching between questions and potential good answerers, which adversely affects efficient knowledge acquisition and circulation. On the one hand, a requester might receive many low-quality answers without getting a quality response in a brief time; on the other hand, an answerer might face numerous new questions without being able to quickly identify the questions of interest. In this situation, expert recommendation emerges as a promising technique to address these issues. Instead of passively waiting for users to browse and find their questions of interest, an expert recommendation method actively and promptly draws users' attention to the appropriate questions. The past few years have witnessed considerable efforts to address the expert recommendation problem from different perspectives. These methods all have issues that need to be resolved before the advantages of expert recommendation can be fully realized. In this survey, we first present an overview of the research efforts and state-of-the-art techniques for expert recommendation in CQA. We next summarize and compare the existing methods with respect to their advantages and shortcomings, and then discuss the open issues and future research directions.
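Abstract: Judging question similarity is an important research direction in community question answering (CQA). A question in CQA usually consists of a subject and a description. Because CQA is open, user questions vary in length, and questions contain a large amount of background information that interferes with a model's judgment of whether two questions are similar. To reduce the impact of these issues on computing question similarity, the model treats keywords and the question subject as the key information of a question and uses this information to compute question similarity. First, keyword extraction is introduced on top of a CNN model based on the similarity and dissimilarity information between texts. In addition, to make better use of the question subject, the model incorporates a question-subject similarity feature. The model was evaluated on the question similarity task of SemEval 2017, where its mean average precision (MAP) reached 49.65%, exceeding the best result in the evaluation.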
Abstract: Contextual question answering (CQA), in which user information needs are satisfied through an interactive question answering (QA) dialog, has recently attracted more research attention. One challenge is to fuse contextual information into the understanding process of relevant questions. In this paper, a discourse structure is proposed to maintain semantic information, and approaches are proposed for recognizing the relevancy type and for fusing contextual information according to that type. The system is evaluated on real contextual QA data. The results show that it performs better than a baseline system and almost as well as when these contextual phenomena are resolved manually. A detailed evaluation analysis is presented.
Funding: Supported by the Wuhan University Development Program for Researchers Born after the 1970s.
Abstract: Purpose: A social question & answer (SQA) community's long-term sustainability depends on its members' willingness to stay and contribute their knowledge continuously in the community. This research aims to investigate the critical factors that influence users' intention to continue contributing knowledge in the SQA community. Design/methodology/approach: Grounded in information systems (IS) continuance theory, this study puts forward a model of the factors that influence SQA community members' intention to continue contributing knowledge. A survey was conducted to gather data from knowledge contributors of four major Chinese SQA communities (Baidu Knows, Sina iAsk, Soso Ask and Yahoo! Knowledge). Using the partial least squares (PLS) technique, research hypotheses derived from the proposed model were empirically validated. Findings: Except for enjoyment in helping others and knowledge self-efficacy, all other factors, including extrinsic reward, reputation enhancement, realization of self-worth, perceived usefulness, attitude towards knowledge contribution, and satisfaction, exert significant impacts on users' continuance intentions in an SQA community. Research limitations: First, important factors that may influence users' continuance intentions, such as the ease of use of information systems, were not investigated in the study. Second, the study sample needs to be enlarged, and users of smaller SQA communities should also be included, to make the results more representative. Practical implications: This study will help SQA community designers and managers develop or improve incentive mechanisms to attract more people to contribute their knowledge and promote the development of the SQA community. Originality/value: This study improves on previous research models and puts forward a model of user continuance intention to contribute knowledge in an SQA community. It extends the understanding of SQA community users' intention to continue contributing knowledge by distinguishing these users' different roles and focusing only on knowledge contributors.
Funding: Developed by the NLP601 group at the School of Electronics Engineering and Computer Science, Peking University, within the National Natural Science Foundation of China (No. 61672046).
Abstract: Community Question Answering (CQA) in web forums, as a classic forum for user communication, provides a large number of high-quality, useful answers in comparison with traditional question answering. Developing methods to retrieve good, honest answers to user questions is a challenging task in natural language processing. Many answers are not associated with the actual problem or shift the subject, and this usually occurs in relatively long answers. In this paper, we enhance answer selection in CQA using multidimensional feature combination and similarity order. We make full use of the information in answers to questions to determine the similarity between questions and answers, and use the text-based description of the answer to determine whether it is a reasonable one. Our work includes two subtasks: (a) classifying answers as good, bad, or potentially associated with a question, and (b) answering YES/NO based on a list of all answers to a question. The experimental results show that our approach is significantly more efficient than the baseline model, and its overall ranking is relatively high in comparison with that of other models.