Abstract
Experts can provide authoritative answers in community question answering (CQA), and efficient, accurate expert discovery helps improve the service quality of CQA communities. However, existing community user data contains noisy labels, and because experts are few, the classification data is imbalanced; both factors reduce the expert discovery accuracy of supervised learning models. To address these problems, this paper proposes a semi-supervised expert discovery method based on feature perturbation. The method constructs a feature perturbation strategy for unlabeled data and uses the Sharpening algorithm to assign pseudo-labels to the unlabeled data. Based on the ADASYN algorithm, it enlarges the expert sample set by constructing neighboring samples of expert users, which alleviates the class imbalance. A joint loss function is constructed so that labeled and pseudo-labeled data jointly train the classifier, improving the model's generalization performance. Experimental results show that the method outperforms existing models and methods on several evaluation metrics.
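The pseudo-labeling and joint-loss steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption: Gaussian noise as the feature perturbation, temperature sharpening of the averaged predictions (as commonly used in semi-supervised learning), and a joint loss that adds a weighted squared-error term on pseudo-labeled data to the cross-entropy on labeled data. The names `perturb`, `pseudo_label`, `joint_loss`, `toy_predict` and the parameters `k`, `sigma`, `temperature`, `lam` are hypothetical; the ADASYN oversampling step is not shown.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sharpen(probs, temperature=0.5):
    """Temperature sharpening: raise each probability to 1/T and renormalize.
    T < 1 pushes the distribution toward its most likely class."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def perturb(features, sigma, rng):
    """Assumed perturbation: add zero-mean Gaussian noise to every feature."""
    return [x + rng.gauss(0.0, sigma) for x in features]


def pseudo_label(features, predict, k=4, sigma=0.1, temperature=0.5, seed=0):
    """Average predictions over k perturbed copies, then sharpen the average."""
    rng = random.Random(seed)
    preds = [predict(perturb(features, sigma, rng)) for _ in range(k)]
    n_classes = len(preds[0])
    mean = [sum(p[c] for p in preds) / k for c in range(n_classes)]
    return sharpen(mean, temperature)


def joint_loss(labeled, pseudo, predict, lam=1.0):
    """Cross-entropy on (features, class_index) pairs plus a lam-weighted
    mean squared error on (features, soft_target) pairs."""
    ce = -sum(math.log(max(predict(x)[y], 1e-12)) for x, y in labeled) / len(labeled)
    mse = sum(
        sum((p - q) ** 2 for p, q in zip(predict(x), target)) / len(target)
        for x, target in pseudo
    ) / len(pseudo)
    return ce + lam * mse


def toy_predict(features):
    """Hypothetical stand-in classifier: class 1 ("expert") logit is the feature sum."""
    return softmax([0.0, sum(features)])
```

As a usage illustration, `pseudo_label([1.0, 0.5], toy_predict)` returns a two-class soft label that can be paired with its features and passed to `joint_loss` alongside labeled examples. Lower `temperature` values make the pseudo-labels more confident, which is the usual motivation for sharpening.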
Authors
陈卓
张樊星
杜军威
袁玺明
CHEN Zhuo; ZHANG Fanxing; DU Junwei; YUAN Ximing (School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China)
Source
《湖南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (北大核心)
2022, Issue 10, pp. 85-91 (7 pages)
Journal of Hunan University: Natural Sciences
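Funding
National Natural Science Foundation of China (F030810, 6217072142)
Key Research and Development Program of Shandong Province (2018GGX101052)
Natural Science Foundation of Shandong Province (ZR2021MF092)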
Keywords
expert discovery (专家发现)
community question answering (社区问答)
semi-supervised learning (半监督学习)
feature perturbation (特征扰动)