
Semi-supervised Expert Discovery Method Based on Feature Perturbation
Abstract: Experts can provide authoritative answers in community question answering (CQA), and efficient, accurate expert discovery helps improve the service quality of CQA communities. Existing community user data contain noisy labels, and because experts are few, the classification data are imbalanced. Both factors reduce the expert discovery accuracy of supervised learning models. To address these problems, this paper proposes a semi-supervised expert discovery method based on feature perturbation. The method constructs a feature perturbation strategy for unlabeled data and uses the Sharpening algorithm to assign pseudo-labels to the unlabeled data. It then uses the ADASYN algorithm to enlarge the expert sample set by constructing samples in the neighborhood of expert users, which alleviates the class imbalance. Finally, it constructs a joint loss function that trains the classifier on both labeled and pseudo-labeled data, improving the model's generalization. Experimental results show that the method outperforms existing models and methods on several evaluation metrics.
Authors: CHEN Zhuo (陈卓), ZHANG Fanxing (张樊星), DU Junwei (杜军威), YUAN Ximing (袁玺明) (School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China)
Source: Journal of Hunan University: Natural Sciences, 2022, No. 10, pp. 85-91 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University core journal list.
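Funding: National Natural Science Foundation of China (F030810, 6217072142); Key Research and Development Program of Shandong Province (2018GGX101052); Natural Science Foundation of Shandong Province (ZR2021MF092).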
Keywords: expert discovery; community question answering; semi-supervised learning; feature perturbation
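The abstract does not specify the exact perturbation strategy or classifier. The following is a minimal sketch of perturbation-plus-Sharpening pseudo-labeling under stated assumptions:

- The perturbation is assumed to be independent Gaussian noise added to each feature.
- The classifier is a hypothetical linear softmax model.
- "Sharpening" is taken to be the MixMatch-style operation, where each probability is raised to 1/T and the vector is renormalized.
- The names `perturb`, `pseudo_label`, `sigma`, `k`, and `T` are illustrative, not the authors' code.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, bias, x):
    """Hypothetical linear softmax classifier: one weight row and one bias per class."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, bias)]
    return softmax(logits)


def perturb(x, sigma, rng):
    """Assumed perturbation: independent Gaussian noise added to every feature."""
    return [x_j + rng.gauss(0.0, sigma) for x_j in x]


def sharpen(p, T):
    """MixMatch-style sharpening: raise each probability to 1/T and renormalize.
    T < 1 lowers the entropy, pushing the distribution toward a hard label."""
    powered = [p_i ** (1.0 / T) for p_i in p]
    total = sum(powered)
    return [q / total for q in powered]


def pseudo_label(weights, bias, x, k=4, sigma=0.1, T=0.5, rng=None):
    """Average predictions over k perturbed copies of x, then sharpen the average."""
    if rng is None:
        rng = random.Random(0)
    probs = [predict_proba(weights, bias, perturb(x, sigma, rng)) for _ in range(k)]
    avg = [sum(col) / k for col in zip(*probs)]
    return sharpen(avg, T)


# Toy usage: class 0 = ordinary user, class 1 = expert (hypothetical features).
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
print(pseudo_label(W, b, [0.2, 0.9]))
```

Averaging over several perturbed copies before sharpening means a pseudo-label is confident only if the prediction is stable under small feature changes. This is one plausible way to limit the influence of noisy samples.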
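For the class-imbalance step, the following is a compact sketch of standard ADASYN (He et al., 2008) applied to expert users as the minority class. It follows these steps:

1. Compute the total number of synthetic samples, G = (n_majority - n_minority) * beta.
2. For each expert, find the fraction of non-experts among its k nearest neighbors.
3. Normalize these fractions and allot G proportionally, so experts near the class boundary receive more synthetic neighbors.
4. Create each synthetic sample by interpolating toward a random nearby expert.

The feature vectors and parameter values are hypothetical. The paper's own settings are not given in the abstract. In practice a library implementation such as `ADASYN` in imbalanced-learn could be used instead.

```python
import math
import random


def nearest(idx, X, pool, k):
    """Indices of the k points in `pool` closest to X[idx] (Euclidean), excluding idx."""
    others = [j for j in pool if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def adasyn(X, y, minority=1, k=5, beta=1.0, rng=None):
    """Return synthetic minority-class samples generated with ADASYN."""
    if rng is None:
        rng = random.Random(0)
    all_idx = list(range(len(X)))
    min_idx = [i for i in all_idx if y[i] == minority]
    n_min, n_maj = len(min_idx), len(X) - len(min_idx)
    G = int((n_maj - n_min) * beta)  # total number of synthetic samples to create
    if G <= 0 or n_min < 2:
        return []
    # r_i: fraction of majority-class points among the k nearest neighbours of x_i.
    ratios = []
    for i in min_idx:
        nbrs = nearest(i, X, all_idx, k)
        ratios.append(sum(1 for j in nbrs if y[j] != minority) / len(nbrs))
    total = sum(ratios)
    if total == 0:
        return []  # no minority point borders the majority class
    synthetic = []
    for i, r in zip(min_idx, ratios):
        g_i = round(r / total * G)  # harder-to-learn points receive more samples
        min_nbrs = nearest(i, X, min_idx, k)
        for _ in range(g_i):
            z = rng.choice(min_nbrs)
            lam = rng.random()
            # Interpolate on the segment between x_i and a minority neighbour.
            synthetic.append([a + lam * (c - a) for a, c in zip(X[i], X[z])])
    return synthetic


# Toy data: 8 ordinary users (label 0) and 3 experts (label 1), two features each.
X = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.2, 0], [0, 0.2], [0.2, 0.2],
     [0.15, 0.05], [0.25, 0.25], [1, 1], [1.1, 1]]
y = [0] * 8 + [1] * 3
new_experts = adasyn(X, y, k=3)
print(len(new_experts))  # 5
```

Because each g_i is rounded separately, the number of generated samples can differ slightly from G on other data. The expert at (0.25, 0.25), surrounded by ordinary users, receives three of the five samples here.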
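The abstract states only that a joint loss combines labeled and pseudo-labeled data. The following is a minimal sketch assuming the common form L = L_s + lambda_u * L_u, with these further assumptions:

- Both terms use cross-entropy, and the pseudo-label term takes soft (sharpened) targets. MixMatch, for comparison, uses a squared-error term for unlabeled data.
- The weight `lambda_u` is a hypothetical hyperparameter, not a value from the paper.

```python
import math


def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy H(target, pred); target may be one-hot or a soft pseudo-label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def joint_loss(labeled, pseudo, lambda_u=1.0):
    """Mean supervised loss plus lambda_u times the mean pseudo-label loss.

    labeled, pseudo: lists of (target_distribution, predicted_distribution) pairs.
    lambda_u: hypothetical weight balancing the two terms.
    """
    l_s = sum(cross_entropy(t, p) for t, p in labeled) / len(labeled)
    l_u = (sum(cross_entropy(t, p) for t, p in pseudo) / len(pseudo)) if pseudo else 0.0
    return l_s + lambda_u * l_u


labeled = [([0.0, 1.0], [0.5, 0.5])]   # a known expert; the classifier is unsure
pseudo = [([0.9, 0.1], [0.5, 0.5])]    # a sharpened pseudo-label vs. the prediction
print(joint_loss(labeled, pseudo, lambda_u=0.5))  # 1.5 * ln 2, about 1.0397
```

Keeping the pseudo-label term separately weighted lets training rely mainly on trusted labels early on, while the unlabeled data still contribute to generalization.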