Semi-supervised Expert Discovery Method Based on Feature Perturbation (基于特征扰动的半监督专家发现方法)

Abstract: Experts provide authoritative answers in community question answering (CQA), so efficient and accurate expert discovery helps improve the service quality of CQA communities. Existing community user data contain noisy labels, and because experts are few, the classification data are imbalanced; both problems reduce the expert discovery accuracy of supervised learning models. To address these problems, this paper proposes a semi-supervised expert discovery method based on feature perturbation. The method constructs a feature perturbation strategy for unlabeled data and uses the Sharpening algorithm to assign pseudo-labels to the unlabeled data. Based on the ADASYN algorithm, it enlarges the expert sample set by constructing neighboring samples of expert users, which alleviates the class imbalance. It then constructs a joint loss function that trains the classifier on both labeled and pseudo-labeled data to improve the model's generalization performance. Experimental results show that the method outperforms existing models and methods on several evaluation metrics.
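The abstract names the pseudo-labeling and joint-loss steps but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes Gaussian noise as the feature perturbation, temperature sharpening of the averaged prediction (as commonly used in consistency-based semi-supervised learning), and a joint loss formed from supervised cross-entropy plus a weighted squared-error consistency term. All function names, the noise scale, the temperature, the loss weight, and `toy_predict` are hypothetical choices made for illustration.

```python
import math
import random


def perturb(features, scale, rng):
    """Add zero-mean Gaussian noise to every feature (assumed perturbation)."""
    return [x + rng.gauss(0.0, scale) for x in features]


def sharpen(probs, temperature=0.5):
    """Lower the entropy of a distribution: p_i^(1/T) / sum_j p_j^(1/T)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def pseudo_label(predict, features, rng, n_views=2, scale=0.1, temperature=0.5):
    """Average predictions over perturbed copies, then sharpen the average."""
    views = [predict(perturb(features, scale, rng)) for _ in range(n_views)]
    mean = [sum(col) / n_views for col in zip(*views)]
    return sharpen(mean, temperature)


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))


def squared_error(target, probs):
    """Squared L2 distance between two probability vectors."""
    return sum((t - p) ** 2 for t, p in zip(target, probs))


def joint_loss(predict, labeled, pseudo_labeled, weight=1.0):
    """Supervised cross-entropy plus a weighted consistency term (assumed form)."""
    sup = sum(cross_entropy(y, predict(x)) for x, y in labeled) / len(labeled)
    unsup = sum(squared_error(q, predict(x)) for x, q in pseudo_labeled)
    unsup /= len(pseudo_labeled)
    return sup + weight * unsup


def toy_predict(features):
    """Hypothetical stand-in classifier: returns [P(non-expert), P(expert)]."""
    p = 1.0 / (1.0 + math.exp(-sum(features)))
    return [1.0 - p, p]


rng = random.Random(0)
# Two labeled users (features, one-hot label) and two unlabeled users.
labeled = [([2.0, 1.0], [0.0, 1.0]), ([-1.5, -0.5], [1.0, 0.0])]
unlabeled = [[0.4, 0.3], [-0.2, -0.6]]
pseudo = [(x, pseudo_label(toy_predict, x, rng)) for x in unlabeled]
loss = joint_loss(toy_predict, labeled, pseudo)
```

In this sketch, sharpening pushes an averaged prediction such as [0.4, 0.6] toward the more confident class, which is what lets the unlabeled users contribute a training signal. The ADASYN oversampling step is not shown; an off-the-shelf option would be the `ADASYN` class in `imblearn.over_sampling`, applied to the labeled set before training, though whether the authors used that library is not stated.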

Authors: CHEN Zhuo; ZHANG Fanxing; DU Junwei; YUAN Ximing (School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China)

Affiliation: School of Information Science and Technology, Qingdao University of Science and Technology

Source: Journal of Hunan University (Natural Sciences) (《湖南大学学报（自然科学版）》); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2022, No. 10, pp. 85-91 (7 pages)

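Funding: National Natural Science Foundation of China (F030810, 6217072142); Key Research and Development Program of Shandong Province (2018GGX101052); Natural Science Foundation of Shandong Province (ZR2021MF092).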

Keywords: expert discovery; community question answering; semi-supervised learning; feature perturbation

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
