Application of Machine Learning Classification Algorithms in Community Question Answering Systems


Abstract: Machine learning is widely applied in natural language processing, and community question answering offers a new and interesting research direction. In traditional question answering (QA), using classification algorithms to study user interaction behavior and analyze interaction patterns can promote the development of user interaction and related post structures. Against this background, this paper studies the corpus provided by the SemEval semantic evaluation competition, classifies the answers to questions using KNN, random forest, and other classification methods, and analyzes the classification results. Experimental results show that GBRT and random forest achieve the best classification performance.

Author: SUN Xi-ran (孙熙然) (China Electronic Technology Corporation, Chengdu 610030, China)

Affiliation: The 10th Research Institute of China Electronics Technology Group Corporation (中国电子科技集团第十研究所)

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2021, No. 12, pp. 195-197 (3 pages)

Keywords: answer classification; natural language processing; machine learning; random forest; nearest neighbor algorithm

Classification number: TP39 [Automation and Computer Technology: Computer Application Technology]




