Discovering High-Quality Threaded Discussions in Online Forums


Abstract: Archives of threaded discussions generated by users in online forums and discussion boards contain valuable knowledge on various topics. However, not all threads are useful, because deliberate abuses such as trolling and flaming are commonly observed in online conversations. The presence of users with different levels of expertise also makes it difficult to assume that every discussion thread stored online contains high-quality content. Although finding high-quality threads automatically can help both users and search engines sift through huge thread archives and make effective use of these potentially useful resources, no previous work to our knowledge has studied this task. In this paper, we propose an automatic method for distinguishing high-quality threads from low-quality ones in online discussion sites. We first suggest four different artificial measures for deriving the overall quality of a thread from the ratings of its posts. We then propose two tasks that involve predicting thread quality without using post rating information. We adopt a popular machine learning framework to solve the two prediction tasks. Experimental results on a real-world forum archive demonstrate that our method can significantly improve prediction performance across all four measures of thread quality on both tasks. We also compare how different types of features derived from various aspects of threads contribute to the overall performance, and investigate the key features that play a crucial role in discovering high-quality threads in online discussion sites.
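The abstract does not name the four thread-quality measures, so the following is only a minimal sketch of the general idea of turning per-post ratings into a single thread-level score. The four aggregations, the function name `thread_quality_scores`, the 1-5 rating scale, and the `high_threshold` parameter are all illustrative assumptions, not the authors' definitions.

```python
from statistics import mean


def thread_quality_scores(post_ratings, high_threshold=4):
    """Aggregate per-post ratings into illustrative thread-level scores.

    post_ratings: ratings of a thread's posts, in posting order.
    high_threshold: hypothetical cutoff for a "highly rated" post,
        assuming a 1-5 rating scale (an assumption, not from the paper).
    """
    if not post_ratings:
        raise ValueError("a thread needs at least one rated post")
    n_high = sum(1 for r in post_ratings if r >= high_threshold)
    return {
        # Hypothetical measure: average rating over all posts.
        "mean": mean(post_ratings),
        # Hypothetical measure: rating of the best post in the thread.
        "max": max(post_ratings),
        # Hypothetical measure: rating of the post that opened the thread.
        "first_post": post_ratings[0],
        # Hypothetical measure: fraction of highly rated posts.
        "share_high": n_high / len(post_ratings),
    }


scores = thread_quality_scores([5, 3, 4, 1])
```

Scores of this kind would serve only as training targets. The prediction tasks described in the abstract then estimate them from thread features, without access to the post ratings themselves.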

Authors: Jung-Tae Lee, Min-Chul Yang, Hae-Chang Rim

Affiliation: Department of Computer and Radio Communications Engineering

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, and CSCD; 2014, Issue 3, pp. 519-531 (13 pages)

Funding: Supported by the Ministry of Knowledge Economy (MKE), Korea, and Microsoft Research through the IT/SW Creative Research Program supervised by the National IT Industry Promotion Agency (NIPA) of Korea under Grant No. NIPA2012-H0503-12-1012, and by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning of Korea under Grant No. NRF-2012M3C4A7033344.

Keywords: online forum, discussion board, thread quality

Classification code: TP393.09 [Automation and Computer Technology: Computer Application Technology]


