Funding: Sponsored by CNPq (Brazilian Council for Research and Development), process 142620/2009-2, and FAPESP (State of Sao Paulo Research Foundation), processes 2010/20564-8 and 2011/19850-9.
Abstract: The growing volume of public content on the Internet, especially in online forums, enables researchers to study society in new ways. However, qualitative analysis of online forums is very time-consuming, and most content is unrelated to the researchers' interests. Analysts therefore face the following problem: how can the content to be analyzed be explored and selected efficiently? This article introduces a new process to support analysts in solving this problem. The process is based on unsupervised machine learning techniques such as hierarchical clustering and term co-occurrence networks. A tool that helps apply the proposed process was created to provide consolidated and structured results, including measurements and a content exploration interface.
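The two techniques named in this abstract can be illustrated with a minimal sketch. It is not the authors' tool: the toy tokenizer, the Jaccard distance, the average-linkage rule, the `threshold` parameter, and the sample `posts` are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Toy tokenizer (assumption): lowercase alphabetic words longer than two characters.
    return {w for w in text.lower().split() if w.isalpha() and len(w) > 2}


def cooccurrence_edges(posts):
    # Term co-occurrence network: edge weight = number of posts containing both terms.
    edges = Counter()
    for post in posts:
        for a, b in combinations(sorted(tokenize(post)), 2):
            edges[(a, b)] += 1
    return edges


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerate(posts, threshold):
    # Naive average-linkage hierarchical clustering over post term sets.
    # Merging stops once the closest pair of clusters is farther apart than `threshold`.
    terms = [tokenize(p) for p in posts]
    clusters = [[i] for i in range(len(posts))]

    def linkage(c1, c2):
        total = sum(jaccard_distance(terms[i], terms[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return clusters


# Hypothetical forum posts, used only for illustration.
posts = [
    "battery drains fast after update",
    "update makes battery drains quickly",
    "best pizza recipe with cheese",
    "cheese pizza recipe tips",
]
print(cooccurrence_edges(posts).most_common(3))
print(agglomerate(posts, threshold=0.6))
```

An analyst could read the heaviest edges as candidate themes and then open only the clusters whose terms match the research interest. A real corpus would need a proper tokenizer, stop-word removal, and a scalable clustering library.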
Funding: Supported by the Ministry of Knowledge Economy (MKE), Korea, and Microsoft Research through the IT/SW Creative Research Program supervised by the National IT Industry Promotion Agency (NIPA) of Korea under Grant No. NIPA2012-H0503-12-1012, and by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning of Korea under Grant No. NRF-2012M3C4A7033344.
Abstract: Archives of threaded discussions generated by users in online forums and discussion boards contain valuable knowledge on various topics. However, not all threads are useful, because deliberate abuses such as trolling and flaming are common in online conversations. Users also differ in their level of expertise, so it cannot be assumed that every discussion thread stored online contains high-quality content. Although finding high-quality threads automatically can help both users and search engines sift through huge thread archives and make effective use of these potentially useful resources, to our knowledge no previous work has studied this task. In this paper, we propose an automatic method for distinguishing high-quality threads from low-quality ones in online discussion sites. We first suggest four different artificial measures for inducing the overall quality of a thread from the ratings of its posts. We then propose two tasks involving the prediction of thread quality without using post rating information, and we adopt a popular machine learning framework to solve them. Experimental results on a real-world forum archive demonstrate that our method can significantly improve prediction performance across all four measures of thread quality on both tasks. We also compare how different types of features derived from various aspects of threads contribute to overall performance, and we investigate the key features that play a crucial role in discovering high-quality threads in online discussion sites.
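The abstract does not say which four measures the authors use, so the following is only a sketch of what inducing thread quality from post ratings could look like. The four aggregates, the 1-to-5 rating scale, and the `good` cutoff are illustrative assumptions, not the paper's definitions.

```python
from statistics import mean, median


def thread_quality(ratings, good=4):
    # ratings: per-post ratings of one thread, on a hypothetical 1-5 scale.
    # Each entry is one possible way to collapse post ratings into a thread score.
    return {
        "mean": mean(ratings),      # average post rating
        "median": median(ratings),  # robust to a few extreme posts
        "max": max(ratings),        # quality of the best post
        "share_good": sum(r >= good for r in ratings) / len(ratings),
    }


print(thread_quality([5, 4, 2, 1]))
```

Scores of this kind would serve only as training labels. The prediction tasks described in the abstract then estimate them from thread features without access to the ratings.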
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60873246 and the China Information Technology Security Evaluation Centre.
Abstract: The traditional topic-opinion model has a major defect when used to classify post opinions in an online forum discussion. Its classification accuracy depends heavily on observable topic-opinion features that target the subject, yet a large number of forum posts have no such features. As a result, the accuracy is, for the most part, less than 78%. To solve this problem, we propose a new method to identify post opinions based on the Tree Conditional Random Fields (T-CRFs) model. First, we select the topic-opinion features of the posts and the associated opinion features between posts to construct the T-CRFs model. We then use the T-CRFs model to iteratively label the opinions of the tree-structured posts under the same topic so as to reach a maximum joint probability. To reduce the training cost, we design a simplified tree diagram module and some feature templates. Experimental results suggest that the proposed method requires less training time and improves the accuracy by 11%.
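The idea of labeling a reply tree jointly can be sketched as follows. This is not the T-CRFs model from the paper: it uses exact max-sum dynamic programming instead of the iterative labeling the abstract mentions, and the scores, label names, and tree below are hypothetical values standing in for learned feature weights.

```python
def best_labeling(children, node_score, edge_score, labels, root=0):
    # Exact max-sum dynamic programming over a reply tree.
    # node_score[node][label]: log-score from the post's own features.
    # edge_score[(parent_label, child_label)]: log-score linking a reply to its parent.
    best = {}  # (node, label) -> (best subtree score, {child: chosen child label})

    def solve(node, label):
        key = (node, label)
        if key not in best:
            total, choice = node_score[node][label], {}
            for child in children.get(node, []):
                score, child_label = max(
                    (solve(child, cl)[0] + edge_score[(label, cl)], cl) for cl in labels
                )
                total += score
                choice[child] = child_label
            best[key] = (total, choice)
        return best[key]

    score, root_label = max((solve(root, l)[0], l) for l in labels)
    # Walk back down the tree to recover the jointly best assignment.
    assignment, stack = {root: root_label}, [root]
    while stack:
        node = stack.pop()
        for child, child_label in best[(node, assignment[node])][1].items():
            assignment[child] = child_label
            stack.append(child)
    return assignment, score


# Hypothetical thread: post 0 is the root, posts 1 and 2 reply to it, post 3 replies to 1.
labels = ("support", "oppose")
children = {0: [1, 2], 1: [3]}
node_score = {
    0: {"support": 2.0, "oppose": 0.0},
    1: {"support": 0.0, "oppose": 0.0},  # no observable topic-opinion features
    2: {"support": 0.0, "oppose": 1.5},
    3: {"support": 1.0, "oppose": 0.0},
}
edge_score = {
    ("support", "support"): 0.5, ("oppose", "oppose"): 0.5,
    ("support", "oppose"): 0.0, ("oppose", "support"): 0.0,
}
assignment, score = best_labeling(children, node_score, edge_score, labels)
print(assignment, score)
```

Post 1 carries no features of its own, yet it still receives a label because its parent and its reply both pull it in one direction. That is the situation the abstract says a per-post topic-opinion classifier handles poorly.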
Funding: Supported by the "111" project funded by the Ministry of Education of China and the State Administration of Foreign Experts Affairs of China [grant number B18043].
Abstract: In this study, we use initial public offerings (IPOs) in China to investigate how online stock forums influence information asymmetry and IPO valuation. The empirical analysis isolates the underpricing and overvaluation components of initial returns. The numbers of forum comments, postings, and readings are positively associated with initial returns and the degree of underpricing, implying that forums create noise that exacerbates information asymmetry during IPOs. This effect is amplified by the quiet period regulation, which drives investors to rely on online discussion forums to obtain information. Through sentiment analyses of forum posts and media coverage, we find that the negative effect of online forums is more prominent when bad news prevails. We clarify the role of online stock forums in IPO pricing and information asymmetry by separating underpricing from overvaluation in initial returns.
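The separation of initial returns into two components can be made concrete with a small arithmetic sketch. The abstract does not give the authors' specification. The version below assumes a commonly used split around an estimated intrinsic value, and the function name, parameters, and prices are hypothetical.

```python
def decompose_initial_return(offer_price, intrinsic_value, first_day_close):
    # Assumed decomposition, all scaled by the offer price:
    #   underpricing   = (intrinsic value - offer price) / offer price
    #   overvaluation  = (first-day close - intrinsic value) / offer price
    #   initial return = underpricing + overvaluation
    underpricing = (intrinsic_value - offer_price) / offer_price
    overvaluation = (first_day_close - intrinsic_value) / offer_price
    initial_return = (first_day_close - offer_price) / offer_price
    return initial_return, underpricing, overvaluation


# Hypothetical prices: offered at 10, estimated intrinsic value 12, first-day close 15.
total, under, over = decompose_initial_return(10.0, 12.0, 15.0)
print(total, under, over)
```

Under this assumed split, the same initial return can come mostly from an offer price set below intrinsic value or mostly from a market price above it, which is why the two components are examined separately.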