This study proposes methods for separating advertising reviews from other shopping mall reviews. Advertising reviews are mostly written by companies and contain promotional content. Studies on classifying opinion spam documents exist but are rare, even in the international literature, and no study has classified advertising reviews among Korean reviews. Here, a Naive Bayes classifier was used to classify review documents, and POS (Part-of-Speech) tagging and bigram methods were used to extract specific words. Two methods were used to calculate the frequencies behind the probability values of those words: (1) the raw number of occurrences of each word, and (2) a frequency calculated through the proposed Latent Semantic Analysis (LSA) approach, which recalculates the counts from (1). The performance of the two methods was then compared. Method (2) reached 88.43% accuracy, 8.89 percentage points higher than the 79.54% previously obtained with the POS-tagging + bigram method. These results indicate that the proposed method is effective at classifying or extracting advertising reviews from Korean product review documents.
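As a rough illustration of frequency method (1), the sketch below trains a Naive Bayes classifier on raw bigram counts. It is not the authors' implementation: the function names, the toy English reviews, and the add-one smoothing are all assumptions made for the example, and both the POS-tagging filter and the LSA recalculation of method (2) are left out.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs as feature strings."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def train(docs):
    """Count bigrams per class. docs is a list of (tokens, label) pairs."""
    counts = {}             # label -> Counter of bigram occurrences
    class_docs = Counter()  # label -> number of training documents
    vocab = set()           # every bigram seen in training
    for tokens, label in docs:
        feats = bigrams(tokens)
        counts.setdefault(label, Counter()).update(feats)
        class_docs[label] += 1
        vocab.update(feats)
    return counts, class_docs, vocab


def classify(tokens, counts, class_docs, vocab):
    """Return the label with the highest log posterior."""
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior from the share of training documents in this class.
        score = math.log(class_docs[label] / total_docs)
        # Add-one smoothing (an assumption, not stated in the abstract).
        denom = sum(counts[label].values()) + len(vocab)
        for feat in bigrams(tokens):
            if feat in vocab:  # bigrams never seen in training are skipped
                score += math.log((counts[label][feat] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data; the study itself used Korean shopping mall reviews.
TRAIN = [
    ("free coupon visit our store".split(), "ad"),
    ("visit our store for discount event".split(), "ad"),
    ("the shoes fit well and arrived fast".split(), "normal"),
    ("arrived fast but the color faded".split(), "normal"),
]
MODEL = train(TRAIN)
```

Calling `classify("please visit our store".split(), *MODEL)` labels the review as `"ad"`, because the bigrams `visit_our` and `our_store` occur only in the advertising class. In this sketch, method (2) would replace the raw values in `counts` with LSA-derived frequencies before the probabilities are computed.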