Most news topic detection methods are word-based; they tend to ignore the relationships among words and suffer from semantic sparsity, resulting in low topic detection accuracy. In addition, the current mainstream probabilistic and graph analysis methods for topic detection have high time complexity. For these reasons, we present a news topic detection model based on the capsule semantic graph (CSG). Keywords that co-occur in a text are modeled as a keyword graph, which is divided into multiple subgraphs through community detection. Each subgraph contains a group of closely related keywords and is used as a vertex of the CSG. The semantic relationship among the vertices is obtained by calculating the similarity of the average word vectors of the vertices. At the same time, the news texts are clustered using an incremental clustering method in which each text is represented as a CSG; that is, the similarity among texts is calculated by a graph kernel. The relationship between vertices and edges is also considered when calculating the similarity. Experimental results on three standard datasets show that CSG obtains higher precision, recall, and F1 values than several recent methods. Experimental results on large-scale news datasets reveal that the time complexity of CSG is lower than that of probabilistic methods and other graph analysis methods.
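The keyword-graph step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy documents, and the `min_weight` threshold are assumptions made for illustration, and connected components of a thresholded graph are used as a simple stand-in for the community detection step, whose exact algorithm the abstract does not name.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_keyword_graph(docs):
    """Count how often each pair of keywords co-occurs in the same text."""
    weights = Counter()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return weights


def split_into_subgraphs(weights, min_weight=2):
    """Group keywords linked by edges of weight >= min_weight.

    Stand-in for community detection (an assumption): connected
    components of the thresholded graph, found by depth-first search.
    """
    adjacency = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return groups


# Hypothetical keyword lists, one per news text.
docs = [
    ["election", "vote", "candidate"],
    ["election", "vote", "poll"],
    ["storm", "flood", "rescue"],
    ["storm", "flood", "vote"],
]
graph = build_keyword_graph(docs)
capsules = split_into_subgraphs(graph, min_weight=2)
```

Each resulting keyword group would play the role of one CSG vertex; a real system would replace the thresholding step with a proper community detection algorithm and then compare vertices through their average word vectors.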
How to quickly and accurately detect new topics from massive online data has become a main problem of public opinion monitoring in cyberspace. This paper presents a new event detection method for current new event detection systems, based on a sorted subtopic matching algorithm, and constructs the entire design framework. In this paper, the subtopics contained in old topics (or news stories) are sorted in descending order of their importance to the topic (or news story), forming a sorted subtopic sequence. In the process of subtopic matching, a subtopic scoring matrix is used to determine whether a new story reports a new event. Experimental results show that the sorted subtopic matching model improves the accuracy and effectiveness of the new event detection system in cyberspace.
The COVID-19 pandemic has become one of the most severe disease outbreaks in recent years. As it strongly affects the livelihood of people across the world, it is essential for administrators and healthcare professionals to be aware of the views of the community so as to monitor the severity of the spread of the outbreak. Public opinions are shared extensively on microblogging media such as Twitter, which is considered one of the popular sources for collecting public opinion on any topic, such as politics, sports, and entertainment. This work presents a combination of an Intensity Based Emotion Classification Convolution Neural Network (IBEC-CNN) model and Non-negative Matrix Factorization (NMF) for detecting and analyzing the different topics discussed in COVID-19 tweets as well as the intensity of the emotional content of those tweets. The topics are identified using NMF, and the emotions are classified using the pretrained IBEC-CNN, based on predefined intensity scores. The research aims at identifying the emotions in Indian tweets related to COVID-19 and producing a list of topics discussed by users during the COVID-19 pandemic. Using the Twitter Application Programming Interface (Twitter API), a large number of COVID-19 tweets were retrieved between January and July 2020. The extracted tweets are analyzed for the emotions fear, joy, sadness, and trust with the proposed pretrained IBEC-CNN model. The classified tweets are given an intensity score from 1 to 3, with 1 being low intensity for the emotion, 2 moderate, and 3 high. To identify the topics in the tweets and the themes of those topics, NMF is employed. Analysis of the emotions of COVID-19 tweets shows that the count of positive tweets exceeds that of negative tweets during the period considered and that negative tweets related to COVID-19 account for less than 5%. Also, more than 75% of the negative tweets expressing sadness and fear are of low intensity. A qualitative analysis has also been conducted, and the topics detected are grouped into themes such as economic impacts, case reports, treatments, entertainment, and vaccination. The results of the analysis show that issues related to the pandemic are expressed with different emotions on Twitter, which helps in interpreting public insights during the pandemic; these results are beneficial for planning the dissemination of factual health statistics to build the trust of the people. The performance comparison shows that the proposed IBEC-CNN model outperforms the conventional models and achieves 83.71% accuracy. The percentage of COVID-19 tweets discussing each topic varies from 7.45% to 26.43% across the topics Economy, Statistics on Cases, Government/Politics, Entertainment, Lockdown, Treatments, and Virtual Events. Politics/government was discussed in the fewest tweets, whereas treatments were discussed the most.
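As a rough illustration of how NMF can expose topics in a document-term matrix, the following sketch factors a small non-negative matrix V into W (documents by topics) and H (topics by terms) using the classic multiplicative update rules. It is a toy written with the standard library only; the matrix values, the number of topics `k`, the iteration count, and the helper names are assumptions for illustration and are not taken from the paper, which does not state which NMF solver was used.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Hypothetical term counts: rows are tweets, columns are vocabulary terms.
v = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
w, h = nmf(v, k=2)
```

Each row of `h` can then be read as a topic (its largest entries point to the most characteristic terms), and each row of `w` gives the topic weights of one tweet. In practice a library solver on a TF-IDF matrix would normally be used instead of this hand-written loop.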
Purpose: Opinion mining and sentiment analysis in an online learning community can truly reflect students' learning situation, which provides the necessary theoretical basis for subsequent revision of teaching plans. To improve the accuracy of topic-sentiment analysis, a novel model for topic-sentiment analysis is proposed that outperforms other state-of-the-art models. Methodology/approach: We aim at highlighting the identification and visualization of topic sentiment based on learning topic mining and sentiment clustering at various granularity levels. The proposed method comprises data preprocessing, topic detection, sentiment analysis, and visualization. Findings: The proposed model can effectively perceive students' sentiment tendencies on different topics, which provides a powerful practical reference for improving the quality of information services in teaching practice. Research limitations: The model obtains the topic-terminology hybrid matrix and the document-topic hybrid matrix by selecting real users' comment information on the basis of the LDA topic detection approach, without considering the intensity of students' sentiments and their evolutionary trends. Practical implications: The implication and association rules used to visualize the negative sentiment in comments or reviews enable teachers and administrators to access specific complaints, which can be used as a reference for enhancing the accuracy of learning content recommendation and evaluating the quality of their services. Originality/value: The topic-sentiment analysis model can clarify the hierarchical dependencies between different topics, which lays the foundation for improving the accuracy of teaching content recommendation and optimizing the knowledge coherence of related courses.
With the rapid popularization of social applications, various kinds of social media have developed into important platforms for publishing information and expressing opinions. Detecting hidden topics in the huge amount of user-generated content is of great commercial value and social significance. However, traditional text analysis approaches focus only on the statistical correlation between words and ignore the sentiment tendency and the temporal properties, which may have great effects on topic detection results. This paper proposes a Dynamic Sentiment-Topic (DST) model, which can not only detect and track dynamic topics but also analyze the shift of the public's sentiment tendency towards a certain topic. The Expectation-Maximization algorithm is used in the DST model to estimate the latent distribution, and Gibbs sampling is used to sample the new document set and update the hyperparameters and distributions. Experiments conducted on a real dataset show that the DST model outperforms existing algorithms in terms of topic detection and sentiment accuracy.
Purpose: The purpose of this paper is to analyze topics as alternative features for sentiment analysis in Indonesian tweets. Design/methodology/approach: Given Indonesian tweets, the process of sentiment analysis starts by extracting features from the tweets. The features are words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify the tweets into their sentiment classes. Findings: The authors analyze the accuracy using two-class and three-class sentiment analysis data sets. Both data sets are about sentiments towards candidates for the Indonesian presidential election. The experiments show that the standard word features give better accuracies than the topic features for the two-class sentiment analysis. Moreover, the topic features can slightly improve the accuracy of the standard word features. The topic features can also improve the accuracy of the standard word features for the three-class sentiment analysis. Originality/value: The standard textual data representation for sentiment analysis using machine learning is the bag of words and its extensions, mainly created by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis in Indonesian tweets.
Hashtags are important metadata in microblogs and are used to mark topics or index messages. However, statistics show that hashtags are absent from most microblogs. This poses great challenges for the retrieval and analysis of these tagless microblogs. In this paper, we summarize the similarity between microblogs and short-message-style news, and then propose an algorithm, named 5WTAG, for detecting microblog topics based on a model of five Ws (When, Where, Who, What, hoW). As five-W attributes are the core components in event description, it is guaranteed theoretically that 5WTAG can properly extract semantic topics from microblogs. We introduce the detailed procedure of the algorithm in this paper, including spam microblog identification, microblog segmentation, and candidate hashtag construction. In addition, we propose a novel recommendation computing method for ranking candidate hashtags, which combines syntax and semantic analysis and observes the distribution of artificial topic hashtags. Finally, we conduct comprehensive experiments to verify the semantic correctness and completeness of the candidate hashtags, as well as the accuracy of the recommendation method, using real data from Sina Weibo.
Funding: Funded by the Planning Project of the National Language Committee in the "12th 5-year Plan" (No. YB125-49), the Foundation for Key Program of the Ministry of Education, China (No. 212167), and the Fundamental Research Funds for the Central Universities (No. SWJTU12CX096).
Funding: Supported by the Teaching Research Major Projects of Anhui Province (2018jyxm1446), the Natural Scientific Project of Anhui Provincial Department of Education (KJ2019A0371), the Anhui Demonstration Experiment Training Center Project (2018sxzx58), and the Demonstration Projects for Massive Open Online Course of Anhui Province (2018mooc278).
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61402045 and 61370197, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20130005110011, and the National High Technology Research and Development Program under Grant No. 2013AA013301.
Funding: Supported by the National Natural Science Foundation of China (No. 61173027) and the Northeastern University Fundamental Research Funds for the Central Universities (Nos. N150404012 and N140404006).