Funding: This work was supported by the National High Technology Research and Development Program of China (Nos. 2010AA012505, 2011AA010702, 2012AA01A401, and 2012AA01A402), the Chinese National Science Foundation (Nos. 60933005, 91124002, and 61303265), the National Technology Support Foundation (No. 2012BAH38B04), and the National 242 Foundation (No. 2011A010).
Abstract: Microblogs have become an important platform for people to publish and share information and to acquire knowledge. This paper addresses the problem of discovering user interest in microblogs. We propose a topic mining model based on Latent Dirichlet Allocation (LDA), named the user-topic model. For each user, interests are divided into two parts according to how the microblogs are generated: original interest and retweet interest. We present a Gibbs sampling implementation for inferring the parameters of our model, which discovers not only a user's original interest but also the user's retweet interest. We then combine original interest and retweet interest to compute interest words for users. Experiments on a dataset of Sina microblogs demonstrate that our model discovers user interest effectively and outperforms existing topic models on this task. We also find that original interest and retweet interest are similar, and that the topics of interest contain user labels. The interest words discovered by our model reflect user labels but cover a much broader range.
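The inference step described in this abstract can be illustrated with a minimal sketch. The code below is a collapsed Gibbs sampler for standard LDA, not the paper's user-topic model, whose exact generative process is not given here. The function names `lda_gibbs` and `interest_words`, the hyperparameters `alpha` and `beta`, and the mixing weight `lam` used to combine original and retweet interest are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (a stand-in, not the user-topic model).

    docs is a list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic mixtures and per-topic word distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # count of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # total words in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi


def interest_words(theta_original, theta_retweet, phi, vocab, lam=0.5, top_n=5):
    """Rank words by a linear mix of two topic mixtures (lam is a hypothetical weight)."""
    mixed = [lam * o + (1 - lam) * r for o, r in zip(theta_original, theta_retweet)]
    scores = [
        sum(mixed[k] * phi[k][w] for k in range(len(phi))) for w in range(len(vocab))
    ]
    ranked = sorted(range(len(vocab)), key=lambda w: -scores[w])
    return [vocab[w] for w in ranked[:top_n]]
```

In this sketch, one sampler run over a user's original posts and another over the user's retweets would give the two topic mixtures passed to `interest_words`. The linear mix is only one plausible way to combine them.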
Funding: Supported by the National Natural Science Foundation of China (71690233, 71901214).
Abstract: Given the characteristics of high-end products, crowd-sourcing user stories can be seen as an effective means of gathering requirements, as it involves a large user base and generates a substantial amount of unstructured feedback. The key challenge lies in transforming abstract user needs into specific ones, which requires integration and analysis. We therefore propose a topic mining-based approach to categorize, summarize, and rank product requirements from user stories. Specifically, after determining the number of story categories with pyLDAvis, we first classify the "I want to" phrases within user stories. Classic topic models are then applied to each category to generate its name, so that each user story category obtained from the classification is defined as a requirement. Furthermore, a weighted ranking function is devised to calculate the importance of each requirement. Finally, we validate the effectiveness and feasibility of the proposed method using 2966 crowd-sourced user stories related to smart home systems.
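Two steps of this pipeline can be sketched in a few lines, under loud assumptions. The abstract does not state how the "I want to" phrase is extracted or what the weighted ranking function looks like. The regular expression below assumes the common "As a ..., I want to ..., so that ..." story template, and `rank_requirements` uses two hypothetical signals (the number of stories in a category and a made-up per-story vote count) with hypothetical weights `w_freq` and `w_votes`.

```python
import re
from collections import defaultdict

# Captures the goal that follows "I want to", stopping at "so that", a period,
# a semicolon, or the end of the story (assumed story template).
WANT_RE = re.compile(r"\bI want to\s+(.+?)(?:,?\s+so that\b|[.;]|$)", re.IGNORECASE)


def extract_want(story):
    """Return the goal phrase of a user story, or None when no phrase is found."""
    match = WANT_RE.search(story)
    return match.group(1).strip() if match else None


def rank_requirements(labelled_stories, w_freq=0.6, w_votes=0.4):
    """Rank requirement categories by a weighted sum of two normalized signals.

    labelled_stories is a list of (category, votes) pairs, one per story.
    Both the vote signal and the weights are illustrative assumptions.
    """
    counts = defaultdict(int)
    votes = defaultdict(int)
    for category, story_votes in labelled_stories:
        counts[category] += 1
        votes[category] += story_votes
    max_count = max(counts.values(), default=0) or 1  # avoid division by zero
    max_votes = max(votes.values(), default=0) or 1
    scores = {
        category: w_freq * counts[category] / max_count
        + w_votes * votes[category] / max_votes
        for category in counts
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

In a fuller pipeline, the category labels passed to `rank_requirements` would come from the topic model rather than being supplied by hand.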
Funding: The Hubei Provincial Natural Science Foundation of China (Grant Number 2019cfc880).
Abstract: In the field of query recommendation, current semantic analysis techniques cannot meet the demands of users. To meet diverse needs, we improved the LDA model and designed a new query recommendation model based on collaborative filtering, the Semantic Factor Model (SFM), which combines text information, user interest information, and web sources. First, we extended the LDA model from bag-of-words to bag-of-phrases to understand the topics expressed by users' frequently used sentences. The bag-of-phrases model treats each phrase as a whole and can capture query intent more accurately. Second, we use collaborative filtering to build an evaluation matrix between user interests and personalized expressions. Third, we designed a new scoring function that can recommend the top n resources to users. Finally, we conduct experiments on the AOL data set. The experimental results show that SFM achieves higher recommendation quality than other recent query recommendation techniques.
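The ideas of treating phrases as units and recommending the top n resources can be illustrated with a small sketch. It is not the SFM model. Here phrases are approximated by merging word pairs that occur at least `min_count` times, and the recommendation step is a generic user-based collaborative filter with cosine similarity. All function names, the `min_count` threshold, and the rating dictionaries are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def frequent_bigrams(queries, min_count=2):
    """Collect adjacent word pairs that occur at least min_count times."""
    counts = Counter()
    for query in queries:
        tokens = query.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, count in counts.items() if count >= min_count}


def to_phrases(query, bigrams):
    """Tokenize a query, merging known bigrams into single phrase tokens."""
    tokens = query.lower().split()
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    numerator = sum(a[key] * b[key] for key in set(a) & set(b))
    denominator = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return numerator / denominator if denominator else 0.0


def recommend(user, ratings, n=2):
    """Return up to n unseen resources, scored by similarity-weighted ratings."""
    scores = defaultdict(float)
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], their_ratings)
        if similarity <= 0:
            continue  # ignore users with nothing in common
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] += similarity * rating
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]
```

A phrase-level topic model would consume the output of `to_phrases` instead of single words, which is the sense in which a bag-of-phrases representation keeps a multiword expression such as a place name together.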
Abstract: As one of the early COVID-19 outbreak areas, China attracted the attention of the global news media at the beginning of 2020. During the epidemic, the Chinese people united and actively fought against it. However, in the eyes of the international public, the situation reported about China was not optimistic. To better understand how the international public portrays China, especially during the epidemic, we present a case study using big data technology. We aim to answer three questions: (1) What did the international media focus on during the COVID-19 epidemic? (2) What is the media's tone when reporting on China? (3) What is the media's attitude when talking about China? In detail, we crawled more than 280,000 news articles from 57 mainstream media agencies in 22 countries and made some interesting observations. For example, international media paid more attention to Chinese livelihood during the COVID-19 epidemic. In March and April, "progress of Chinese vaccines," "specific drugs and treatments," and "virus outbreak in U.S." became the media's most common topics. In terms of news attitude, Cuba, Malaysia, and Venezuela had a positive attitude toward China, while France, Canada, and the United Kingdom had a negative attitude. Our study can help in understanding China's image in the eyes of the international media and provides a sound basis for image analysis.