Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61170196 and 61202140, and the Major Project of the National Social Science Foundation of China under Grant No. 13&ZD190.
Abstract: Some microblog services encourage users to annotate themselves with multiple tags indicating their attributes and interests. User tags play an important role in personalized recommendation and information retrieval. To better understand the semantics of user tags, we propose the Tag Correspondence Model (TCM) to identify complex correspondences of tags from the rich context of microblog users. A correspondence of a tag is a unique element in the context that is semantically correlated with this tag. In TCM, we divide the context of a microblog user into various sources (such as short messages, user profile, and neighbors). Given a collection of users with annotated tags, TCM can automatically learn the correspondences of user tags from multiple sources. With the learned correspondences, we are able to interpret the implicit semantics of tags. Moreover, for users who have not annotated any tags, TCM can suggest tags according to their context information. Extensive experiments on a real-world dataset demonstrate that our method can efficiently identify correspondences of tags, which may eventually represent the semantic meanings of tags.
Funding: The work is supported by the National Natural Science Foundation of China (NSFC) under grant numbers 61472400, 91746301 and 61802371. H. Shen is also funded by the K.C. Wong Education Foundation and the Youth Innovation Promotion Association of the Chinese Academy of Sciences.
Abstract: The Chinese Software Developer Network (CSDN) is one of the largest information technology communities and service platforms in China. This paper describes user profiling for CSDN, an evaluation track of SMP Cup 2017. The track contains three tasks: (1) user document keyphrase extraction, (2) user tagging and (3) user growth value prediction. In the first task, we treat keyphrase extraction as a classification problem and train a Gradient Boosting Decision Tree model with comprehensive features. In the second task, to deal with class imbalance and capture the interdependency between classes, we propose a two-stage framework: (1) for each class, we independently train a binary classifier that models that class against all of the other classes; (2) we feed the output of the trained classifiers into a softmax classifier, tagging each user with multiple labels. In the third task, we propose a comprehensive architecture to predict user growth value. Our contributions in this paper are summarized as follows: (1) we extract various types of features to identify the key factors in user value growth; (2) we use a semi-supervised method and the stacking technique to extend the labeled data sets and increase the generality of the trained model, resulting in an impressive performance in our experiments. In the competition, we achieved first place out of 329 teams.
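The two-stage tagging framework described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy feature vectors, the use of plain logistic regression for the per-class binary classifiers, the uniform soft targets used to train the softmax layer on multi-label data, the top-k rule for emitting tags, and all function names and hyperparameters. The sketch also does not attempt to model class imbalance.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary(X, y, epochs=300, lr=0.5):
    """Stage 1: logistic regression for one class against all others."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = sigmoid(dot(w, x) + b) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def stage1_scores(models, x):
    """Outputs of all per-class binary classifiers for one user."""
    return [sigmoid(dot(w, x) + b) for w, b in models]

def train_softmax(S, labels, n_classes, epochs=300, lr=0.5):
    """Stage 2: softmax regression on the stage-1 scores.
    Assumption: each user's target mass is spread uniformly over their tags."""
    W = [[0.0] * len(S[0]) for _ in range(n_classes)]
    B = [0.0] * n_classes
    for _ in range(epochs):
        for s, tags in zip(S, labels):
            p = softmax([dot(W[k], s) + B[k] for k in range(n_classes)])
            for k in range(n_classes):
                target = 1.0 / len(tags) if k in tags else 0.0
                g = p[k] - target  # cross-entropy gradient w.r.t. logit k
                W[k] = [wi - lr * g * si for wi, si in zip(W[k], s)]
                B[k] -= lr * g
    return W, B

def predict_tags(models, W, B, x, k=2):
    """Tag a user with the k classes of highest softmax probability."""
    s = stage1_scores(models, x)
    p = softmax([dot(W[c], s) + B[c] for c in range(len(W))])
    return sorted(range(len(p)), key=lambda c: p[c], reverse=True)[:k]

# Hypothetical toy data: 3 classes, 3 binary features, multi-label users.
X = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1]]
labels = [{0}, {0, 1}, {1}, {1, 2}, {2}, {0, 2}]
n_classes = 3

# Stage 1: one independent binary classifier per class.
models = [train_binary(X, [1 if c in tags else 0 for tags in labels])
          for c in range(n_classes)]
# Stage 2: softmax classifier over the stage-1 outputs.
S = [stage1_scores(models, x) for x in X]
W, B = train_softmax(S, labels, n_classes)

print(predict_tags(models, W, B, [0, 1, 1], k=2))
```

Because the second stage sees the scores of all binary classifiers at once, it can in principle learn dependencies between classes that the independent first-stage models cannot; the top-k cutoff used here is only one possible way to turn softmax probabilities into multiple labels.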
Funding: Supported by the National Natural Science Foundation of China (Grant No.: 71273126) and the Foundation for Humanities and Social Science of the Chinese Ministry of Education (Grant No.: 13YJA870020).
Abstract: Purpose: Social tagging behavior, including social tags, online reviews and score information, has been investigated extensively; however, there are very few works on the relationship among them. In this paper, we investigate this problem using the Douban website as the research object. Design/methodology/approach: First, we divided social tags into those with high and low frequency counts, divided books into popular and unpopular books according to their popularity, and chose core tags in terms of distribution. Second, we investigated the relationship between social tags and book scores, including comprehensive analyses and assorted analyses. Findings: The more popular the books become, the higher the scores they get. Tag frequency is not directly related to book scores, and neither is the tag distribution weight. Tags for books in the 'fashion' category are relatively disordered, which may be associated with the miscellany of the books and the diversity of their readers. Research limitations: Social tags are growing dramatically, and strategies and research in this respect are still experimental explorations. Open source books, data and educational resources are incomplete. Comparative studies are necessary, but the results may be affected by research based on data analyses. In addition, this research has been conducted on only one website, namely Douban, and the tags provided by Douban Book are not complete. All these factors could influence the generalizability of the results. Practical implications: Very few studies have been conducted on the relationship between tags and scores, and this research could bring practical significance to popular book prediction and tag quality research. Originality/value: Less attention has been paid to Chinese books when analyzing the relationship between scores and tags of user generated content. Analyses based on Chinese books may fill the gap in understanding the relationship between the two objects.